Good afternoon, everyone. My name is Ragu Balakrishnan. I'm a professor of electrical and computer engineering. I'm also the Michael and Katherine Birck Head of ECE, and it's my distinct pleasure to welcome you to this afternoon's College of Engineering Distinguished Lecture by Dr. Robert Kahn. We are in for a very special treat this afternoon, and what I'd like to do is first introduce our Dean of Engineering, Dr. Mung Chiang, who in turn will tell you why it is such a special treat for us. While Dr. Chiang is walking up, I'd like to briefly introduce him. Dr. Mung Chiang is the John A. Edwardson Dean of the College of Engineering. His research received the 2013 Alan T. Waterman Award. His online courses and textbooks have reached over a quarter million students, and he has co-founded several startup companies and a non-profit consortium. Please join me in welcoming Dean Chiang. Well, good afternoon, everyone, here in physical presence or in virtual presence on Facebook, which is watching you as well. My name is Mung Chiang. On behalf of the Purdue College of Engineering, it is such a special honor to welcome the distinguished lecturer of the grand finale of the inaugural season of the Purdue Engineering Distinguished Lecture Series, Dr. Robert Kahn, a living legend and a national treasure. I could spend the next hour going through his bio and wouldn't be able to finish, so I'll be brief. Dr. Kahn is widely known as one of the fathers of the internet. In particular, in 1966 Dr. Kahn moved from MIT to BBN to start what was known as the ARPANET, which led to the internet. In 1972 Dr. Kahn moved from BBN to DARPA and led the largest effort by the United States government to that point to support computer and networking research and development. In 1974 Dr. Kahn, together with Vint Cerf, wrote the paper that gave us TCP/IP, the glue that led to the success of the internet. And since then Dr.
Kahn has continued to innovate, including in the space of the digital object architecture, the MEMS Exchange, and much more beyond the internet. What he did for the world and humanity in the invention of the internet led to numerous awards, including the IEEE Medal of Honor, the ACM Turing Award, the Japan Prize, the Queen Elizabeth Prize for Engineering, the Marconi Prize, the Draper Prize, and the United States National Medal of Technology. I don't have my notes in front of me; I just happen to remember all these awards by heart, and there are many more that I do not remember by heart just yet. Let me just conclude this brief introduction with one more distinct honor that Dr. Kahn received in 2004: the United States Presidential Medal of Freedom, the highest honor that can be bestowed on a civilian of this country. So let's welcome the living legend, father of the internet, Dr. Robert Kahn. Thank you. Okay, so it's sufficiently bright up here that I can't really see you folks all that well, but I'll just take it for granted that you're not walking out on this lecture. So what I'd like to say is that I've been focused on infrastructure development for most of the time since taking a leave of absence from MIT, where I was on the faculty. And although I've been involved in network development all the way, as Mung mentioned, I've been involved in many other things along the way, including leading the research programs at DARPA for a number of years, when we were the largest supporter of computer science and IT R&D probably in the world. And the problem with working on infrastructure is you really can't see it. And so unless you have a pretty good idea of what it is, sometimes the ideas can kind of roll over you, and they sound good, but you don't know what to do with them. So in the 1970s I remember giving a number of talks about the internet when it was more than just an idea; we were actually building out part of it with the research community, because this was the era when you were starting to get workstations.
A PC had not yet been invented or developed or made available, but people could get powerful workstations and local area nets. And I remember giving lectures to groups that were not involved, because the people who were actively involved already knew what it was all about. And the reaction I often got at the end of those remarks was, "That was a very interesting lecture." I'm hoping I don't get that at the end of this one, although I hope you do find it interesting. But they would say at the end of it, "Tell me again why I would want an IP address?" So you just understood that they didn't really get it at the nuts and bolts level. When we started out to build the internet, in fact when we started out to do computer networking at all, the goal was to get the bits from one computer to another. The internet simply put that in the multiple-network environment, where instead of just a landline net, namely the ARPANET, we had a few other nets that I was involved in developing: a packet radio net, which is kind of like the forerunner of today's cellular systems, and a satellite net on Intelsat IV for a link to the European research community. And the goal we had was to just get the bits from one computer to another, with the idea that when the users got those bits there, they would navigate by fingers on a keyboard and eyes on a screen. And here we are some 45 years later, and we are still for the most part in the paradigm of navigating with fingers on keyboards and eyeballs on screens. We tried to change that back in the 1980s, and I collaborated with Vint Cerf on this as well. We came up with the idea of mobile programs that could run through the internet and carry out some of these tasks. But that came to the forefront at almost exactly the same time that the first viruses and worms and Trojan horses were being introduced into the internet environment.
And so most of the organizations that I thought would be most interested in that notion, that you could remove yourself from having to navigate everything (kind of educate a program as a factotum, let it loose, and it could advise you about what you needed to know about or carry out your tasks), found it unacceptable, because they were uncomfortable with the idea of somebody else's programs just showing up on their machine. I think its time will come. But that's sort of where we have been until recently. And you might say, why wasn't the World Wide Web the solution? And for many people it is a very effective solution. I use it quite a bit myself; most people do. But when you look at the fundamental issues of information management, they often involve proprietary information, personal information. They often involve security at different levels that have to be invoked. And it's a very difficult kind of situation, especially when you're trying to find old information. So I'll give you an example. We were involved in developing one of the most widely used programming languages today. It's known as Python, and along with the Java and C families, those are probably the three most widely used languages. But we had to try and clear the rights to Python when we were developing it at CNRI. The person who was doing the work was Guido van Rossum. And we had to go back and find out, you know, what happened? He had been at CWI in the Netherlands, working on a programming language called ABC. When he got hired there, what were the rules that applied to him? What agreements did he have in place? And we had to find out all this old information, which you couldn't do by just navigating with fingers on a keyboard.
And today we are working in a variety of different contexts with different groups that are selectively using this kind of architectural notion to deal with information, whether it's for managing the supply chain for movies in Hollywood with the cable TV industry, or options trading around the globe, or construction information. And most importantly, I think, first out of the box were the libraries (Purdue was one of them way back when) along with the publishers who were making available their information. And so the architecture I'm going to describe today is not just a hypothetical. This is an architecture that in places is very widely used, but often for pieces of it, not for the whole thing. So if you look at any technical journal, from probably the ACM, the IEEE, or some of the medical journals, you'll see references to things like digital object identifiers on every article, and they've been doing that for probably two decades. But they have been very reluctant to make those articles actually available, because they're afraid that the crown jewels of the publishing industry could be affected. But I think more and more, as we get to understand this, the benefits of it will come out, and we've been having discussions about where else it could be used. And I think it's a perfectly good way to think about managing information in organizations going forward, whether they be a university or a business. So if somebody were to tell you that the electrical power infrastructure was available, and you were there 100 or more years ago, most people would have said, "Well, what good is it?" Because they didn't see the applications and they couldn't see the electrical infrastructure. So it took a while to cause it to be built up, whether it was from public safety interests, for electrical lighting outdoors in place of gas lamps, or whatever. I mean, you didn't have electrical heating in your houses.
You didn't have lamps until light bulbs existed and the electrical infrastructure was fully in place in the homes. But, you know, people didn't see it initially, so they may not have had a good way of really understanding it. If somebody said, "Look, I have a billion-volt wire we could put in your house, really the latest thing in infrastructure," most people wouldn't know what to do with a billion-volt wire, and they probably would be scared of it, because it sounds dangerous or something, even if you can see the transmission lines and the like. So I think that the applications are often important for people to understand. And when we talk about managing information, I have to tell you that the tack I'm taking here is to not solve a particular application, any more than the internet, or the ARPANET, or even LANs were intended to solve a specific application. It was an infrastructural capability we were fairly sure people could take advantage of. The original internet, as you know, would have been a very different development if, when you wanted to have an interaction with a remote computer, you had to ask: Where was it located? What network was it on? What protocols did it use? What gateway do I connect to? How do I route the traffic? We would not have an internet like we do today, where you can just simply identify something and have the bits show up in the right place. But when it comes to information, it's a very different story. And if you want to manage information over very long periods, you need ways to do that effectively. So that's what this is all about. And I hope I can explain it to you in a way that makes you comfortable that it's not about the technology, any more than the internet was about the technology itself. So let me see if I can get to the next slide. Okay. So one of the issues that we had early on was that Congress was passing laws about the internet and nobody really knew what it was.
So there was an FNC definition that defined it as a global information service. They've gone back and forth on: is it a telecommunications utility? Is it a global information service? So they regulated it differently. And what's it about? Well, those protocols were never about the technology. They were all about, whatever the technology was, how do you make it possible for things to work together: the computers, the networks, whatever we're talking about here. The net effect is that over the life of those protocols, which are still being used today, some 45 years since we first started the work, roughly, the scaling up of the technology has grown by something on the order of a factor of 10 million. If this goes on, and we have an internet in a decade or two where those protocols are still in use, especially as the Internet of Things grows, we will have a scaling up of a factor of a billion or a trillion. Nothing in the history of the engineering world that I'm aware of has ever scaled up by that much. Take a look at airplanes: they've gone from, what, order of magnitude 100 miles an hour to either 600 or a little bit above supersonic; you're talking about factors of 10 to 100, not a trillion. And the reason for that is because this architecture was never about the technology; it was all about, whatever the technology is, enabling it to work. And so if you think about the digital object architecture, it's really, in my view, a logical extension of the internet. It's based on the same architectural ideas, namely, it's an open architecture with defined interfaces and protocols, and it's independent of the underlying technology. You don't have to ask, you know, are we using databases or quantum storage systems, or what are the interfaces, any more than we worry about tracks and sectors on disks today?
And the important thing about the internet, and infrastructure in general, is that the most effective infrastructure developments are those that are conceptually simple, in both the understanding of it by users and the ability of applications to make use of it. And that's the case here. This architecture is about as minimal as you can get to manage information, which means there's a lot of room for people to adapt it to their own needs and requirements. It is particularly useful for getting interoperability between different systems. And this is probably the most important comment I can make: it's a non-proprietary architecture. For many years, people have said, because we were involved in developing it, that it was proprietary to my organization, although the funding for it actually originally came from DARPA, and it was an outgrowth of this work on mobile programs that Vint and I had done. But I want to tell you also that we heard the same thing about the internet itself in the mid-1980s. I was one of the early members of a new board that the National Academy had set up in Washington, called the Computer Science and Technology Board, I think, at the time. And they were looking for things to work on. And I proposed at the time: why don't they think about the impact of the internet, which we were referring to as the national information infrastructure, as its impact on society evolves? Take a look at that and see if you can get a handle on it. And the answer was, no, we can't work on that because that's CNRI's (that's my organization, which I still run) proprietary technology. And I said, no, it isn't. This was developed with federal government support. It's a public thing. It's in the public interest. Two years later, the federal government actually gave the Academy some money to look into that very same problem. And they decided, oh, I guess it wasn't proprietary after all.
So you have to distinguish between an architecture, which sort of lays out how things can work, and the actual implementation of it. Now, somebody might have intellectual property in an architecture, like if you're a building designer. But this is one for which there is no intellectual property in the architecture. Nobody is claiming it, certainly not us, and nobody else that I'm aware of is really in a position to. But the implementations of pieces of it could be proprietary. So if a company built a TCP/IP implementation, that could be theirs and they may want to charge for it. But anybody could then build those protocols and continue on with it. Now, you know, managing digital information means different things to different people. I recently gave a lecture at a world conference on the humanities, and they were more interested in the linguistic side of things. So this is an example of some of the things that came up there: you know, we have language in the world because it's used to create literature, and different languages produce literature in different forms. But in the computer world, the same thing is true. We have programming languages that produce computer programs in those languages. They're not quite English, French, and Chinese, but they can be Java, Python, or C++. And these programs, and any other information in digital form, can be structured as digital objects and managed. And so, just like that, the early networks we developed were based on the notion of packets, which had addresses so that you could route them through a network. But once they got delivered, they became ephemeral. You couldn't say, "I would like to gain access to the packet that was sent on such and such a network 43 years ago," and expect to get it. Nobody's keeping track of that. There's no reason to. But when you're managing digital information of some import, there are many cases where you want to manage that information actually in perpetuity.
If it's business information, you might want to keep it for a very long time. If it's governmental information, some of it you might really want to keep in perpetuity. And if it's laws and regulations, as they might apply to various things at various points in the past, you probably want to keep all of that as well. So in this world of the digital object architecture, the digital objects are the lingua franca. Everything that I talk about is about these objects. So let me say what a digital object is. To first order, it's basically a sequence of bits, or a set of those sequences. This could be a digitized version of a movie, in which case you have an audio part and a video part, some sequencing, perhaps some subtitles, some synchronization. But it could be a chip design, which has various aspects to it. Literally anything that you can represent in digital form. And it has associated with it (this is important) a unique persistent identifier. That identifier is sort of part of the object in some sense, but it's also something that can be resolved all on its own. So let me see if I can put this in context for you. Let's say we're in the world of the Internet of Things and we have, you know, a hundred billion things. And I say, here's an identifier for the temperature readings from a particular thermostat, maybe in this auditorium. And it's one of a hundred billion things. You know the identifier, but how do you know that it's this thing in this particular auditorium? You're not going to try all hundred billion. So you need a way of routing the data here. You need a way to resolve this identifier into information about the thing it's identifying. We call that state information. And so the ability to resolve these identifiers is really crucial. Now, there's this whole issue of how you build a system like that that's meaningful, especially if individual organizations want to create their own identifiers and control their own information.
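To make the idea just described concrete, here is a minimal Python sketch of a digital object with a persistent identifier, and a resolver that reduces the identifier to state information about the thing it identifies. All the identifiers, names, and locations here are invented for illustration; this is the shape of the idea, not any real implementation.

```python
# A minimal, hypothetical sketch of a digital object and identifier
# resolution. Everything here is invented for illustration.

class DigitalObject:
    def __init__(self, identifier, elements):
        self.identifier = identifier   # unique, persistent identifier
        self.elements = elements       # one or more named bit sequences

# The resolution system maps an identifier to "state information"
# about the thing it identifies, e.g. where its data can be found.
RESOLVER = {}

def register(obj, location):
    RESOLVER[obj.identifier] = {"location": location}

def resolve(identifier):
    return RESOLVER[identifier]

# One thermostat among a hundred billion things: the identifier alone
# is enough to route a request, because it resolves to a location.
thermo = DigitalObject("20.5000/thermostat-A17", {"readings": b"\x15\x16\x17"})
register(thermo, "repo.example.org")
```

The point of the sketch is that the client never needs to know where the bits live; it only needs the identifier, and the resolution step supplies the rest.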
And virtually all the organizations we've talked to want to do that. So they have the ability to do it locally, but now there might be a hundred thousand or a million of those, and which of them do you then ask for the state information? I'll tell you a little bit about how this works. This piece of the architecture is the most well developed. It's in widespread use, and has been for more than, I would say, 25 years. Purdue was one of the early users of it in one form; there are uses of it in another form today. The library and publishing community has been first out of the box, because they were the ones that saw the need for persistent identifiers in the digital journals they produce. If everything was identified with a URL, let's say, and you moved it from one machine to another over time, sooner or later those URLs are not going to work anymore. They didn't want to have to change all of those citation indices at the back. So that's the way it is today, and that's the way things actually work. Now, a digital object typically will incorporate a work; that's how people thought about it earlier. A work being an incorporeal creation in the world of copyright, which you have to actually reduce to a form of particular expression. But it could be something in which a party has rights or interests, like in a contract, or in which there is value. At the end of this talk there's a set of references, if the slides are made available, that includes the paper on representing value as digital objects, which I think is one of the first papers that actually talks about minting a cryptographic string, as it evolved. And you've probably seen that more recently in the form of all the cryptocurrency stuff and blockchains. I've given a number of talks on blockchains, which in my view are just simply a particular way of structuring a digital object. So I think this is a context that applies very broadly.
And there's a very motivated and encouraged group of folks who are looking at blockchains as something they're particularly interested in. But I think this is a more general way to think about that problem. So basically any kind of information that's in digital form can be structured and represented as a digital object. If you think about that in some longer-term form, you know, if a piece of information shows up on your machine, it'd be very nice to have some context about it. What is it? What's its provenance? Where did it come from? And so having an ability to do that is really important. And we've been playing a role with the Research Data Alliance, trying to help them understand how to deal with very large research data sets. If somebody were to give you, you know, a terabyte of research data, you're not going to know what to do with that unless it's more finely structured and you can go through and see what type of information is in the next 20 bits or the next 500 bits or the next megabit. So these types are important, and those types can be represented as identifiers as well, and resolved to important information about what the type itself means. So these identifiers really are kind of the linchpin, just like IP addresses are the linchpin of today's internet. In general they can be used to identify anything that you would like them to identify, but it's all about those things represented in digital form. So if I say here's an identifier for an individual, I really mean that identifier will resolve to digital information about the individual that they wanted you to know, like their public key, or maybe their contact information for the day, or anything else they chose to make available. It could be about a system, where you wanted to verify you're talking to the right one. It could be about content in different forms.
So all of this is possible, but these identifiers are the linchpin, and the resolution system is really necessary. Our argument has been: don't put semantic information in these identifiers, because if you only understand Chinese, you're not going to understand it if the semantics were in English, and you need a resolution system in general anyway. So put all this semantic information in the resolution system, or in something that will serve as its equivalent, like a registry or metadata. Every object, we assume, therefore has not only an identifier, but its record has a public key, and there's a public-private key pair that exists. So you can validate the systems, the users, and the content through a PKI interchange, where whatever party is trying to do the validation gives you a random string (we call it a nonce), and you encrypt it with your private key, and then they can validate it, because they presumably know who you are as an identified user and have the right public key. Now, this doesn't solve the problem of vetting the users, because all it's saying is that this is the person who holds the private key corresponding to that public key. So in some cases people will verify identity off of security cards issued by trustworthy organizations and the like. And the fact that this produces a PKI infrastructure really enables a lot of very interesting things, because people have struggled with how to create a PKI infrastructure, but this architecture comes with it fully built in, conceptually. Now, I mentioned before that this work came out of some work that we did on mobile programs. We produced a report; it was about the world of Knowbots. This is something that I did with my colleague Vint Cerf. And that technology was viewed as potentially dangerous, because they didn't know what programs from whom would be showing up in a world of viruses.
We extracted the mobility part of it out and produced the equivalent of the digital object architecture, to which mobility could be reintroduced at any point in time, because a mobile program can be a digital object. But we're assuming right now that we're not dealing with mobility, but with things in physical, structural locations within the internet environment. So what does this object architecture do? Well, first of all, it provides a conceptual framework for managing information of all kinds, and most people today don't have a framework for that. I was asked a question: well, how does this relate to databases? Well, people know what databases are today, but if you took the information from one database and put it into another database, you're going to lose all the context about that information, like access control to it, and provenance perhaps. It'd be very nice if the objects themselves had the ability to self-identify, so that when an object moved from one place to another, that information all went along with the object. Now, whether you use a database to store it or not is immaterial, because that's just the low-level technology in this architecture. You can put it in the cloud, you can put it in multiple clouds, you can do anything you like behind the scenes, but the whole idea is that, using the identifier, you should be able to then get the object or some part of the object. So the protocols for doing that enable you to deal with the information that's embedded within these digital objects, and that's something that I think is going to stand us in good stead going forward.
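The nonce-based validation described a moment ago can be sketched in a few lines. This toy uses textbook RSA with tiny, hard-coded primes purely to show the challenge-response shape; a real deployment would use a proper cryptographic library and large keys, and nothing here reflects an actual protocol.

```python
# Toy sketch of the nonce challenge-response used to validate users,
# systems, and content. Textbook RSA with tiny primes (p=61, q=53)
# for illustration only; never use keys this small in practice.

import random

n, e, d = 3233, 17, 2753   # (n, e) is the public key, d is private

def sign_nonce(nonce):
    # The party being validated "encrypts" the nonce with its private key.
    return pow(nonce, d, n)

def check(nonce, response):
    # The verifier applies the public key on record; a match shows the
    # responder holds the corresponding private key.
    return pow(response, e, n) == nonce

nonce = random.randrange(2, n)     # the verifier's random challenge
response = sign_nonce(nonce)
assert check(nonce, response)
```

As the talk notes, this only proves possession of the private key; vetting who actually holds that key is a separate problem.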
So we're not forced to move very large files when you only want to know a small piece of data, like a cholesterol reading or a blood pressure reading, from a much larger record. And with that kind of protocol you have the advantage of getting interoperability dealt with right off the bat, because if the main interface to something that is making these objects available (which I'll get to later; call it a repository) is based on identifiers, then every repository, regardless of what the storage mechanism is, is automatically interoperable. Just like TCP/IP allows for interoperability between computers of different sorts, this protocol, called the Digital Object Interface Protocol, DOIP, being all identifier-based, automatically allows for interoperability, and it will persist over the long term. Now, there are three components in the architecture, one of which I just mentioned: a repository, because you're not going to access a digital object if you can't get it from somewhere. So repositories store the objects and enable their access, based on security and identifiers if necessary. If it's public, then you don't care; if it's not, then you want to be sure it's only being given to people who have the right cryptographic validations of themselves. Now, you may not remember an identifier. If somebody sent it to you by email, fine, you might click on it; if somebody cited it in a publication, you might click on it. But suppose you're looking for the laws in the state of Indiana in, okay, we'll go into the future, 2025 or 2035 (I know 2015 is in the past), and you're looking for a particular law on a particular topic. Then you need to be able to find out what the identifier is, so that you can avoid all of the searching. That's what registries enable, because registries store metadata about the objects, so you ought to be able to search them. This architecture doesn't define what the search strategy is.
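The earlier point about not shipping a whole record when you only want one reading can be sketched as follows. The operation, field names, and identifiers are invented for illustration and are not the actual DOIP operations; the sketch only shows what an identifier-keyed repository interface with element-level access looks like.

```python
# Hypothetical sketch of identifier-based access to part of an object,
# in the spirit of DOIP: the repository interface is keyed entirely by
# identifier, and a client can ask for a single element rather than
# pulling the whole (possibly huge) record. All names are invented.

REPOSITORY = {
    "20.5000/patient-7": {
        "cholesterol": b"182 mg/dL",
        "blood-pressure": b"118/76",
        "full-record": b"many megabytes of history",
    }
}

def retrieve(identifier, element=None):
    obj = REPOSITORY[identifier]
    if element is None:
        return obj            # the whole object
    return obj[element]       # just the element you asked for
```

Because the interface is purely identifier-based, any two repositories exposing this interface are interoperable regardless of what storage technology sits behind them.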
So if somebody comes up with a really good PhD thesis on how to locate people by photographs, or music by sounds, or whatever, then you can incorporate that at the front end of the search part; but it's the access to the metadata that would then provide you back a list of identifiers for that, or for things that are closely related to it. So now you've got the identifier. What's the next step? Presumably you'd go to the repository to access the information, unless you don't need to do that; if you just want to get a public key, maybe you don't need to go to a repository. The resolution system is the key intermediary. By going to the resolution system, you say: here's the identifier; what's the state information about this object? It might say, here are 10 places on the internet you can go, and you can use normal routing; or it might say it's in one particular location. Here's how you authenticate it. Here are the terms and conditions for its use. For most things on the internet today, you have no clue what you can do with them when you get them, and so sometimes people just do what they think is reasonable; but here you have a way of actually stating explicitly what you can and can't do. Well, you can think about this resolution system in a variety of ways. If you put the resolution system in one location, then you've got to have the information for every object in that location. If you had a hundred billion repositories and every one of them had a million entries, you've got 10 to the 15th, 10 to the 18th, a huge number of records in one location. And when we talked to people about it, they all wanted to manage their own. So let's say you had a million organizations that were in a position to create their own identifiers and manage their own identifier records; it's like the catalog for those records, if you will.
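A state-information record of the kind just described (several locations, an authentication hint, explicit terms of use) might look like the sketch below. The identifier and every field value are made up; the point is only that resolution returns structured information about the object, not the object itself.

```python
# Invented example of the "state information" a resolution system
# might return for an identifier: several places to get the object,
# how to authenticate it, and explicit terms and conditions.

RESOLUTION = {
    "20.5000/indiana-law-2035-17": {
        "locations": ["repo1.example.org", "repo2.example.org"],
        "authentication": "verify against the object's public key",
        "terms": "free for public, non-commercial use",
    }
}

def lookup_state(identifier):
    # Reduce an identifier to state information about the object.
    return RESOLUTION[identifier]

record = lookup_state("20.5000/indiana-law-2035-17")
```

Listing the terms in the resolution record is what gives you, in the talk's words, a way of stating explicitly what you can and can't do with the object.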
And as I mentioned, Purdue was one of the early ones that made use of this, and they still do, but it's through the publishers' mechanism right now, the DOIs. Which of those million would you ask? And so we ended up with a two-level system that I'll describe a little later, but it's in widespread use and it's pretty important. There's another effort that's ongoing, for which we've been working with both NIST and the National Institutes of Health, and that's to define what it means to be a type, and we're doing it through the folks at ISO. If you've got the ability to define types for your data, what does a type look like? If you don't have a standard way of saying what a type should look like, nobody else is going to understand it unless you have a separate language above that to describe it. So we're trying to come up with a meta-structure, a meta-description of what a type looks like, but not to define any particular types. The medical community will have the expertise that knows how to do that for medicine. In the engineering disciplines it'll be different, from chemistry to mechanical to electrical; they can define their own type structures, and there may be different ways of doing it. So you will have a way of resolving a type, and then you can see it, potentially in different languages, if somebody is willing to take the time to do that. So within a digital object, every element of that object (and there can be many elements) is represented as a type-value pair, and the whole object itself is typed, so you know what type of entity it is. But the types themselves are represented as digital objects, and that's how you can understand, for element X: you can click on the type and find out what type of element that is. So conceptually, that's what it looks like. It's a little sketch I have just to show you.
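The type-value structure just described can be sketched as follows. The type identifiers and the toy registry are invented; in the architecture, "resolving a type" would itself go through the identifier resolution system rather than a local dictionary.

```python
# Invented sketch of type-value pairs: each element of a digital
# object carries a type identifier, and the type identifier itself
# resolves (here, via a toy registry) to a description of its meaning.

TYPE_REGISTRY = {
    "21.T100/timestamp": "seconds since the Unix epoch",
    "21.T100/temperature-c": "temperature in degrees Celsius",
}

# A digital object's elements as (type identifier, value) pairs.
elements = [
    ("21.T100/timestamp", 1700000000),
    ("21.T100/temperature-c", 21.5),
]

def describe(element):
    type_id, value = element
    # "Clicking on the type" amounts to resolving the type identifier.
    return f"{value} ({TYPE_REGISTRY[type_id]})"
```

This is why a terabyte of research data becomes usable: each stretch of bits carries a type identifier that anyone can resolve to find out what it means.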
I can't see how it's clicking, but you should have on the screen, in the upper right-hand corner, repositories; below it, the resolution service; to the left of that, resource discovery, which is really the metadata records; and you have a client in the upper left-hand corner. So the client will go try and discover an identifier. It'll come back. He'll go to the identifier resolution service to resolve it. That'll come back. That might then go to the repository to get the object, and that'll come back, and, I can't see, so he's got the data he wants after some of those interactions. So as I mentioned, this work started out with the work that Ben and I did in the late 80s on mobile programs, but it got elaborated in the early 90s, with DARPA funding, in something called the CSTR project, in which we worked with a number of DARPA-designated universities to actually digitize their computer science technical reports, the stuff that was in the gray literature. And that was a very interesting interaction, because we had a lot of discussions about identifiers, where one university would say, I want to put semantics into the identifier because I want people to know it comes from my university, and I'll never sell my publications to another university, particularly my PhD theses or anything like that. And yet later on they realized the value of that, and the publishers knew it right from day one, because if you go to a major publisher, they might take a whole bunch of their collections and want to sell that collection to another publisher for whatever reason, or they merge with somebody, and they don't want the train of semantics going along with it. So they want to be able to have an identifier that's sort of neutral with regard to any of the semantics. In 1994 or '95 we set up a group of companies in the United States, with, I don't know, it was at least 50 and might have been 70 or 80, all across the board, and it was an attempt to get them to understand what the internet was about.
We might do the same thing for the digital object stuff once they feel comfortable having a solution that isn't owned by any one of them. Well, they understood, industry did, that they didn't own the internet at that point in time, but what was it, and what were other people thinking? So we had people from semiconductors, computer software, applications, computers, networking, router builders, newspaper people, financial people, and we brought them all together, and there was a report that got put out in 1997. It was called basically something like managing access to digital information, and it was an approach to that which was based on digital objects and stated operations on those objects. Now, if you think about object-oriented programming, the whole point of object-oriented programming when it first came out, and I followed that very closely, was to insulate the programmers from all the details of the internal structures of the program. They didn't have to worry about setting up arrays and pointers and the like. They had built-in methods that allowed you to access it. But when you're talking about intellectual property or other important things that people care about, those organizations really wanted to be able to license those interfaces. So if somebody wanted to do something with that material, they wanted the ability to have that as a licensing capability. So that's where stated operations came in, where you could actually indicate what kinds of operations are possible with the object for that particular individual or the public at large. And when this was first presented, when the world was starting to think about IDs, it actually got the Digital ID World Award. I'll show you a picture of that in a second. So that's what the report looked like. I scanned it in landscape form rather than portrait mode. So it only shows, I don't know, there's probably no setup. Whoops. It goes one way. And I thought there might be a laser here.
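The contrast drawn above between ordinary object-oriented methods and "stated operations" can be sketched as follows: operations permitted on an object are stated explicitly, per party, and checked before anything runs. The object identifier, party names, and operation names here are all invented for illustration.

```python
# Illustrative "stated operations" table: which operations each party
# may perform on a given object. Nothing runs unless it is stated.
STATED_OPERATIONS = {
    "1015/report-1997": {
        "public":   {"view"},
        "licensee": {"view", "print", "extract"},
    }
}

def perform(object_id, party, operation):
    """Execute an operation only if it is explicitly stated for this party."""
    allowed = STATED_OPERATIONS.get(object_id, {}).get(party, set())
    if operation not in allowed:
        raise PermissionError(f"{operation!r} not stated for {party!r}")
    return f"performed {operation} on {object_id}"
```

The point of the design is that the permission check lives with the object's interface, so the set of operations itself becomes something an organization can license.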
But you can see there's a list of like seven companies there, I think, starting with the A's, and it just gets up to the B's. But there were all together somewhere between 50 and 80 companies that signed onto it. There's a reference to that at the end. I commend it to you. And that's what the Digital ID World Award looked like. And it cited, at the bottom, the Digital Object Architecture, in 2003, for balancing innovation with reality. Now, the way you actually interface with a digital object is through this protocol. And the protocol itself is really pretty simple. It's based on you give it an identifier, maybe your own identifier too, and then they can validate whether you should be able to get it. And they know exactly what it is. And you can penetrate into the objects and interface with the information itself. None of the other systems do that. Historically, everything about networking was based on the technology: wires in the case of the ARPANET, machines for IP addresses, files in the case of URLs on the web. And you don't want to have what happened. Is that me? Did I do that? OK, there we are. So you don't want to have to be asking those kinds of questions in the future. Imagine somebody coming back and saying, OK, you want to get a copy of the law in Indiana from 2025, and it was on a machine called this back then. Let's say it's 100 years from now. Well, that machine isn't going to be around. It doesn't help to tell you what machine it was on back then. You just want to get it right now. It doesn't help to say what wire it was connected to, or via what machine on the ARPANET. The ARPANET's long since gone, and who knows what networking strategies we'll be using then. You really want to be able to identify the information itself and have the identifier go along with it. And I think it's really the right way to be thinking about this. We have an effort that we undertook to try and describe this, and it became a standard through ITU, but mainly at a descriptive level. So it's not a specification for implementation.
But we're about to make that available standalone from the things that now harness it. But if you look at how it works, see, the red part on that slide is supposed to show sort of the front-end logical processing. It takes these identifiers in, but all the digital objects are out the back end. So you don't care, from a user perspective, whether it's on a thumb drive, on a disk drive, on a RAID array, in a cloud service, or who knows what in the future. And in fact, it could be on multiple cloud services, which we also demonstrated. So that in the future, you can take those objects and port them into any other system. You can move them from cloud service to cloud service, which I think the clouds will eventually have to support. They may not want to do it right now, because they may feel like they're losing customers. But I think it's the right way to do that, and the minute you have this kind of interface to these repositories, and even registries, then you get automatically the kind of interoperability that you get with the internet when people use the TCP/IP protocols. So it's kind of like the logical equivalent of that. We have a piece of software we put out on the net because people asked us. They had to download repository code, they had to download registry code, put them together and make them work. And they said, look, repositories need a registry, so we know what's in the repository. It's like a local index. And guess what? Registries need repositories to store the metadata records. So it's sort of the same set of software; can't you bundle them together? Which we did. We put out a piece of software called Cordra. It's on the Cordra.org site. We're about to release a second version of it with the updated version of the DOIP protocol. And no charge on that. But it does base itself on the use of handles. So you need to be able to create handles and manage handles. Well, I believe Purdue can do that itself, whether they do it as DOIs or plain vanilla handles.
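The front-end/back-end separation described above, where clients speak one interface to a repository while the storage behind it stays swappable, can be sketched in miniature. The class names and methods here are invented for illustration; they are not Cordra's or DOIP's actual API.

```python
# Illustrative back end: a stand-in for a disk, RAID array, or cloud bucket.
class DictStore:
    def __init__(self):
        self._data = {}
    def put(self, key, blob):
        self._data[key] = blob
    def get(self, key):
        return self._data[key]

class Repository:
    """Front end: takes identifiers in; the storage behind it is swappable."""
    def __init__(self, backend):
        self.backend = backend            # could be disk, cloud, anything
    def deposit(self, identifier, blob):
        self.backend.put(identifier, blob)
    def retrieve(self, identifier):
        return self.backend.get(identifier)
```

Because clients only ever see `deposit` and `retrieve` keyed by identifier, migrating the objects to a different back end, or mirroring them across several, never changes what the client does.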
And virtually anybody else can, because that's not a profit-making operation that we or anybody else tends to run. There's an experimental mode where you can try it out, and there's a regular mode where you can just deal with it persistently. This reference out of ITU is called X.1255. It came out of a working group on identity management information, so it's couched in terms of discovery of identity management information. But that's like taking something about email, which can be used for anything, and couching it in terms of chemistry needs: calling it an email protocol for chemists when in fact it's the same protocol for physicists, the same protocol for anyone, whatever it is that's motivating you. This is a very general framework description, and it's all based on the digital object architecture. And it was adopted as a global standard in 2013. Now, metadata is another one of those terms that people struggle with. If you ask most people what is metadata, they'll probably say it's data about data or something like that. But in fact, I think of it as assertions. Namely, they can be about identity, like what's the resource called; provenance, who created it and where was it created; access, what are the access constraints and protocols. You can have descriptions of the data, various technical parts of it, what stage in the life cycle it's at. And then you've got issues about structure and representation. Those are just examples, but that's what metadata is really all about. And a metadata registry will keep that kind of information. So you might want to know all kinds of information, whether you're looking for keywords of some sort or images or whatever it is that leads you to that. Now let me just say a few words about, well, I've said a lot about various things; let me talk about blocks and blockchains briefly. You know, blockchains sound like they're new, but the notion of a block is not really a new item. Anybody who's dealt with communications knows about block coding.
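The view of metadata as assertions can be made concrete. This sketch groups assertions the way the talk does, by identity, provenance, access, description, and structure; the record fields and the dataset identifier are invented for illustration, not a prescribed schema.

```python
# An illustrative metadata record: a set of assertions about one object.
metadata_record = {
    "about": "1015/dataset-7",
    "identity":    {"title": "Lake Temperature Readings"},
    "provenance":  {"creator": "Hydrology Lab", "created": "2016-04-01"},
    "access":      {"constraints": "registered users", "protocol": "DOIP"},
    "description": {"lifecycle_stage": "published"},
    "structure":   {"representation": "CSV, UTF-8"},
}

def assertions(record):
    """Flatten a metadata record into (category, key, value) assertions."""
    return [(cat, k, v)
            for cat, fields in record.items() if isinstance(fields, dict)
            for k, v in fields.items()]
```

A metadata registry is then, in this picture, just a searchable collection of such assertions, which is what lets a discovery service answer keyword or attribute queries and hand back identifiers.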
Anybody who's dealt with deep space communication knows that if you're gonna send information and wait for an acknowledgement and retransmission, you know, it could be like Mars, I think the roundtrip is what, 10 to 15 minutes. And so there's a lot of latency involved in doing that. And so what people tend to do is chain blocks. Burst-trapping codes, things like that, have really dealt with that sort of situation. So the ability to link these blocks together is not new. And of course, in the programming field, linked lists have been around for as long as I can remember, and there are various ways in which you can hook them up one to the other. But blocks were not usually managed separately from the applications, though they could be. And so the work on the blockchain stuff purports to be new. And what's really new about it is sort of the awareness that people have of the fact that cryptocurrencies can have value and that they can exist, in that there are ways to authenticate them or evaluate them or the like. They don't require, in their view, a centralized authority, although somebody's got to be able to say how the cryptographic stuff works, how do you change it, what are the rules and requirements when you need to take new actions regarding the whole system. But it's independent of central regulatory authorities, which many people find attractive. Of course, many other people are afraid of it for exactly that reason. And I think it remains to be seen how the regulators in general will deal with this as it becomes more and more prevalent around the world. I think there's gonna need to be visibility at the level of the regulators and communities, and they'll probably mandate that in time. But you don't have to get to a system which causes everything to be replicated and stored and linked together essentially in perpetuity. One of their big problems is how do you fork a blockchain?
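The "chained blocks" idea at the heart of this can be shown in a few lines: each block carries the hash of its predecessor, so altering any block breaks every link after it. This is a generic hash chain for illustration, not any particular cryptocurrency's block format.

```python
import hashlib
import json

def make_block(payload, prev_hash):
    """Build a block that records the hash of the block before it."""
    block = {"payload": payload, "prev": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps({"payload": payload, "prev": prev_hash},
                   sort_keys=True).encode()).hexdigest()
    return block

def valid_chain(chain):
    """A chain is valid when every block points at its predecessor's hash."""
    for earlier, later in zip(chain, chain[1:]):
        if later["prev"] != earlier["hash"]:
            return False
    return True

genesis = make_block("first entry", prev_hash=None)
chain = [genesis, make_block("second entry", genesis["hash"])]
```

Linked lists and burst-handling codes embody the same linking idea; what the blockchain systems add on top of this structure is the economics and the consensus machinery around it.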
And I recently gave a keynote at a blockchain summit in Australia last month, and they had some of the best coders from around the world showing up there. And I said, well, what are some of the problems you're focused on right now? They said, well, techniques for how to fork a blockchain. And so I said, oh, really? So they said to me, how do you fork blockchains in the digital object architecture? And I said, well, we don't have that problem, because we never have to deal with that particular issue; it isn't required. And so that ended up in another long discussion, in which they said, oh, you're blowing my mind, because this is not what we're trying to do; you're trying to do it a very different way. It's a choice that you make, how to structure a digital object and how you link things together. Whether you need to do that or not, that's again a choice. So I think the idea of using this kind of information and chaining things together has really come up in lots of different contexts. But just to link it together: this information about a block, which you need to know because you need to get its identifier to deal with it, is in the province of metadata. It can be self-contained. I think the amount of metadata can in fact be enormous. But I wanna go into some more general observations and then take any questions. So I'm almost done here. So I think the context for the blockchain technology has been around for many years, and I believe every block is an example of a digital object. It needs to be identified, needs to be understood, needs to be persistently accessible. It's a particular way, as I said, of structuring a digital object that comprises many others. Digital objects are stored in repositories, and those can be replicated and mirrored, and there are various ways to cross-check whether these multiple repository entries are appropriate. And so trust in the system is really something that's inherent to it.
So you can have an object that never changes, and there's a very simple way to validate that kind of an object without having to have all of this other material. Like, you can create an identifier for an immutable object that simply involves putting an appropriate fingerprint of the object, maybe some length considerations, in the identifier itself. So once you get the object from the identifier, you can validate whether it meets the appropriate checksums without having to know about the party that provided it. It's all based on trusting the encryption part of the schema. If it's a mutable object, obviously, it's gonna keep changing, so you can't put the fingerprint in the identifier if the identifier never changes. But you can get that information out of the record from the resolution system. So I won't go into the details of how you could do that, but the basic issue here is, do you trust the resolution system or do you not trust it? And I think this is something that can be trusted, because ultimately it's on the party that created the information to maintain that information. And they could presumably change other things about that information, which they have no reason to want to do. Banks, they wanna keep the proper information. If it's publishers, they don't wanna change the papers that they published; if it's laws, they wanna keep them. And you trust the parties that created it to maintain that appropriately. So that's what I tend to say when I talk about blockchain stuff. So let me say a few things about this two-stage resolution thing. The way we create identifiers is by giving an organization that wants to create them a number, typically a prefix, that's a dotted prefix derived from a credential. So in the past we would say, okay, well, Purdue can have, I don't know, 1015, and you create 1015/whatever-you-like. So you can have any identifier system you now use, and that identifier system can still be used.
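The immutable-object scheme described earlier in this passage, a fingerprint and length embedded in the identifier itself, can be sketched directly. The identifier layout here (prefix, then SHA-256 hex digest, then length) is invented for illustration; the talk does not specify a concrete format.

```python
import hashlib

def mint_identifier(prefix, content: bytes):
    """Embed a fingerprint and the length of an immutable object in its identifier."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{prefix}/{digest}.{len(content)}"

def validate(identifier, content: bytes):
    """Check fetched bytes against the identifier, with no trust in the server."""
    _, suffix = identifier.split("/", 1)
    digest, length = suffix.rsplit(".", 1)
    return (len(content) == int(length)
            and hashlib.sha256(content).hexdigest() == digest)

ident = mint_identifier("1015", b"an immutable record")
```

Anyone who retrieves the bytes can validate them against the identifier alone, which is exactly why the providing party need not be trusted; for mutable objects, the fingerprint would instead live in the (updatable) resolution record.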
So it could be social security numbers, driver's licenses, license plates; they could be cryptographic strings; it could be whatever; it could even have semantics if you wanted, although we don't recommend that. But then that system allows an individual organization to create the local records. And so what you need to do is to get to their local records to find out what's going on, and it's under their control and management. This is inherently a very distributed system, and those local services themselves can be mirrored for reliability and security as desired. So this is a picture showing it; conceptually this is a really simple system. You go into the system and you get back this handle record, and you interpret that to figure out what to do next. I'm gonna step you through this very quickly just to give you a feeling for it. It's like trying to describe a router to somebody: you can say, simply, it takes a packet in, puts a packet out, participates in some routing protocol. But conceptually, people spend a lot of time figuring out how to actually implement it. I could describe an operating system pretty simply too, but the details can be pretty complex. So I'm not gonna go through every step here, but there's the handle system; there's a global registry that contains the prefixes. So there they are, and here are various services that are available. These are, in today's world, basically run by different parties around the globe. So the global handle registry is run by a foundation that was set up in Geneva to make this attractive to organizations and companies around the globe that did not want to rely totally on a US-developed capability or US-managed capability.
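The two-stage structure being described, a global registry that knows only prefixes, and local services that hold the per-object records, fits in a short sketch. The prefixes, service names, and URLs are all invented for illustration.

```python
# Stage 1: the global registry maps only a prefix to its local service.
GLOBAL_REGISTRY = {
    "1015": "purdue_service",
    "2020": "publisher_service",
}

# Stage 2: each local service manages its own identifier records.
LOCAL_SERVICES = {
    "purdue_service":    {"1015/thesis-42": {"url": "https://docs.purdue.example/42"}},
    "publisher_service": {"2020/article-7": {"url": "https://journals.example/7"}},
}

def resolve_two_stage(identifier):
    """Find the service that owns the prefix, then fetch its local record."""
    prefix = identifier.split("/", 1)[0]
    service = GLOBAL_REGISTRY[prefix]
    return LOCAL_SERVICES[service][identifier]
```

The global registry stays small (one entry per organization, not per object), while each organization keeps its million-entry catalog under its own control and can mirror it as it likes.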
So those services themselves are run in different places, and so you can go to any one of them if you wanna resolve an identifier, and each one of those can be implemented in different ways: they can have basic services and they can have replications, and every one of them can be implemented differently. So here's one with one, two, three, four, five, no, it's got n servers, I guess, because it's dotted. Here's another one that's got a single server; here's one that's got two. They can be supercomputers, workstations, whatever, but they're all distributed around the place. And so there's a client that goes into the global registry; the global registry will say, okay, here's the information you need, you need to go to that guy, and that guy will get you to there, will switch it to there, and back comes the information. You get some kind of an appropriate record and you're done. So internally it's been elaborated over some 20-odd years; it's pretty interesting. The software is all available publicly; you can download it. The only thing you need to do to make use of it, except on an experimental basis, is get registered in this global registry. So that was a long discussion among people from government, private industry, academia, and we've been running that for 20-some-odd years. And we set up this foundation in Geneva and handed over that responsibility to the foundation, so that's now run out of Geneva. And what it does is it provides coordination, some software, and other strategic services for the development and evolution of the digital object architecture, and it works with different groups on its application, and it has this mission to promote interoperability between different kinds of information systems. So it could be a weather system and a health system and a transportation system, a banking system and an insurance system, and so forth, and they can define them any way they like, but this provides a uniform way of interfacing between them.
This X.1255 is something that a lot of them are relying on, because it is a standard that is now adopted globally, but at a very high level; that standard supports the core DO architecture standards, and the foundation kind of manages their evolution going forward, and they provide overall administration of this handle system, which is a particular implementation of the identifier resolution service described in the architecture. So provision of GHR services comes from an administration that is distributed, with multiple administrators around the globe. So it's like an organization like the FAA that's managing the air traffic, but they're not running the airplanes, and the equivalent of running the airplanes is done by these administrators, and there are currently about eight of them. We hope to get to 12 very shortly. It's got a very distinguished board that's administering this from around the globe, and what they do is they give credentials to the administrators, and then they issue prefixes based on their credentials. Just to show you what physically happens: okay, here is a collection of global records, and they're identical. Every one of the administrators keeps a copy of them, but these are only the very high-level ones. If you have lower-level descriptions, they won't show up in the global registry; you have to go ask the local parties. But let's say here's CNRI as one of the administrators, and there's some party that gets a prefix from them. We put it in the GHR records and then we propagate it to all of the other administrators around the globe. The security in this system is particularly interesting. So here's another client; we take that in, propagate it. Another client, we do the same. So here's another one; this is the DOI Foundation, which is the organization set up by the publishers to deal with the DOIs. They can do the same for the different registration agencies: put the information in, propagate that.
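The propagation step just described, an administrator records a new prefix and pushes it to every other administrator so all copies of the global registry stay identical, can be sketched as follows. The administrator names and record contents are invented for illustration.

```python
# Illustrative mirrored global registry: each administrator keeps a full copy.
ADMINISTRATORS = {"cnri": {}, "dona": {}, "gwdg": {}}

def allot_prefix(prefix, record):
    """Record a new prefix and propagate it to every administrator's copy."""
    for name, registry in ADMINISTRATORS.items():
        registry[prefix] = dict(record)   # each mirror keeps its own copy

allot_prefix("1015", {"service": "purdue_service"})
```

Because only the high-level prefix records are replicated this way, the mirrored dataset stays small and identical everywhere, which is part of why, as noted later in the talk, a blockchain was considered for this and judged unnecessary.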
Here's another one: this is GWDG, dealing with the big data and research data in Europe, I think originally sponsored by the Max Planck Society. And so they have different organizations; they work actually around the globe; they'll do the same thing. And so there's another one, and so forth. DONA itself, the foundation, puts in certain information pertaining to security, and this whole system basically has been operating reliably now for almost three years, and it really solves a lot of the problems of building a big distributed database, where what happens in the middle is all the interconnections between the parties. We thought about using blockchain for that but decided we didn't need to, because this is just as effective and much more efficient. So there's a lot of fostering of community interests; I'll just mention a few things: IoT, big data, authentication, interoperability. The foundation is right at the center of the coordination, but it doesn't do the work; the work is done by other parties. And I think we're gonna see the internet really dealing with increased complexity elsewhere. This is one attempt to deal with a fundamental problem of information management. I think trust in the system is important. I think we'll eventually see this mobile program technology show up again, but the need to protect rights and values and interests, coupled with the sheer volume of information, is really something that requires this new paradigm. So I think this digital object architecture is really important and can do the job, and there are a lot of other things in progress. I'm not going to go through every one of them, but we know that things are gonna grow in many different dimensions. We're gonna have growth in terms of the number of objects, the actual amount of information, the need to rely on it, the need to have it persist.
That's gonna stress almost every part of the internet as we now know it, and so if we don't have a good architecture for dealing with it, I think we're gonna have trouble going forward. But I think this can also benefit every organization that is willing to make the investment in managing its own information, because it will stand them in good stead going forward. So that's the last slide I had. In the slide packet, if you take a look at it, you will see there are a bunch of articles in the back that I commend to you, that have to do with things that we have done or been involved with to try and explain some of this technology, and I think you'll find them interesting reading in their own right. So I think I will stop there. Thank you very much for your attention. I hope you found this interesting, and that nobody will ask me why we needed it in the first place. So, do you wanna take some questions? Please go to the mic so that people who are streaming it will also hear the question. So you mentioned digital object identifiers, which work very well if things are permanent, but my question is, can we afford to remember everything as we get more and more data being generated by more and more devices? Can we compress this data? Can we afford to forget some of this data? Well, there are two parts to it. One, you could be talking about the identifiers, but the information itself, that's a policy matter; that's not a technology matter. Whatever the policy is, you can probably find a technology approach to managing it, within limits of course. I think that the fact that there is so much information is challenging: to some people who wanna keep it around forever, and to other people who wanna forget it. If you talk to the lawyers, they'll probably say get rid of that after a while, because you never know what the downside might be of keeping it around.
I come from a family where we never threw anything away, so I'm inclined to wanna keep everything, but that's not because the infrastructure requires it. It's just because it might be an interesting artifact about your life. There were some groups that were trying to develop lifelong histories of people and wanted to be able to create the life log of people, and you only want it for the people that people would care about in the future. But who are they? How do you know about them when they're two years old? So they keep a life log of everybody, and then you can decide which ones you wanna curate and which not. But probably every family will have some interest in keeping their own family archives for as long as they can afford it. I didn't think this was a shy crowd. So how do you envision the interaction between DOIs and the domain name service? Is there any interaction at all? Do you imagine there could be some interactions? Well, we use domain names all the time, because some people today have wanted to keep all their information on websites, and so they give URLs, but they're in the handle record. So if they move it from place to place, they don't have to change things. Of course, if you're moving it from place to place and not changing the domain name, then you don't even have to change anything in the records, because the domain name handles that altogether. And to get back to the question that was asked a moment before: I mean, I was the one that put in place the transition to the domain name system, and we did that so people wouldn't have to remember all the IP addresses; you remember a simpler way of doing it. I don't expect people to remember these IDs at all. I mean, this is a bigger problem than remembering IP addresses, but that's where registries come in. Much of what the average user of this kind of capability will want to do is take a particular identifier that's shown up. It's in a journal that they got. It's in a paper that they read.
It's in something that they actually, tangibly, have in hand, and they wanna go follow it through to get the actual information, or whatever it is about it that they care about. So that would be the typical operation. People who would be more interested in delving deeper will be the research community, looking for things from the distant past, right? You're a builder, a developer, and you're trying to put an add-on to a building that was built 50 years ago. Can anyone get the plans for the building? And you wanna know what laws applied then, or apply now, and you wanna put that all together, and you'd rather not make it a research project if you can avoid it. And I think this technology, if it's managed properly, will avoid that. Everybody's gonna have it. I mean, how many buildings do you have here on campus where people decide, well, we need an extension to the building, and you need to go back and find out what was in the original building and the like? What people are now thinking about, and we've actually built some systems like this, is that they wanna know everything that's in the building. And when I say everything, I mean not only the steel in the building and the pipes in the walls, but the carpet on the floors, and what paint was used, and where did you get the HVAC system, what about the handles on the doors, and everything in the building. And you can easily imagine that can be managed as a digital object, created when the building is created. You can find out everything about the building, including the plans of the building and the approvals and the codes and all of that that apply. So this is broadly applicable, and that's just one example. So, do you have connections with the Samvera community, the Fedora open repository, which would enable preservation of digital objects? And the other thing is, what about DNA computing and quantum computing, given that you want to store all of these digital objects?
Do you have an endeavor along the lines of having these innovative means of storage? So, I mean, you could ask about the internet in general, for applications of all kinds: have you figured that out? We tried to keep this infrastructure at the minimum level, so that when people had their applications they could build on top of it. So the short answer is we could, and other people could, for any particular example you give; but because we didn't want to tackle every possible example, including ones we couldn't think about, obviously we haven't tried to do that. And you mentioned Fedora in particular; let me tell you about the history of how that came about. When we built the very first of the repositories I just described, we built it in C; back then, that was the language we used. And what we wanted to do was fund somebody else to build an equivalent, to demonstrate that the repository access protocol, which we were using at the time, would enable another repository, done completely differently, to interoperate. And so we funded Cornell to build that version in Java, and they called that repository Fedora. In terms of history, it really came out of our work; it was originally compatible. We haven't followed it, because they took their own path, but there was a very close synergy there when Carl Lagoze and Sandy Payette were there, and that was the background for that. Well, thank you. I think this is all the time we have for questions, because we have an event coming up next. So the event is at three o'clock; we're going to have a panel on the internet, present and future, policy and technology issues. It's in the WALT building, the Wilmeth Active Learning Center, room 3154. So that event starts at three o'clock, and we have about 10 minutes for transition. So thank you all very much. Please join me in thanking Dr. Khan.