Thanks for coming to this session. I'm Herbert Van de Sompel, and I'm here with Martin Klein, who's in the back, who is on my team at Los Alamos. We're going to brief you today on initial thinking and results of the Hiberlink project. Hiberlink is a project in which we have people from the Research Library at the Los Alamos National Laboratory: that's Martin, that's Robert Sanderson, who unfortunately could not make it today, and myself. And then we have people at the University of Edinburgh. There's the EDINA group, which you may know; they provide a lot of electronic information services to higher education in the UK, and they're part of this project. And then we have people at the Language Technology Group, part of the renowned informatics department at the University of Edinburgh. And all of this is funded by the Andrew W. Mellon Foundation. We'd like to give a couple of acknowledgments to people and institutions that are supporting us in this work other than the funding agency: people that provide us with primary data sets and secondary data sets that we can leverage as part of this work, and also groups that help us with technology, such as Geoff Bilder at Crossref and people at Elsevier. There's also Microsoft, from which we got an academic collection. And there are several groups that we liaise with, so exchange ideas with: people at Perma.cc, Michael Nelson and his group at Old Dominion University, Crossref, the Internet Archive, and so on. So the project is all about reference rot. The problem domain really is web-based scholarly communication and how it links to resources on the web, how it references web resources. And there are really roughly two types of these references. One is the formal citation of other scholarly works, and basically we've seen, ever since the introduction of electronic journals, as soon as they arose, links popping up next to citations. And then there's also this notion of just referencing resources in, I'm going to call it the web at large from now on: things that we are not considering to be at the core of the scholarly communication system. It's a project website, software, ontologies, workflows, online debates, all these kinds of things that are created or used as part of scholarship. And then, as we know, and I do not need to tell you this, links to web resources are fragile. We've actually introduced a term as part of this project: reference rot. Reference rot refers to two decays. One is link rot, which is very well known; that's the fact that a link will break and you will get a 404. The other is content decay, which basically means that content on the web changes over time. When you referenced a URI, let's say three years ago, the content that is there today may very well no longer be representative of what used to be there. So you're basically referencing something, but by the time someone consumes the reference, the content is really gone. The combination of these two is what we have termed reference rot. So we came up with a little table to characterize this problem space, where in the rows you see link rot and content decay as the problem areas, both of which together are reference rot. And then the columns broadly define the nature of the resources that are referenced: on the one hand, scholarly resources, and on the other hand, web-at-large resources.
Now, believe me, I understand that there is not a real hard line between those two, and actually that line gets increasingly blurry as much more informal scholarship occurs on the web. So this whole line between formal and informal is blurring, and that's actually where this line here is to be situated a bit. I'm going to look at the left-hand side first. This is the real kind of scholarly resources; let's just think about books and electronic articles and all that. Many years ago, in order to address the link rot problem, we introduced DOIs, and the HTTP version of DOIs to actually make those actionable. And they solved the problem that, when content moves from one place to another on the web and gets a different URI, your DOI, your URI, will still redirect to the appropriate location. When it comes to content decay, one could say that the problem is a bit less severe there than in the other part of the table, because we're typically dealing with content that has a sense of fixity to it. We are used to journal articles somehow being frozen and books somehow being frozen. Okay, there might be a new version a couple of months later and so on, but it is definitely not as dynamic as regular web resources. And still, even though we have rather fixed content, it could still disappear, and hence our community has set up all these kinds of specialized frameworks to deal with scholarly communication and the archiving of it: things like CLOCKSS, LOCKSS, and Portico at the level of the primary archiving, the archiving of the primary literature, and then things like the Keepers Registry to try and figure out what is being archived where. It looks like all of this is nicely under control. As a matter of fact, that's not really true. You've heard Cliff reference it; there was a session just next door earlier, and David Rosenthal has a fantastic blog post about how this thing that you think is under control actually is not all that much. So there are some interesting issues here. Nevertheless, this is not what the Hiberlink project focuses on. We are looking at this part of the table: the link rot and content decay problem at the level of resources on the web at large. And here we have the consideration that I alluded to already before: the type of resources we're dealing with here may not have that sense of fixity that we are used to in typical scholarly communication, but they are also not necessarily under the custodianship of people that actually really care about persistence and long-term access. They may just be out there, and we are using them, and no custodian cares enough to say, yeah, we should do something about that, we need to preserve that. So quite the difference with stuff that is at the core of the scholarly record. The problem is that when we reference these kinds of materials and they vanish or change over time, we basically have a broken scholarly record, right? You cannot follow the reference anymore at some later point in time to see what the referenced material really was. And when you think about really transforming scholarly communication to a web-based environment, then a lot more materials will be in the right-hand column, and hence it's really a problem that we're going to have to start solving. Some of you may have been in the audience when I did my plenary presentation in San Antonio at CNI. I used this article here as an example of what happens with references to resources in the web at large.
This is a paper I wrote in 2004; it's about nine years old now. And just in this list of references, and I'm not going to go through it in any detail this time, you see all the kinds of combinations that can happen. Like this one right here: it no longer exists, but it is archived. Same with this one. This one no longer exists and is archived; this one no longer exists and is not archived at all, so we don't even have a trace of this thing anymore in any of the web archives. This is just to illustrate that the problem is real and needs some kind of solution. The Hiberlink project works in two strands with regard to this problem. There's a research strand where we try to quantify the problem, and that is the part that Martin will talk about. And then there's another strand where we are thinking, brainstorming, and hoping to find, I won't say solutions to the link rot problem, but at least components that may help ameliorate the problem. We are not so arrogant as to say we're going to solve this problem; we're trying to find things that may help. In the research part, we are focusing on electronic journals, because those are corpora that we can easily put our hands on and start working with, and hand over to the people in Edinburgh in the Language Technology Group to have them do the text mining. But again, if you have seen my presentation in San Antonio, or if you care to look at the video, you will know that we have a bigger picture in mind. It's not at all only about electronic journals. It's really about web-based scholarly communication in general and all these assets that are used in scholarly communication that live on the web, are dynamic, change over time, and are interdependent. And somehow we need to be able to preserve temporal slices of what is going on there, to be able to revisit the state of the scholarly communication system at certain moments in time. Is it worth our time, and is it worth Mellon's money, to study this problem? Well, this little graph shows how articles increasingly link to web resources. And with web resources, again, I mean web-at-large resources. These are not links to things with DOIs; these are links to stuff out there. What we see is a plot based on URIs extracted from PubMed Central papers. The time period 1997 to 2012 is depicted here, where you see, from hardly any references at all in 1997, to over 140,000 references to web resources in 2012, in that corpus alone. So if you believe that these kinds of resources are subject to the same kinds of reference rot problems as all the rest of the web, then we do have a problem that needs to be solved. And if that still doesn't convince you: the New York Times actually cares. So if that isn't an argument. The New York Times ran this story, which was written because Jonathan Zittrain and his colleagues at Harvard had put out a new study about link rot and reference rot, in which they basically observed that the problem in legal journals was very severe, but that it's not restricted to the scholarly literature. You also find it in Supreme Court decisions, where links do not work anymore or where the material at the end of the link has changed over time. I'm emphasizing this because this is not just a problem of scholarship; this is a problem in the legal domain.
This is also a significant problem in Wikipedia, for example, where they have a special project that the Internet Archive is involved in to actually try and address this reference rot issue. That was the introduction to it all. Martin is now going to talk about the research bit of the project, and I'll be back then to talk about how we're going to solve all of this.

Thanks, Herbert. So I have the honor now to give you a brief insight into preliminary results that we have extracted from our first few experiments, and I get to show you some pretty graphs, which is obviously a good thing. So, Herbert mentioned it, and I'm sure everyone is aware: the problem of reference rot, and in particular link rot, is not a new problem. It has been studied at length for scholarly communication and also for government documents, as we've just seen. Everyone is probably aware of these scholarly papers that tell you about the dynamics of the web and link rot, reporting shocking numbers, high percentages of links that are gone by now. However, what distinguishes us in this context, as Herbert outlined, is that we're investigating not only link rot but also content decay, so the decay of content over time that was mentioned before. That's one new angle of the project compared to the previous link rot studies that we've seen. We are also in a unique position to have great insight into what types of resources are actually being linked to, so that's another angle that we are fortunate to have. And everyone brags that their data set is larger than everyone else's, but we're really in a position where we can run this experiment at an unprecedented scale along several dimensions. The first dimension is the number of articles that we look at. The second dimension is the number of, I'm sorry, archived resources that we can look at. That's the unique situation that we are in now. And of course, the number of articles goes hand in hand with the number of URIs that we can actually look at. So these three columns are basically unique to our setup. To prove what I just said, a little table; the details don't matter in this case. What I would like to point out to you is the last line, which is a pilot study that we did more than two years ago, and that already shows you: in terms of the number of articles, roughly estimated by the time span of the publications that we're using, we are well above everyone else. In the number of URIs that we looked at, we are well above everyone else thus far. And we're particularly proud of the number of URIs that we can actually look up in web archives; there, too, we are well above everyone else. So that gives you an idea: we're not only re-running experiments that others did, we really have several dimensions that are entirely new to the domain. All right, so this is the methodology that I'd like to go over briefly now. Just as an overview, this is where we go, and I'll zoom in a little bit, start in the top left corner, and walk you through our methodology. So we obtain several corpora containing scholarly articles; clearly that is the first step. We extract all URIs from those articles, our second step, which leaves us with a list of URIs and a little bit of metadata surrounding those URIs: for example, the journal that the URI was cited from, so the citing journal basically, or the publication date of the article, which is fairly important for our purposes.
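In code, that extraction step might look something like the following minimal sketch. The regex and the record layout are illustrative assumptions for this write-up, not the actual Hiberlink pipeline:

```python
import re

# A minimal sketch of the extraction step: pull http(s) URIs out of an
# article's text and keep the citing-journal and publication-date
# metadata next to each one. Regex and record layout are assumptions.
URI_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

def extract_uris(article_text, citing_journal, publication_date):
    """Return one record per URI found in the article."""
    return [
        {"uri": uri,
         "citing_journal": citing_journal,
         "pub_date": publication_date}
        for uri in URI_PATTERN.findall(article_text)
    ]
```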
And then we do a little filtering. Basically, as Herbert mentioned, we split references to scholarly works, everything that has a DOI, for example, from URIs of resources on the web at large. So we make that distinction between the two. What we use for our experiment, of course, is the set of URIs of resources on the web at large. And then a little bit more detail about the right-hand part of the methodology. So we have our filtered URIs, we have our corpus of interest, and the metadata, the publication date of the article, for example, and we do a live lookup on the web today. What that means is, we just send an HTTP request against the resource and see whether it's still there, in very shallow terms. If you're familiar with HTTP as a protocol: basically, if it returns a status code in the 200 range, we consider the URI as existing; if not, it does not exist. So that's the distinction for any given URI of interest: exists, does not exist. And then, and this comes back to what I mentioned before, the privileged situation that we're in: we also use the set of URIs to look them up in the archives. We have the Memento framework available to us, and that gives us a unique ability to look up archived holdings of resources across several archives, across many archives worldwide. That's a situation that no one else that we are aware of is in, and thanks to the Memento framework, we are able to do that. So what we're doing there, by looking up whether a URI is archived or not, is, as the first step, we obtain the TimeMap of any given URI. What that is, basically, is a list, and this list contains URIs of archived versions of the given URI, together with the time when each archived copy was created, what we call the Memento datetime. And of course the first test is to see whether this list is empty or not, empty meaning that there are no archived versions of the URI. If it's empty, the URI is not archived; you can do that as a first, trivial test. And then the secondary test is to go through the TimeMap and extract the Memento, the archived version of the URI, that is time-wise closest to the publication date of the article. That is our approximation of the best copy, the best archived version of the URI. Once we have that extracted from the TimeMap, we again send a request against this Memento and see whether it exists or does not exist, and in this case that tells us whether the URI is archived or not archived. So we have four cases for any given URI: it either exists today or it does not exist today, and it is either archived or it is not archived. That distinction is important because the next few graphs I'm going to show you make this distinction. To come back to the overview of the methodology: these are the components that we are using for our experiment. All right, I promised you pretty graphs, and I'll deliver them. But first some bragging data. The number of articles that we processed in this early stage of the experiment is almost 500,000. We use the PubMed Central corpus, as Herbert has already indicated, from the beginning of 1997 all the way to the end of 2012. About 30% of the articles contain links that are of interest to us, links to resources on the web at large. The total number of links, rather, references, is about 500,000, and if you dedupe this, you get a few less than that, of course, which just indicates that there are URIs that are referenced multiple times.
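The two lookups might look roughly like this in code. This is a minimal sketch: the aggregator URL (LANL's Time Travel service) and the simplified link-format TimeMap parsing are assumptions for illustration, not the actual Hiberlink implementation:

```python
import re
import requests
from datetime import datetime

# Assumption: a Memento aggregator that serves link-format TimeMaps.
TIMEMAP = "http://timetravel.mementoweb.org/timemap/link/{uri}"
# Simplified parser: expects '<uri>; rel="...memento..."; datetime="..."'.
MEMENTO = re.compile(r'<([^>]+)>;\s*rel="[^"]*memento[^"]*";\s*'
                     r'datetime="([^"]+)"')

def exists_today(uri):
    """Shallow liveness test: does the URI answer with a 2xx?"""
    try:
        r = requests.get(uri, timeout=30, allow_redirects=True)
        return 200 <= r.status_code < 300
    except requests.RequestException:
        return False

def closest_memento(uri, pub_date):
    """Return the (memento_uri, memento_datetime) pair closest in time
    to the article's publication date, or None if not archived."""
    r = requests.get(TIMEMAP.format(uri=uri), timeout=30)
    if r.status_code != 200:
        return None                      # no TimeMap: not archived
    found = [(m.group(1),
              datetime.strptime(m.group(2), "%a, %d %b %Y %H:%M:%S %Z"))
             for m in MEMENTO.finditer(r.text)]
    if not found:
        return None
    return min(found, key=lambda m: abs(m[1] - pub_date))
```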
So this pie chart gives you two messages. The first one is: there's good news and there's bad news. The good news is that 31% of the URIs that we checked still exist and are archived. That is clearly the best-case scenario, right? The light greenish slice of the pie is okay, but it's still not perfect, because the URI does not exist on the live web anymore, but at least it's archived. Everything that's somewhat green on this pie is still good news. Everything that's not green is not good news, because either it doesn't exist or it's not archived. And the red slice, where both are true, where the URI doesn't exist and it's not archived, is truly bad news. So that's the first message. The second message, and I need to put this in more relative terms, is that the plot is a little bit deceiving. What it does not show you is a case like this: I wrote and published an article in 2011 and included a reference to a web resource, and the only archived version of that web resource is from 2001. The proximity in time is not great, so the likelihood that the archived copy and the live copy at the time I wrote the article are disjoint is fairly high. So we tried to approach this plot from a different angle and said: if we narrow our definition of archived time-wise and say a URI only counts as archived if it was archived within 30 days of the publication date, what would the pie look like then? Well, here's the answer. The bad news has numerically increased at the expense of the good news; now we only have a total of 22% good news, and everything else is bad news. And if you are even more restrictive, if I write an article on November 30th and the earliest copy I find is from November 1st, well, maybe that's still too much time that has passed, so let's look at a time span of 14 days. More bad news, right? Now we have only 15% good news, and you can play this game further: seven days, one day; and at one day there's basically no good news left. We could have animated that, right? The one-day case means: if you're looking for a URI that was archived within the same day as the publication of the article, you're basically out of luck. Next, this is one of those plots that you will see in all of these "the web is dynamic" and "link rot" kinds of studies: the amount, in relative terms, of URIs that do not exist today anymore. Meaning: more than 85% of the resources that were linked in 1997 are gone today. So that's clearly bad news, and bear in mind that the number of articles published since then is probably increasing as well. And 30% in 2012 is still very, very bad news: 30% of the URIs linked from scholarly articles last year are gone today. Now we overlay that line; you recognize the black line, it has not changed, it's still the same black line, and we plot it together on the same canvas with resources that have been archived. The blue line shows you all URIs that are archived, again relative to all resources cited. So that gives you confirmation that we have work to do; we're far from done in terms of what we need to archive. And again, there's clearly some noise in the early years that we don't see in the more recent past. And if we apply the distinction based on the time that passed between the publication of an article, so the citation of a URI, and the archival time, and narrow it all the way down to one day, which is represented by the red line, you can see it's really bad news.
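The narrowing-windows analysis is easy to express in code. Here is a minimal sketch, assuming the four-way classification and the (memento URI, datetime) pairs from the lookup step above; the names are illustrative:

```python
from datetime import timedelta

def classify(uri_exists, memento, pub_date, window_days=None):
    """Return one of the four cases shown in the pie charts. A URI only
    counts as archived if its closest memento falls within
    `window_days` of the article's publication date (None = any age)."""
    archived = memento is not None
    if archived and window_days is not None:
        _, memento_dt = memento
        archived = abs(memento_dt - pub_date) <= timedelta(days=window_days)
    if uri_exists and archived:
        return "exists, archived"        # best case
    if uri_exists:
        return "exists, not archived"
    if archived:
        return "gone, but archived"      # still good news
    return "gone, not archived"          # truly bad news

# Re-running this with window_days in (None, 30, 14, 7, 1) reproduces
# the sequence of pies: 31% -> 22% -> 15% -> ... -> almost no good news.
```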
So there are two things that I think we can take away from this graph. One is that we clearly need to do better at archiving, and two, if we can, our ultimate goal would be to raise not only all of the colored lines but particularly the red line, because that is presumably the archived copy of a referenced resource that is closest to what the author intended to reference, because it was archived within one day of publication. And with that I'll hand over to Herbert again, and he will talk about some possible solutions to the problem.

Thanks a lot, Martin. So this slide says solving reference rot, although it should say ameliorating; that didn't fit on the slide in that font size, so my apologies for that. Again, we're not that arrogant. So we're coming back to this table here, and I'm going to talk about the aspect of content decay only a tiny bit; I'm going to focus mostly on the link rot problem. So first of all, again, this observation: with this web-at-large content, we cannot count on there being any kind of fixity. So we're basically down to having to take snapshots of that content as it evolves over time, if we do want to be able to revisit the content by means of references. That's the bit that I'm talking about. In essence, the message here is that somehow we'll need to do better at proactively taking snapshots of things that are being referenced, or are likely to be referenced, in scholarly communication, and we see two components to a solution. The first is at the level of those kinds of resources where you think they may be subject to referencing in scholarly communication. If you have the project website of a multimillion-dollar project, then most likely you're going to get some references in papers. So how about you archive your own stuff? You do that either by running a content management system with a good versioning system, so that you automatically have good snapshots over time; you run a MediaWiki and so on; you subscribe to on-demand web archiving services like Archive-It; or you run a transactional web archive like the SiteStory solution that we created. By which I mean: if you are somehow involved in scholarship yourself, with your project, then there's a responsibility to make sure that your web presence remains available over time also.
And then, for the rest, we can obviously web archive resources on demand. For example, the author, as he creates a manuscript, can archive things in these on-demand web archives. I think the Internet Archive just launched a service like that; WebCite has been in that realm for quite a long time; there is archive.is; so there are quite a number of offerings in this realm. And then, with the people at Edinburgh, we're also brainstorming about intervening in the manuscript submission process. For example, the manuscript is submitted, and at that point you scan the submission for URIs. You filter them: the ones with a DOI you don't have to worry about, it's all rosy there; so let's look at the others and archive that stuff proactively, so that we have these snapshots as they were intended. That's basically the only thing I'm going to say about that; it is going to be explored further in future stages of the project.
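That submission-time intervention is easy to sketch. Here is a rough illustration, assuming the URI extraction from earlier and using the Internet Archive's Save Page Now endpoint as the on-demand archive; whether a production workflow would use that particular service, and the response header it exposes, are assumptions:

```python
import re
import requests

# A rough sketch of the submission-time idea: scan a manuscript for
# URIs, skip DOI-based references (covered by the existing scholarly
# infrastructure), and push everything else to an on-demand archive.
URI_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

def archive_web_at_large_references(manuscript_text):
    for uri in set(URI_PATTERN.findall(manuscript_text)):
        if "doi.org/" in uri:
            continue                  # DOI reference: all rosy, skip it
        resp = requests.get("https://web.archive.org/save/" + uri,
                            timeout=60)
        # The archive typically reports the snapshot's location;
        # treat this header as an assumption of this sketch.
        snapshot = resp.headers.get("Content-Location")
        yield uri, snapshot
```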
Now I actually want to focus on this bit here, the link rot bit, and this is an area in which I have been surprised myself to find a problem that I was not aware of. As you probably know, we've been working on Memento for four years; we think a lot about web archiving and so on, and suddenly a problem hit us related to referencing archived resources, and I'm going to introduce it to you. So I'm coming back to the New York Times article here. Here is a link to the study; well, it's actually to the blog post that Jonathan Zittrain wrote about the study. I don't know whether you can see it in the back, but there's a URI behind that link that points to a blog at the law library of Harvard. Obviously, when I click this, I end up at the blog, because, you know, there's no 404 there yet; it's still alive. Nothing surprising here. Now I take that URI, I go to the Wayback Machine, and I search for the URI, and indeed I find two copies: one from the day of publication of the New York Times article, and another from two weeks later. And I do the same thing at archive.is: I search for the URI of the blog post, and indeed I also find an archived copy. So all is good; there are a couple of archived copies of the blog post out there. Now I scroll down in this New York Times article, and down here the article talks about the Perma.cc solution, which is one of these on-demand archiving approaches. Basically, what happens is: someone wants to preserve a snapshot of a certain web resource; you go in there, you submit the URI, a snapshot is taken, it's put in the archive, and in response you get a new URI back, right? And that URI is what we now call a permalink to that archived state. So behind this link here, the permalink, is actually Perma.cc. By the way, this is not at all about Perma.cc or anything being wrong with Perma.cc; this is how all of these services actually work. So we have a new URI here, and now we're going to do the same thing, right? We're going to click it, and we end up at Perma.cc, where indeed you see the archived snapshot. Now I go to the Internet Archive again and I search for that URI, and I don't find anything, although I know the thing is archived there. And I go to archive.is and I don't find anything, although I know, because I've shown you earlier, that it's archived there. So what happened here? The good news is that we have another snapshot of that blog post in yet another archive; that's great, more is good, lots of copies keep stuff safe and all that. The bad news is that by using that new URI in the link, we've actually undermined the possibility of finding an archived copy of that thing in other web archives. Okay, basically, why is that? Because we have replaced the original URI, the URI of the blog post, with the URI that was given to us by a web archive. And so we've painted ourselves into a corner, because by putting the URI of that archived copy in the link, we are now 100% dependent on the permanent existence of that one archive to be able to access that archived copy. I can no longer use that URI to find the resource in any of these other archives. So basically, I've replaced one link rot problem with another. To show that even web archives may not be forever, and that this is really a problem we should care about: here is a snapshot of webcitation.org, which actually was the first, as far as I know, to be in this game of archiving web resources used in scholarly communication. It's been around for many years, and at one point earlier this year it announced that it was running out of money, and it started a fundraiser, initially asking for $50,000 to continue its services, then decreasing that to $25,000, and the last time we looked they had only raised $12,000 or so; we do not know what the current situation is. I think you get the problem: even these kinds of organizations suffer from the same kinds of constraints as many others. Here's another instance of the problem. This is a blog post I wrote with Michael Nelson from Old Dominion on the occasion of the Conservative Party in the UK taking down basically a large portion of its website to hide the speeches of Cameron, because these were speeches about new thinking, about openness of government and how the internet was going to change all of that, and they were very embarrassed at ever having said any of that, so they just wanted to hide it. Obviously, there are copies of these things in web archives, and the problem that this blog post tries to reveal is that although the Internet Archive held copies of basically all of those speeches, because of a policy that the Internet Archive has, they were not able to show them to end users. The policy is a technical one; I do not need to go into any details. The point is that these web archives are subject to policies; each archive is subject to different policies, subject to different kinds of legislation. Fortunately, during this period of time, other web archives had copies that were still accessible, because they did not have exactly the same policies as the Internet Archive. The point I'm trying to make: you want multiple archives to leverage, so that you know that at any moment in time you can at least recover something. A third thing, and I'm sorry, it's again about the Internet Archive, just to illustrate that disasters can happen: nothing bad happened at all in this fire to the archival capabilities of the Internet Archive, but it is a reminder that all organizations are subject to disaster also. All of this is to say that when we reference archived web materials, we need an approach where we can leverage all the web archives around the world, not just a single one. We should not paint ourselves into a corner and think that, well, this one is going to be there forever, so that's good. We cannot accept a stovepipe kind of solution. Now, since the original URI, in the example here the URI of the blog post at Harvard, is the key into all the web archives, my point is that the way we link to archived materials should necessarily also include that URI. We should not throw the key away when we link; we should actually keep it. So we need two URIs: one is the original URI, and one is the URI of the Memento, the archived copy.
Now, that's a problem, of course, right? Because a link, an anchor element in HTML, only allows for one URI, and hence it's really understandable that the way we currently link to archived material is by means of the Memento URI. It's totally understandable; I'm just trying to say it's not right. There is something broken here in the infrastructure that I think is rather significant and that requires the attention of this community. Basically, solving a link rot problem with an approach that is itself subject to link rot: that's probably not really great. So we have a proposal, and this is really a brainstorming kind of proposal; we are not claiming that we have the solution. There's a document out there called the Missing Link document, and basically we're saying: how about we extend the link? We add additional information to the link to allow us to also go back into archives, go back in time. So we extend the link to the original resource: you keep the original link in there, with temporal context, the URI of the Memento if you have one, and then several dates that you could actually use. There's the date of the page that contains the link; for a paper, I would note the publication date and say, hey, this was published around that time, so probably I would like to look at that link as it was at that moment of publication. Or, at the finest granularity, the date of the link itself. This is very similar to how we now cite a website, where we say "accessed at" a certain date, but here we are advocating providing that information in a machine-actionable way, in the HTML, on the links, so that machine agents and user agents can act on that information. This slide provides a summary of the proposal. Basically, here you see a link element, the anchor, where the content of href still refers to the original version, the original URI, so at any time you can keep following that; that is also the way this resource is known throughout the web, by means of its original URI. In addition to that, you introduce the notion of a version URI, which is the Memento URI, and a version date, the date of archiving, for example. With this here, you go to the current version of the resource. With the original URI and the version date, you could use Memento or a web archive API to go to a version of the resource with an unknown version URI. With the version URI, you could go to, for example, the Perma.cc copy. And then, up here, this is actually from Schema.org: you could use the publication date of the page, and again use that with the Memento protocol or a web archive API to access a certain archived copy. The question here, of course, is: the approach that is currently taken, just putting the URI of the Memento in the link, works out of the box; that's why people have been doing it. But I hope I've shown you that there's a problem with doing that: we paint ourselves into the corner of one archive. The proposal that we are making, which again is a conceptual kind of proposal, requires infrastructure change, but it does contribute in general to web persistence, so it does something for us in the long term. We need changes to HTML to allow these extra attributes, and we need changes in the browsers to actually leverage those kinds of new attributes; we need similar kinds of changes in all kinds of tools. There are possibilities to make that happen.
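To make the proposal concrete, here is roughly what such a decorated link, and a user agent acting on it, might look like. This is a sketch of the Missing Link idea, not a finalized specification: the attribute names mirror the slide's "version URI" and "version date", the placeholder URIs are hypothetical, and the Wayback Machine URL pattern stands in for any Memento TimeGate:

```python
from html.parser import HTMLParser

# Hypothetical decorated link: the href keeps the original URI (the key
# into all archives), with the known snapshot and its date alongside.
SNIPPET = ('<a href="http://blogs.law.harvard.edu/..."'
           ' data-versionurl="http://perma.cc/..."'
           ' data-versiondate="2013-09-24">the study</a>')

class RobustLinkParser(HTMLParser):
    link = None
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link = dict(attrs)

def navigation_options(link):
    """The choices a Memento-aware agent could expose on right-click."""
    opts = {"current version": link["href"]}
    if "data-versionurl" in link:
        # Go straight to the one known snapshot (e.g. the Perma.cc copy).
        opts["saved snapshot"] = link["data-versionurl"]
    if "data-versiondate" in link:
        # Use the original URI as the key into *any* archive; the
        # Wayback Machine TimeGate pattern here is just one example.
        date = link["data-versiondate"].replace("-", "")
        opts["version at link date"] = (
            "http://web.archive.org/web/%s/%s" % (date, link["href"]))
    return opts

parser = RobustLinkParser()
parser.feed(SNIPPET)
print(navigation_options(parser.link))
```

The design point is that the extra attributes degrade gracefully: a browser that knows nothing about them still follows href to the live resource.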
As a matter of fact, I found out that in 1995 HTML actually had a special-purpose attribute to deal with issues of web persistence: it allowed you to put a URN in there. That is now long gone, and HTML5 even emphasizes that it is deprecated and all that. So, to come to the conclusion here: we should probably revisit this and have a conversation between our community and maybe the web technical community to figure out what can be done. That basically concludes our presentation. Martin has given you the preliminary results of the research bit of Hiberlink, and I've addressed our thinking about these problems, link rot and content decay, for references to the web at large. And then I do have a little demonstration. This is actually a demo by means of screenshots, because I've learned my lesson, right? So we're going to go back to the New York Times article. Oh, first a little plug for Memento: we have a Memento for Chrome extension that's actually a real beauty; please install it. What I'm showing here is a canned view, so it's more like "do as we say, not as we do". Please tell your friends and family and colleagues about it, because it's really remarkably beautiful. Anyhow, we've created an experimental version of Memento for Chrome that can use all that temporal context that I talked about earlier to get to stuff. So this is again the New York Times article. First observation: the New York Times, and many other venues actually, already have this information in there, the publication date of a news story; it's also to be found in the Washington Post, for example. This is Schema.org stuff, so that's already there; nothing needs to be done here. And remember this link here, the one to the blog post: we changed it to include that temporal context. So I now have a link that still contains the URI of the blog post, and it now has a special attribute here, a version URI, which has the Perma.cc link in it. I did that little bit of editing manually, of course, just for the purpose of the demo. So I'm going to have to read this all out for you, because you can't see it. The paradigm that the Memento extension for Chrome uses is that you right-click, and then these temporal options become available. So we right-click on this link, and here is the Memento menu item. The first option that you see here is related to a user-set calendar date; that's always available: if the user wants to see something at a certain calendar date, they can select it there. But here we also have "get near current time", which will give you the most recently archived copy, and we have "get at page date" and "get from Perma.cc"; those are new options. So here I'm choosing "get near current time", and there I only use the original URI, and I find the archived version in archive.is. I'm going back, and here I'm now using the original URI and the datetime that was provided for the page, so the page publication date; this option says "get at page date". Now we're using the Memento protocol with that date, and we end up at the version that we saw before from the Internet Archive, archived exactly on the date that I asked for. And then this option here, "get from Perma.cc", obviously uses that version URI that we put in there, and when I select it, I'm going directly to the copy in Perma.cc. Remember that other link, the one that was basically overwritten with the Perma.cc link; that Perma.cc URI was the only one in there. That link is now exactly the same as the prior one: it links directly to the original URI of the blog post, and in this attribute it has the Perma.cc URI, and so everything that we saw before is available here also. So now I'm going to the blog post itself.
There's some really interesting stuff here. This page did not have the publication date as metadata, so I actually manually added it. And here, this is really cool: there's a pointer here, it says "Wikipedia discusses link rot here", so basically it points into Wikipedia, to the page about the topic of link rot. Now, note that this page was published on September 22nd, which was actually before Perma.cc was really announced, really active. So I click, and with a regular click I obviously get the current version of that page; I did this a couple of days ago. I scroll down to be able to read it, and there's an entry here that talks about Perma.cc. The link was put in there before Perma.cc actually really existed, and yet I get a page that talks about Perma.cc. This is an example of content decay, you could say: this is not what the author saw, right? So here's what we're going to do now. I go back here, and now I'm going to follow this link subject to the publication date of the page, remember, the one I put in the metadata. In this case, we use the Memento protocol with the original URI of the Wikipedia page for the topic and the page publication date, and we actually arrive at the version of the Wikipedia page that was live at that very moment in time. Here we are: this is an older version of the page, pre-September 22nd; that was the one. It's an old version, and I scroll down, and now the Perma.cc entry is not there. So basically, we are now seeing exactly what the author saw when the author made this reference. One more little trick here. There's a link here to the Harvard Library Innovation Lab, which is actually the force behind Perma.cc. So this is the original link, librarylab.law.harvard and so on, and just for fun I added a datetime to that link, something like, well, "I accessed this"; and I lie, of course, and I put a 2010 date in this page. So, if I click this link as usual, this is the page, the current page basically, that I receive. I right-click on the page and choose this option, "get at page date", and that means we're going to get something from some archive; this one is from archive.is. Now, the page date was September 22nd, and the best that we get is June 21st, so this is one of those examples where there's not great coverage of archived material; we don't have a lot of snapshots. Then I say "get at link date", and remember, that's the date that I faked in there, the 2010 date; again we use the Memento protocol, and we arrive at a version actually very close to what I was asking for: September 18th, 2010, from the Internet Archive. So basically, the bottom line is: by adding this little bit of information, which we really do have available at the moment that we create a reference, a link can not only lead to the current representation of a resource, but it can basically lead to many different versions in many different archives. And I really understand that we need some kind of change in the infrastructure to make that happen, but I think this is better than painting yourself, as David said also, into the corner of one kind of dependency, on the survival of one instance, one archive. That's what we have to say. Thanks a lot.