Thanks for coming to this session on ResourceSync. I'm here with Martin Klein, who works with me at the Los Alamos National Laboratory. ResourceSync is a project that ran over the past two years. It's a collaboration between NISO and the Open Archives Initiative, funded by the Sloan Foundation with contributions from JISC. And ResourceSync is all about devising a specification for web-based synchronization of resources. Of course, we'll explain what exactly that means and what the framework that came out of this effort looks like.

So this is an overview of the presentation. I'll start with the problem domain, scope, and an overview of the framework, and then Martin will kick in, because this is where the XML will start showing up. So he will do the technology. We've also prepared a little demonstration for you about a particular capability of ResourceSync related to sending out change notifications. Martin is going to do that.

For this audience, it's probably easiest to sketch what ResourceSync is about by going back to the OAI Protocol for Metadata Harvesting, which most of you, I think, are familiar with. OAI-PMH was all about recurrent exchange of metadata between what we used to call a data provider and a service provider that would do something meaningful with the metadata exposed by repositories. An important note to make is that that protocol was devised in the days when a lot of repositories only had metadata and no full content yet. Those were the days. And so OAI-PMH was all about the exchange of XML metadata. It was a repository-centric design, in the sense that the architecture of the World Wide Web was not very well known yet in those days; this was before the REST principles had been formulated, and you see all of that throughout the protocol. It was devised in 1999-2002 and has been used globally ever since.

Contrast that with ResourceSync. There's clearly a similarity in the problem domain, because it's also about recurrent exchange of information. But here we talk about synchronization of web resources: basically anything that has an HTTP URI and a representation. And you want to synchronize those between what we will call a source and destinations. So again, everything here is about web resources, things with HTTP URIs. This is entirely resource-centric, so it's totally based on web architecture principles, the key ingredients of web interoperability, and we also leverage some existing notions from search engine optimization; Martin will talk about that.

So here's the abstract problem statement. There's a source, okay, it's a server, and the source has resources, things with HTTP URIs that change over time: they get created, modified, deleted. And then there are destination servers that want to do something with the source's resources. They want to build a search engine maybe, want to preserve the material that is available at the source, and so on. And so as the resources change at the source, the destination wants to keep track of the ongoing changes. It wants to remain synchronized, basically.

We have this kind of depiction that will come back throughout the presentation. Let's presume this is a source, and it has, at a certain moment in time, these three resources. So A, B, and C are HTTP URIs. And these resources start to evolve: resource A gets updated, then B gets updated, then A and B get updated at the same time.
Oh, a new resource is created, and then one is deleted, and there you go, yet another one is being updated. And so the whole notion is: when we are a destination that is interested in these resources of the source, how do we keep track of what is going on there? And how do we do that in a way that is better than recurrently pulling all these resources to see whether they have changed? As soon as you have a sizable collection of resources, that's just not a scalable proposition anymore. You cannot keep doing HTTP HEAD requests to see whether these resources have changed, and a newly created resource you would not even be able to detect, right? Because you don't know where to go and pull. So that's the problem domain, really.

And so the goal of the effort was to design an approach for resource synchronization that is fully aligned with the web architecture and that stands a chance of being adopted and implemented in this kind of community. But we were also hoping to devise a solution that had a broader impact. What I just said relates to some of the choices of technologies we made, as Martin will describe.

Scoping-wise, we were actually rather ambitious in the kinds of cases we wanted to be able to cover, and that goes in several dimensions. There are a few parameters at the side of the source and the destination, and then there are also classes of use cases that I will cover.

First of all, the size of the source's collection. We wanted a solution that works for really small resource collections, a little museum site maybe with a couple of objects, up to really large repositories of publishers like Elsevier, with millions and millions of resources.

Change frequency. Again, we have a range there, from very slowly changing resources, maybe on a weekly or monthly basis, to, on the other end of the spectrum, extremely rapidly changing things like Wikipedia or DBpedia, where we measured changes of about two per second. We want to be able to cover that also.

At the end of the destination: does the destination care about latency? Does it want to be up-to-the-second aligned with what goes on at the source, maybe because it wants to do real-time visualization of changing datasets out there? Or is it okay to be behind the curve? For example, if you are implementing discovery services, it's not necessary that you're really up-to-the-second aligned with a collection out there. So again, we wanted to be able to cover both.

And then coverage of the resources. Does the destination need to cover all of the source's resources, or is it acceptable to miss a few and just be more or less synchronized? So how complete does the coverage need to be? Let me give you an example. If we talk about the digital preservation scenario, then most likely you really want everything. But in the discovery scenario, maybe it doesn't matter that much that you miss three or four PDFs out of a couple thousand, right?

Bitstream accuracy. Again, the digital preservation use case is an interesting one. Let's say a destination is in charge of preserving someone's PDF collection. In that preservation case, the destination really wants to know that it indeed collected the accurate bitstream. So we're talking about content-based hashes and stuff like that, right? Whereas in the discovery scenario, it doesn't matter. If I brought the wrong file across and indexed it, well, so what? So again, those are different types of requirements.
Now to the classes of use cases that we wanted to cover. Here is a source with its resources. And this is the notion of one-to-one synchronization. This is something that exists in almost all institutions, where you need to move content from one location to another, maybe to provide added-value services. So this is a classic, basic use case.

One-to-many: there's a master copy. This is like the physics arXiv and its mirrors around the world. This is a use case that we wanted to be able to cover, and actually Simeon Warner is now implementing ResourceSync for the arXiv at Cornell.

The other way around, a very common use case is the aggregator one. Take the Digital Public Library of America or Europeana, where there are multiple repositories and the content needs to be brought to a central location. That's a use case that we want to cover.

Selective synchronization: different types of resources at the source, videos, text, images, who knows, and the destination is only interested in videos. Okay, that's another case.

And then, of course, XML metadata harvesting as specified by the protocol for metadata harvesting is just a special case, because in this worldview XML metadata records are just resources with URIs. You dereference the URI and you get the XML metadata back. So it fits in the picture of ResourceSync also.

Overview of the framework. We've seen this before: this is the source, and the source's resources are evolving over time. What ResourceSync is about is the question, really: what can the source do to make it easy for destinations to remain in sync with the evolution of its resources? And in order to answer that question, it's good to think from the perspective of the destination. The destination is going to look at the source out there and say: okay, I want to keep in touch, I want to remain synchronized. There are really three basic requirements that need to be covered.

First one, baseline synchronization. This means the destination is not at all in sync; it has none of the source's resources. And it says: I want to do a catch-up operation, an initial transfer of information, so that I'm in sync for the first time with the source.

Once that is done, we want incremental synchronization. This means that as time goes by and as the source's resources evolve, the destination wants to co-evolve. It wants to remain in sync.

Third one, audit. Basically the question: well, I think I am in sync as the destination with the source, but am I? And there are really two dimensions to that. One is the coverage notion that we talked about: do I have all of the source's resources? And the other is accuracy: do I actually have the right bitstreams, or did something go wrong in the transportation of the bitstreams?

These are the requirements that are met by the capabilities introduced in ResourceSync. They all speak to these requirements of a destination. So here we go. In this slide, you basically have the essence of the entire ResourceSync solution, which is very simple.

There's the notion of a source publishing an inventory: this is what I have, these are my resources. No rocket science here; obviously an inventory will be a list of URIs, and then some, okay?

Publish changes: I'm now going to put out, as a source, a document that talks about all the changes that happened in a certain temporal interval, between time zero and time one. These are the changes that occurred to my resources.
Third one, notifications about changes: don't wait out an interval and then say what changed; as the change occurs, I'm going to let you know, okay? And then, for all of these cases, there's the notion of the communication payload. What is it that the source is going to communicate? Well, again, minimally the URI, but maybe for certain use cases more, for example a content-based hash, right? Because you want the destination to be able to verify bitstream accuracy, okay? And I'll come to all of that.

So the first component, the first capability in ResourceSync that a source can implement, is this thing that we call a resource list. A resource list is really the inventory of the source, okay? And it is a snapshot view at a certain moment in time: this is what I have, here are my URIs. And the process that happens here: somehow the destination discovers where that resource list is, pulls the resource list in, looks at all the URIs, and one by one goes and collects the resources with those URIs. It's very simple. The source publishes the document that we call a resource list; the document is a list of URIs; and the destination just goes and collects all these URIs.

There's an optimization to that that I will not discuss in any detail, but basically it consists of the source wrapping up the bitstreams of its representations in ZIP files, making those available, and publishing a document that says where those ZIP files are, okay? That's just an optimization. It means that the destination does not have to go one by one and dereference all these HTTP URIs. Again, I'm not going to talk any further about that.

So here we are. Remember, this was our little scenario of how things change at the source. The source says: at time X, I'm going to publish a resource list. And what is going to be in that resource list? Well, it is the state of its resources at time X. That means it's this one, this one, and this one, and not this one, because this one was deleted. I'm not going to talk about the deleted one in an inventory, all right? So the resource list of this source is basically these three URIs.

Next up is the change list. The change list talks about change events that occurred in a temporal interval decided by the source; how long or short that interval is, is up to the source, okay? So it fits into the notion of incremental synchronization, okay? In the change list you always, of course, have the URI of the changed resource, but you also have the datetime of the change, and you also have the notion of what happened to that resource: was it created, updated, or deleted, okay?

And in the same kind of way, this is just a document that the source publishes out there. The destination finds that document, dereferences it, sees all the URIs listed in it, and goes after the representations. For the created and updated ones, it gets those representations; for the deleted ones, if it has a copy of the deleted resource, it removes it from its collection. Again, there's an optimization here, like the ZIP thing that I talked about, but no details; it's all in the spec.

So let's look at this now. Again, we have the evolution of the source's resources here, and now we say: oh, the source is going to publish a change list with all the changes that occurred between time Y and time Z. Well, what is it going to contain? A and B were updated, this one was created, this one was deleted, and this one was updated.
So pay close attention here: the same resource will occur twice in the change list, because in the interval it changed twice. The change list notifies about every single change to a resource, which is actually quite important in certain use cases. So down here, you see what the change list between time Y and time Z looks like. Are you with me? Okay.

Now, change notification. Resource list and change list are all about publishing a document, the destination finding that document, and acting upon the information in that published document. Now, if we have a use case where the destination requires really low latency, then we're not going to do that published-document thing, because the destination doesn't know how frequently to poll, how frequently changes are going to happen. This is where we can send notifications out. So this is where a destination subscribes via a publish/subscribe mechanism, and the source, as things change, pushes information about each change out. Okay. The information in there is the same kind of information as in the change list: URI, datetime, and nature of the change. And in this case, the destination doesn't have to go collect a document at all. It just receives the push notification, okay?

So here's the example again. Say resource A changes: we send out a change notification, a little ball of information that says this resource changed at this moment. Time goes by; I send another notification out that talks about this updated resource. Here I send a notification out about two resources that changed at the same moment in time, right? Created, deleted, updated, you catch my drift, right? So again, this is push technology. This is not a poll.

To give you an overview of the three kinds of communication that we can have: resource list, the inventory; change list, the changes that occurred over a period of time; change notifications, as things change, I'm letting you know, okay?

Now we come to the payload. I already mentioned that minimally there's going to be a URI in there; otherwise we're talking about nothing, right? In addition, when it's about changes, we also have the datetime of the change and the nature of the change: created, updated, deleted. But more information can be added to the payload in order to cover certain requirements of the destination. For example, remember the content-based hash, the audit capability? Well, the source can provide metadata about the resource: for example, the content encoding, content length, MIME type, content-based hash. In addition, and this is very webby, we can also provide links that pertain to the resource. I'll give you a couple of examples later, but think of linking to mirror copies, alternate representations, versions of the resource, or interconnecting metadata and content. I'll talk about that in detail.

The specification gives a couple of examples of how you might want to use links, but basically every registered link type in the IANA link registry can be used for your own use case. I see that Tim is there. So collection membership is one: for example, if a resource is part of an ORE aggregation, that's what you would use. You would say: well, this resource with this URI links to the aggregation, with the appropriate relation type.

The first example is about metadata, and I touched upon it already. This is about the content-based hash. In order to meet a destination's need for audit, the source can provide the content-based hash as metadata, okay?
So the source computes the content-based hash and puts it in the payload. The destination obtains the payload, gets the resource, computes the hash itself, and compares, okay? It's as simple as that, but now we are able to audit what we brought back from the source.

Two examples of links. First of all, there's this notion of metadata, and the content that the metadata is about. In the protocol for metadata harvesting, it was always only about the XML, right? Here we basically say: well, an XML metadata record, or whatever type of metadata record, and the content are just things that reside at URIs, and they can each evolve at their own pace. But it would be interesting to know that this metadata is about that content, and that content relates to that metadata, and this is something we would, again, do with links. So for example, if we have a metadata record at a certain URI and it evolves over time, then in the payload, when you talk about that resource, you would point to the PDF file, say, that the metadata describes, using the "describes" relation type and the URI of the PDF file. And the other way around, you could also point from the PDF file to the metadata that describes it. Again, in this worldview these are just resources that live their own lives on the web, but they can interlink with an appropriate link relation type.

Here's a very interesting one that we actually put in on demand from communities that deal with big datasets, huge images and so on, where, whenever the resource changes and the resource is a couple of gigabytes big, you don't want to send the entire file across the wire time and again. So there's this notion, this hook, that we built into the protocol, where you can basically link to a diff between the new version and the prior version. In this case, the destination can bring across only the diff and apply the patch to the version that it already holds. Now, there's a caveat here. The type of the diff is expressed by means of a MIME type, and there are not too many diff MIME types defined so far. There's something for XML and for JSON, but you can define your own; there are always the experimental and vendor MIME type spaces in which you can define your own diff formats.

To wrap this up, a few additional characteristics of the framework that we devised. First of all, it's modular. I talked about these several capabilities, like resource lists and change lists and the dumps and all these kinds of things. The source doesn't have to implement all of that. It's really modular, like Lego bricks. A source is going to say: well, because I want to support this kind of use case, I'm going to implement, for example, resource lists and change notifications. That's going to be it; that's my ResourceSync implementation. So this is not a standard that says you have to implement all of it. No, you're going to select which modules are important for your community and cater to the requirements that you have regarding latency, for example, coverage, audit, and so on.

Just like we had sets in the protocol for metadata harvesting, there's a notion of sets of resources here also. This caters to the notion of selective synchronization: I only want your videos, okay? So basically, a server can run different ResourceSync implementations for different sets of resources. It's all in the spec.

And then there's, of course, the notion of discovery. How does a destination find out whether and how a source supports ResourceSync? And again, we use webby kinds of mechanisms for discovery, like robots.txt, well-known URIs, and links, okay? To basically lead the destination to a description of a source's implementation of ResourceSync.
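Just to make that discovery idea tangible, here is a minimal sketch of what such pointers could look like. The file locations and the rel value below are illustrative of the mechanisms just named (robots.txt, a well-known URI, typed links), not quoted from the spec:

    # In robots.txt: a sitemap-style pointer to a ResourceSync document
    Sitemap: http://example.com/dataset1/resourcelist.xml

    # A well-known URI leading to a description of the source's
    # ResourceSync implementation
    http://example.com/.well-known/resourcesync

    # A typed link in an HTML page pointing to that same description
    <link rel="resourcesync" href="http://example.com/.well-known/resourcesync"/>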
And with that, I think we've come to the XML, right, Martin? Mm-hmm, yeah, yeah, the fun part starts. All right, thanks, Herbert.

So in the next 10, 15 minutes or so, I'd like to give you a little bit more insight into the technology behind ResourceSync. For one, we'll talk about the technology part of the framework. As Herbert mentioned, we've prepared a little demo in the form of a video that I'd like to show you afterwards, and then some concluding remarks in terms of where we're at with the specification and so on and so forth.

For the technology, there are two points that I'd like to stress. The first one is that, in terms of serialization, ResourceSync builds on sitemaps. And sitemaps, as I'm sure you know, are supported and were introduced by all major search engines, certainly the big three. So that's a technology that's widely adopted, and that plays into one of our objectives: we tried to come up with a framework that has a rather low barrier of adoption. If an institution already has infrastructure in place that generates sitemaps, the step on top of that to be ResourceSync-compliant is fairly small, and that was the goal there. One strong argument in favor of sitemaps.

Another strong argument in favor of sitemaps is the similarity of purpose, right? What is a sitemap used for? Well, servers use sitemaps to advertise their resources, towards search engines, granted, but still, to advertise their resources. And if you compare that with the purpose of a resource list, it's an inventory, right? I advertise my resources. So there's a strong level of similarity there, and from our perspective it logically makes perfect sense to build on top of sitemaps. Of course, the protocol was not made for us, so we had to come up with some enhancements, which, by the way, are all fine with Google and the like; we checked. So it's not the case that if you implement ResourceSync, you violate Google policy or something.

And another, from my point of view, beautiful aspect of ResourceSync is that we're reusing the sitemap format throughout the entire framework. So whether it's a resource list or a change list or a change notification, it's all based on the same format, which makes perfect sense: for one, it's fairly easy to comprehend, and for two, it's very friendly for developers, right? Because the level of code reusability is fairly high. So you'll get plenty of thanks from your developers.

In case you don't know exactly what a sitemap looks like, or haven't seen one in a while, let me remind you. This is the very raw structure of a sitemap. It's an XML document. It starts with the root element, urlset, and it closes with it. And for each resource described in a sitemap, you have a url block: opening url, closing url. Within that url block, you have a loc element, which is also mandatory. And that loc element holds the actual URI of the resource. So these three elements are mandatory: urlset, url, and loc. And then often there's a lastmod element to indicate: oh, this is the time the resource last changed. That's not mandatory, it's optional, but nevertheless very frequently used.
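As a minimal illustration of that structure (the URIs and datetimes are placeholders), a bare-bones sitemap with two url blocks looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://example.com/res1</loc>
        <lastmod>2013-01-02T13:00:00Z</lastmod>
      </url>
      <url>
        <loc>http://example.com/res2</loc>
      </url>
    </urlset>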
So the url block is repeated for the different resources in one sitemap, okay? You have multiple url blocks in the same sitemap, one for each resource.

Now, what we're doing with this very simple structure is putting in some enhancements for ResourceSync. The first one is an element that we call md, for metadata, and it has a very important attribute, capability. With the value of that attribute, a destination is able to distinguish what it is dealing with: am I dealing with a resource list, like in this case, or maybe with a change list? What kind of document is this that I just discovered? In this case, again, it's a resource list. And it has another attribute, at, a datetime which indicates the time at which this resource list was created. It's the snapshot idea, right? At time X, these are my resources that are subject to synchronization via the ResourceSync framework.

And then we are, of course, also using the url block structure to describe the resources that are subject to synchronization. We have our mandatory loc element, of course, because we need to communicate URIs. We have a lastmod, and we have our metadata element, our md element, in our own namespace, by the way, which holds, as Herbert mentioned before, the optional metadata attributes of that resource. So the resource described by this URI has an MD5, a content-based hash, of this; it has a content length of that; and, oh, by the way, it has the MIME type application/pdf. That's one of the two additional elements that we introduced into the sitemap format to convey additional information.

So that's the resource list part. For a change list, you will recognize another attribute, change, from what Herbert mentioned earlier: we not only need to convey the URI of the resource that changed, but also what type of change it has undergone. In this case, this URI has been updated at this time. And again, to roll back a little bit, we have our metadata element up here as well, with the capability attribute that lets us know that this document is in fact a change list. And you'll also see another difference: it does not have the at attribute anymore; it has from and until attributes. And you'll recognize these are the temporal boundaries of the interval that the change list covers in terms of changes to resources. So between time zero and time one, these are the resources that have changed, covered in this change list.

I mentioned we introduced two new elements. The first is md, metadata. The second one is ln, which stands for link; if you're familiar with the Unix environment, you'll appreciate that. Our ln element, and Herbert mentioned it, allows us to reference related resources. So in this case, we convey the notion that the resource described by the URI in the loc is described by this other resource, which happens to be metadata about the resource. With the link relation type describedby, we can connect the two resources: the content resource and the metadata resource. In this case, describedby. And if you're familiar with those link types, there is also describes, which covers the inverse case: if your loc held the metadata record, your link would point to the actual content resource.
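Putting those pieces together, here is a sketch of the two document types just described, with placeholder URIs, hashes, and datetimes; the md and ln elements live in the ResourceSync namespace:

    <!-- A resource list: the inventory, a snapshot at time X -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:rs="http://www.openarchives.org/rs/terms/">
      <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z"/>
      <url>
        <loc>http://example.com/res1.pdf</loc>
        <lastmod>2013-01-02T13:00:00Z</lastmod>
        <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
               length="8876" type="application/pdf"/>
        <rs:ln rel="describedby" href="http://example.com/res1-metadata.xml"/>
      </url>
    </urlset>

    <!-- A change list: what changed between 'from' and 'until' -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:rs="http://www.openarchives.org/rs/terms/">
      <rs:md capability="changelist"
             from="2013-01-02T00:00:00Z" until="2013-01-03T00:00:00Z"/>
      <url>
        <loc>http://example.com/res1.pdf</loc>
        <lastmod>2013-01-02T14:00:00Z</lastmod>
        <rs:md change="updated"/>
      </url>
      <url>
        <loc>http://example.com/res3.html</loc>
        <lastmod>2013-01-02T18:00:00Z</lastmod>
        <rs:md change="deleted"/>
      </url>
    </urlset>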
All right, so we covered the resource list, we covered the change list, and of course the third capability that we're covering here is the change notification. And as promised, it looks very, very similar to what I just showed you, with one special feature that I'd like to point out here: yet another ln. And it's one of the few cases where a diff format more or less officially exists. In this case, we see that our described resource is of type application/json, and we link to a diff. So in case the destination knows (a) what JSON is and (b) what the JSON patch MIME type is, it can just obtain the patch and apply it to its copy of that resource. But of course this requires a certain tight coupling between source and destination: the source needs to be able to support this diff, this patching mechanism, and the destination needs to know what it is, how to interpret the diff, and how to apply it to its copy of the resource.
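For reference, such a change notification payload might look roughly like the sketch below. The structure mirrors a change list entry as just described; the rel value on the patch link is illustrative, and the URIs are placeholders, but application/json-patch+json is the registered JSON patch MIME type:

    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:rs="http://www.openarchives.org/rs/terms/">
      <url>
        <loc>http://example.com/data.json</loc>
        <lastmod>2013-05-02T11:00:00Z</lastmod>
        <rs:md change="updated" type="application/json"/>
        <!-- Link to a diff against the prior version; the destination can
             fetch just the patch instead of the whole resource. -->
        <rs:ln rel="http://www.openarchives.org/rs/terms/patch"
               href="http://example.com/data-diff.json"
               type="application/json-patch+json"/>
      </url>
    </urlset>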
So that was the first part of the technology section of ResourceSync. The second is the push protocol that we are using: PubSubHubbub. Has anyone here in the room used, heard of, read about PubSubHubbub before? Yeah, a couple? All right. It has its roots in syndicating Atom and RSS feeds; however, it forked off from there a little bit and is now more open to non-Atom and non-RSS feeds also. With this push technology, we're applying it at a new level: as Herbert mentioned earlier, enabling a source to push change notifications to registered destinations, registered in the sense that a destination subscribes to a certain source and its changing resources.

I think it's most intuitive to describe the overall architecture of PubSubHubbub in terms of this infrastructure picture. We have our three components. We have our source, we have our destination; we've talked about those two components all afternoon. And now we need to introduce a new component in the middle, so to speak: the hub. The process is as follows. The destination subscribes to the source and its changing resources through the hub, via the hub. The hub is the middleman; the hub is the one that maintains all subscriptions. The hub knows its source and it knows its destinations. Upon a change of a resource, and upon the generation of a change notification at the source, the source sends out that change notification to the hub. That's all the source does, just sending out the change notification. The source does not care who's listening, who the destinations are. It does not need to care, because the hub cares. So the source pushes the change notification to the hub, and then the hub basically fans out. The hub knows about all subscriptions, and it fans out to the destinations that are interested in those change notifications. So we have a little middleman in between, okay? Does it make sense so far?

Because we have a little demonstration for you, and to give you a little peek and preview of what's going to happen in it, let me show you this. We have our overview of our source with a few resources, and I will show you three things. I will show you the source and its state in terms of a directory listing. I will show you the destination and its state in terms of a directory listing. And I will show you what I call a listener: a middleman that listens to the traffic that's going on at the source, at the hub, and at the destination. All right, so we'll know exactly where the information comes from, where it goes through, where it ends up, and what the information actually is. So there are three things.

After showing you the state of the source and the destination, I will trigger a process that creates new resources at the source. The source starts off with the resource A; then I hit a button, and the source will create three, four, five, six more resources. Those trigger change notifications from the source to the hub. The hub fans out those change notifications to the subscribers, to the destinations; in the demo, it's only one. And upon receipt of those change notifications, the destination will obtain those resources from the source with an HTTP GET. And at some point, the destination and the source will be in sync. I will show you that. Then I have another magic button that triggers another process that deletes those newly created resources at the source. Again, I will show you the traffic that is flowing from source through hub to destination. And after the deletions and the corresponding change notifications, the destination will be left with only one resource: the initial resource A, which has not been deleted. Okay? So we come full circle.

Right. All right, here we go. You have three points of observation, as I mentioned. I will show you the initial state: both source and destination have one resource, resource A. We do some magic with the creation of resources, and the synchronization process runs, after which I will show you the state of the source and the destination with all resources synchronized. Then I will delete those resources, and I will show you the final state, where again source and destination are synchronized, but the resources have been deleted. Okay? Let's see if that works.

Oh, okay. I tried very hard to increase the font size as much as I could; it's a fine line between readability and line breaks and things like that. So for you guys in the back, I apologize if it's too small in terms of font size, but I'm happy to replay the demo later in the break and show it on the screen here if you can't read it. This demo was created over the course of the last couple of weeks by our prototyping team at LANL, as mentioned. Harihar Shankar is the main developer of this, so he also deserves credit.

As you'll see at the top here, I have basically four tabs. The first one is just this one, the little introductory screen. And then I have three tabs which I'll jump back and forth between. The first one you will see is the source, our publisher. The second one is the listener that I mentioned, which displays the messages that go through the entire framework in our demo. And the last one over here is our subscriber, in PubSubHubbub terms, the destination in ResourceSync terms. I will show you the states.

All right, let's start this. There's no audio in the video, so I'll voice it over. This is our publisher, our source, hosted on a server that we have control over. It has one resource, about-en.html; it's an excerpt from one of our publications about ResourceSync, and it tells you that the web is highly dynamic. Some text, okay? It's a regular directory listing. You have these two magic buttons that I will use in a second, and this is our listener. In the left-hand column, you'll see all messages that come from the source.
In the center, you'll see all messages that go through the hub, and on the right, all messages that arrive at the destination. The last tab on the right is our destination. It is in sync right now: it has one resource, about-en.html, with a very similar directory listing structure. It's about ResourceSync, and the web is still highly dynamic.

All right, I go back to the source, and I'll trigger the process of creating resources. After clicking the button, I will immediately go to the listener in order to show you the messages that go through the system. I click the button, and we see a couple of messages coming in already. The first line, in bold, is the URI of the newly created resource; in case you can't read it, it's about-fr, about-de, and so on. So, different resources; you see messages going through the hub and messages at the destination.

Okay, so let's look at the source. about-de.html: the source tells me that this resource was created, that it has a certain length, and that it has a certain content-based hash. Nice. We also have about-fr, a resource that was also created, with a different length and a different hash.

At the hub, we recognize our about-de, identified by its URI. We see that this change notification has been posted to one subscriber, and if you look at the payload, you'll recognize the change notification. It's a urlset. It has a url element. It has a loc with the URI of about-de.html. It has a lastmod, the time it was created. And, as it turns out, change equals created, our attribute, and our content-based hash is included, just like the length and the MIME type. So that's the payload that's been sent: created at the source, sent to the hub, and passed on to the destination. If you look at the payload for the French, the about-fr document, it's also been created, surprise, surprise, with a different hash. Okay.

On the destination side, we identify the resource of interest, just as an example. The destination tells us that this resource was created, by interpreting the payload, right? What it does is: it goes and fetches the resource from the source, computes its own content-based hash, compares it to the content-based hash that the source computed, and sees: okay, this is a match, trustworthy, for now, maybe. File size is the same, good. And looking at the payload there, we recognize the change notification as received at the destination's end.

So let's look at the source, whether those resources were in fact created, and indeed they are. You look at the German version: now we're talking über ResourceSync, and the web is still highly dynamic. And let's look at the French one as well. Even in French, the web is highly dynamic. Okay. So it's a little translation service that we have running, just for the sake of demonstrating what can be done.

Okay, let's look at the destination, whether those resources have indeed been transferred. We do a reload on the directory listing, and voilà, the resources are there. So at this stage, source and destination are in sync. And we see our German version of the document again. All right, nice.

Let's reverse this and start the process of deleting those newly created resources. Ideally, as I mentioned, after this process only the English version will be left over, because it's our static document. So we hit the button, delete resources, and again look at the listener here to see what messages flow through the system, from source through hub to destination.
That takes a little bit. This listener, by the way, ran locally on my machine. All right, so let's look at the source. We see the Russian version of the document, and the source tells us that this resource was deleted. And the French one was deleted too, and, sadly enough, the German one was deleted also. The web is still dynamic, but only in English now, right?

Looking at the hub, this is the identifier of the German document, the German URI rather. And looking at the payload now, this is a different change notification, simply because the change type has changed, right? It's change equals deleted, triggered by pressing the button there. And hence it doesn't make any sense to include an MD5 hash there, for example, because it's a deleted resource. Looking at the destination's end, the destination interprets the payload: okay, the resource was deleted; I will do the same on my end in order to be synchronized. And looking at the source, indeed, we've seen this before, the resources have all been deleted. I do a reload on the destination's end, and we see that there, also, the resources have been deleted. So now, again, source and destination are in sync.

And so the main point, I think, of this demo is that it nicely shows the difference between push and pull. This is a notification service: upon a change at the source's end, the source generates the change notification and pushes it out in more or less real time. The destination is able to react to that change notification and do the appropriate operations in its file system, for example, in order to be and to remain in sync. So this really aims at low latency between source and destination. Unlike in a pull-based system, where the destination would basically play a guessing game of when to ask for a change list or a resource list in order to learn about changes that have occurred in a certain interval.

All right, so that concludes our little demo. Let me just conclude really quickly with some remarks on where we are with our specification and with the ResourceSync framework. Fairly far is the short answer. The ResourceSync core specification is right now undergoing the NISO voting poll, and assuming that is successful, we're looking at a NISO specification maybe in July of this year. So the spec is likely not to change dramatically anymore; it's fairly solid. We're also maintaining a copy of the specification at the OAI website; I'll show you a pointer later on, so you can always take a look there as well and see what's going on.

Besides the core specification, we have two other specifications, one of which describes what I just showed you in the demo, the notification part, which we isolated a little bit from the core specification. That's in beta right now, simply because we feel there's more testing to be done with PubSubHubbub as a protocol. We will, however, very soon release the Python-based software for all three components: the source, the hub, and the destination. That, again, goes in the same direction of aiming at a low barrier of adoption. If you feel like this is something that you could use, feel free to use our software as a first step there, basically, in terms of source code, right?
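To give a flavor of what the destination side of such software has to do, here is a minimal Python sketch of handling a pushed change notification like the ones in the demo. This is not the released software; the function name, file handling, and directory layout are assumptions for illustration:

    import hashlib
    import os
    import urllib.request
    import xml.etree.ElementTree as ET

    # Namespaces used in ResourceSync documents: the sitemap namespace
    # and the rs extension namespace.
    NS = {
        "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
        "rs": "http://www.openarchives.org/rs/terms/",
    }

    def handle_notification(payload: bytes, local_dir: str) -> None:
        """Apply a pushed change notification to a local mirror directory."""
        root = ET.fromstring(payload)
        for url in root.findall("sm:url", NS):
            uri = url.find("sm:loc", NS).text.strip()
            md = url.find("rs:md", NS)
            change = md.get("change") if md is not None else "updated"
            local_path = os.path.join(local_dir, os.path.basename(uri))

            if change == "deleted":
                # The source deleted the resource; mirror the deletion.
                if os.path.exists(local_path):
                    os.remove(local_path)
                continue

            # created / updated: fetch the current representation via HTTP GET.
            body = urllib.request.urlopen(uri).read()

            # Audit: if the payload carried a hash like "md5:...", verify it.
            declared = md.get("hash", "") if md is not None else ""
            if declared.startswith("md5:"):
                if hashlib.md5(body).hexdigest() != declared[len("md5:"):]:
                    raise ValueError(f"hash mismatch for {uri}")

            with open(local_path, "wb") as f:
                f.write(body)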
And on top of that, if you feel like, oh, this is nice, but I really don't want to be burdened with implementing an entire hub, we're helping you out there as well: we're about to provide a hub as a service that you can use to play with, to test the framework in your environment with our hub, basically. That's clearly not a long-term solution, but maybe it helps you take the first step.

The second spec that we isolated from the core spec is an archive spec, which is also in beta. Without going into detail there: if you, as a source, feel like you need to keep a record of all your resource lists and all your change lists, then the archive spec is for you. If you are going, what is he talking about? Never mind.

A couple of pointers. I mentioned that on our Open Archives website we do keep a copy of the specification, so feel free to go there: openarchives.org/rs, for ResourceSync. There you'll find a table of contents pointing to the core specification, to the notification specification, and to the archive specification. We have a Google group where we invite everyone to provide feedback, leave comments, tell us how we're doing. Please let us know if you are interested in adopting it and playing with it; we're more than happy to help. And follow the pointers. And of course, some references: we did build on sitemaps, we mentioned that, and on PubSubHubbub, in its early stages hosted there.

We're a little bit in academia, so we did some publishing based on this. The first one in particular, last year's JCDL short paper, describes the fun we had exploring sitemaps, how search engines use sitemaps, and how they would respond to our enhancements of those sitemaps. And we have a couple of D-Lib papers as well that describe our philosophy of ResourceSync and the technical background. I invite you to go check those out.

This, of course, is not just Herbert and me doing this; this is a team effort. Herbert mentioned it: plenty of contributors from NISO; without that kind of support, this kind of effort is never possible. I mentioned Harihar as the main developer for the demonstration, so thanks to him. And other contributors: Simeon was mentioned; Bernhard, by now back in Vienna; Michael from Old Dominion; and Carl Lagoze, by now in Michigan, all key contributors to this effort.

So this concludes our presentation for today. Thanks a lot for your attention. Herbert and I would be happy to answer your questions, and we'll be here today and tomorrow for sure. So if a question does not come to your mind right now, but maybe tomorrow, just approach us. We'll be happy to talk to you and answer your questions. Thank you. Thank you.