So, we welcome Jonas Öberg with this talk, Attribution Revolution. Give him a big round of applause.

Yeah, thank you very much. Thank you all for braving this early morning and coming to attend this talk as well. I'm going to talk about the attribution revolution and why I think that we have a possibility here of turning copyright upside down, or inside out if you want. Just a quick show of hands to give me an idea: how many of you have heard me talk before? Not so many. Excellent, then you're going to be surprised by this one.

I'm going to show you an image, and I would like a quick show of hands to see how many of you in this room can recognize where that image is from, if you can identify the author of it, or perhaps identify the series it is from. Show of hands: where is this from? Okay, fairly good. I gave a talk last week in London at the Open Document Foundation's meeting, and when you're talking to people who are used to working with word processors, you can imagine that this joke went over quite well with them.

Okay, most of you recognize this. This is XKCD. It's drawn by Randall Munroe. Randall has a peculiar style of drawing, so it's quite easy to recognize it whenever you see it. He also has a peculiar sense of humor that attracts many of us. Now, let me show you another one, though, and I'll ask you the same question, with a quick show of hands afterwards, to see if you recognize where this image is from. Okay, one. Oh, okay, two. Lena, you've seen this before. Okay, two people recognize this one. So the rest of you might be surprised when you learn that this image is, in fact, also by Randall Munroe. It is also part of the XKCD universe; it's comic number seven in that series.

Now, the reason I'm showing you this is that knowing this image is part of XKCD probably changes the way that we relate to it. It changes the way we feel about it. Before we knew that this was drawn by Randall Munroe, this was just an anonymous sketch that I could have taken from my own sketchbook or found anywhere on the internet. But once we learn that this is by Randall Munroe, that it's part of the XKCD universe, then suddenly we have the context that makes this image valuable to us. It gives it some meaning. And I can almost guarantee you that if Randall were to sell the original of this, he would obviously get a lot more money if that knowledge was conveyed to the potential buyer.

So knowing where things come from, knowing who created something, who's the author, when it was created, where it was created: all of those things are quite relevant, and we see them all around us. We see them in Wikipedia: citation needed. We see them in science, right? Obviously, everything that we've done for as long as we can remember in terms of scientific advances builds upon what people have done before, and we're used to crediting those people by attributing them when we write our papers or journals. In politics, well, you could claim that in politics it's maybe not too common to attribute a statement to some known source that people can actually check, but as a politician, you do this all the time anyway: you attribute your statements to somewhere, you attribute your facts to somewhere. In culture, in art, we have this as well: we attribute, we build upon something from before. In food, well, okay, I admit this is a bit of a stretch, but when I pick up a food carton, one of the first things I do is usually just flip it around and look at the list of ingredients.
Now, that's attribution in a way. It tells you: what does this actually contain? Where does it come from? What went into making this product? And this is the provenance of our work. This is the history of something: where something has been before, where it was created, who created it, when it was created, for what purpose it was created, and then what has happened with it until we see it today. If you walk into a gallery and you look at the paintings on the walls, you're most likely gonna be interested in the information about those paintings as well. You're not just gonna look at the paintings, you're gonna look at the provenance of those paintings. You're gonna look at who actually painted them, when they were painted, perhaps why they were painted. The title can give you some information, some knowledge. It gives the paintings some meaning.

Now, provenance is also connected to the aspect of reputation. Reputation is obviously something that we all have around us today as well. If you look at LinkedIn, Facebook, Twitter, everything is about building our reputation. If you go on GitHub, you're talking about reputation. Every contribution that you make contributes to your reputation, your standing in society. And that's facilitated by attribution. It's facilitated by people knowing what you have actually created, knowing what you have contributed to society.

Now, let me ask you one more thing, with a quick show of hands as well, to see who's the avid reader in this room. If I say whuffie, show of hands, how many of you know what whuffie is? Okay, I'm gonna send you to the library straight after this. Whuffie is a reputation-based currency that was first envisioned in Down and Out in the Magic Kingdom by Cory Doctorow. In this story, Cory Doctorow hypothesizes about a potential future in which the currency that we have today is replaced by reputation. What you do and what you create contributes to your whuffie, which you can then exchange for other things in turn. Now, when Cory Doctorow wrote this, it all obviously seemed quite a lot like science fiction. And it's written as science fiction; it is Cory Doctorow, after all. I would argue, however, that it's actually not science fiction. We actually have a reputation-based currency today as well. Maybe not exactly on the larger scale that Cory Doctorow envisioned, but we do have it nonetheless.

And this morning, I was reminded of one example of it, and I took the liberty of just slotting that in here. How many of you know about Advogato? Okay, not so many, so I'm gonna introduce that to you as well. It's Advogato, advogato.org. Advogato is one of the very earliest attempts at creating essentially a social network. It pioneered the concepts of blogging on the internet, sharing your experience with other people, and it developed a system of trust. It took the web of trust and implemented it in its own system, so you could certify other people according to their experience, in this case within the free software community. Now, Advogato was founded in 1999, so that's quite a while ago, right? And it's been quite frequently cited since, because it really was one of the first to try to do anything similar to this. Now, as you can see, I was on Advogato already in 1999. That gives you a hint about how old I am, but it also gives you an understanding of how long I've been working with this. Now, is that important?
Well, to some extent, you know, it's not really. But if I look at other people that are on Advogato, which I did this morning because I was curious: you've got Bruce Perens, who joined in early 2000, Richard Stallman, who joined in mid-2000, Bradley Kuhn, who joined in early 2001, and then you've got me joining in 1999. And of course, that's then part of the story: I was there before everyone else, right? Do I feel proud about that? Well, you know, I'm human, so of course I feel proud about that. I was on Advogato before all these other big shots. Does it mean anything? Not really, but it's part of the reputation mechanism.

And I'll introduce another project to you as well, which came to my attention fairly recently, so we'll again see how many of you know about this project. It's P2P Value. How many recognize P2P Value? Okay, one person. Chris is not here, is he? Okay. P2P Value is a European Union-funded project, which means that there's a huge research consortium behind it. But what I find interesting when I look at this project, and at what they're promising to deliver, or at least what their objectives are: I highlighted two things here. They want to deploy a federated platform in which real-world communities will interact, participate, and collaboratively create content. This is what's called commons-based peer production. And they want to develop a set of value metrics and reward mechanisms that incentivize the participation of citizens in this so-called commons-based peer production. Now, to me, that's a reputation-based economy. Those are the budding stages of taking what we have on LinkedIn, on Twitter, everywhere else where we're talking about reputation, and trying to put it in a larger context, trying to create some platform that can actually facilitate this: not only reputation because I'm publishing something, but reputation because I'm creating something.

When I started thinking about attribution, about the attribution revolution, I started talking to people. I started talking to photographers primarily, because I saw them obviously caring quite a bit about being attributed for the photographs that they take. And if you look at newspapers, you see pretty much all the photographs are attributed to Getty Images, AFP, or some other agency, and you might even have the name of the photographer there. Now, something that I realized when talking to people was that everyone seems to agree that attribution is important. When I was talking to some friends of mine who are photographers, they kept telling me: you know, I know the direction in which the world is turning. I see the way that people are taking my photographs; they're sharing them online, they're publishing them on Twitter, on Facebook. And I'm okay with that, because I know that I can't actually change the course of history, and I can't change the way that people behave. But if we can ensure that whenever my photographs get published, I at least get attributed, that would resolve a lot of the concerns that people have. Unfortunately, we're rather bad when it comes to actually giving credit where credit is due, to actually attributing photographs when we do use them. Creative Commons licenses, as one example, stipulate that whenever you reuse a work, you must attribute it in a manner reasonable for the medium where you're publishing it. Still, we see a large part of the commons which is not attributed when it gets shared around.
So two years ago, I started working on a project called Commons Machinery, which is an organization that aims to make attribution information and metadata about creative works visible and actionable. Now, visible means that we should actually be able to see the metadata that is connected to the works that we're sharing. Unfortunately, this is not always true, and it causes a bunch of issues along the way.

In the early 2000s, when at least the Swedish government, and I'm sure other governments as well, started publishing court proceedings and similar documents online, they usually did it as Word documents. And when they tried to hide someone's name, they would just take the marker in Microsoft Word and strike something out with black, and obviously people figured out that you can just open this, press Ctrl+Z a few times to undo that, and then you've got the name, right? If you publish it as a PDF, it might look okay, but if you look underneath everything, you still have the text, and you still have a block of black above it, and you just need to separate the two and then you have the name. Which means that today, people are so afraid of publishing anything digitally that most of the time they print something, physically mark it, and then scan it again. That is obviously ridiculous, because you lose a lot of information in doing that, but they do it for one particular reason: because they don't know, it's not obvious to them, what information is conveyed when they're publishing something, when they're sending their files around. They cannot trust that what they see on the screen is the only thing that is available there. Even if they take painstaking efforts to remove all the names completely and clear away the history of the document, it's very easy to overlook the fact that you can just go to File, Properties, and maybe there are some names in the title or in the description of that document.

So with Commons Machinery, we wanted to make the metadata visible, so that people actually are aware of the metadata that gets passed around, aware of the information that their documents and files contain. Obviously, there's a privacy issue in making that visible as well. But the other part is to make it actionable, and by actionable I mean that we need a way to develop our software so that it can act upon that metadata, to give us helpful advice and helpful information about the works that we're using. To allow, as an example, a word processor, when you're inserting an image, to automatically tell you: this image is from this particular author, would you like me to put an automatic attribution to that author in there? That would be helpful, but we can only do that if we have actionable metadata.

Fortunately, we were funded by the Shuttleworth Foundation for a period of two years, which is now just coming to a close, because they were interested, as I was, to see what would happen if we actually start putting our ideas into practice. What would happen if we start implementing systems that support retention of metadata for digital works? Where would that lead us, and where would the problems occur along the way? And we've learned a lot since we started working on this. So for the remainder of this presentation, I'm gonna take one small step back and give a little retrospective, to talk about where we came from and what we did in the process up to now.
I'm gonna mention where we are now, what we need to do next, and then I'm gonna come back at the end to talk about what all of that actually means for copyright. Because you remember, that was part of the title, turning copyright upside down, and I hope that I'll live up to that promise.

This is an image from one of the first white papers we produced. It shows you the different standards that are available to convey information about works. These are all metadata standards, at different levels, right? It can be difficult to read from the back, but you've got the EXIF standard, which is a metadata representation, but it also carries information about the work itself, like the author and the license, so that fits in there. IPTC is a similar standard to EXIF, but it's created by the International Press Telecommunications Council, specifically for images. We've got XMP, which comes out of Adobe, Dublin Core, ODRL, which describes licenses, PROV, which is the W3C provenance standard, and all these other standards. There are actually a bunch more. You'd be surprised what you find when you actually start looking at this. It seems that everyone who's been thinking about this at any point in the past has decided that whatever standards are available are not suitable for them.

So we figured out quite quickly that there are simply too many standards. There's no way that we can make this work if we have even 5% of the works using the EXIF standard to describe themselves, 5% using IPTC, 5% using PROV, 5% using something else. It's gonna be a nightmare to actually try to implement that. And none of them really sees enough use either. Even EXIF, which is probably the most used standard for conveying information about images, doesn't really have enough use. To a very large extent it has no tool support. Load something into a photo editor, change it around and then save it, and that information is very often just lost, because the tools don't actually support retaining metadata or passing it along.

There's an embedded metadata manifesto that came out of the International Press Telecommunications Council. They did a study of social media platforms, and the study was fairly simple: they took an image with EXIF and IPTC metadata embedded within it, uploaded it to a social media platform, then downloaded it again and saw what happened to the metadata. And lo and behold, in almost 80% of the cases, the metadata was just lost. Flickr, 500px, Twitter, Facebook, probably one of the worst of them: they just ignored the metadata, they just took it away. Google was one of the better ones; they actually took some effort to retain at least the EXIF and IPTC information, but some of the other information was lost as well. So retaining metadata by hoping that whatever you embed within the file will get retained is not gonna happen. It's wishful thinking at best.

So we started thinking about what other ways people are using creative works, and we came up with the case of copy-paste, which is the very simple procedure of someone finding an image online that they like, clicking copy on it, and then going to, say, a presentation editor and clicking paste. And we started thinking about how we can make the metadata of that image, the information that we need in order to attribute accurately, be carried over in that operation. And now I'm gonna slide into a slightly technical bit about how we did that, but I hope you'll follow along anyway. It's not too technical, indeed.
So the first thing we did was that we simply split the clipboard content in two, essentially. On the clipboard by default, if you copy an image, you might place an image/jpeg resource available for the recipient application, and it just looks and says: oh, here's a JPEG image, so I'll grab that one. What we did when someone clicked copy was that we put not only the JPEG image on the clipboard, but also an RDF fragment containing the metadata, so machine-readable metadata. And then it would be up to the receiving application, when someone clicked paste, to say: I can get either the image itself, or the metadata, or both. So that was our first attempt. Later we changed that completely again; we realized there's a bunch of issues with it, and I'll get back to them. In our more recent prototypes, instead of putting the image on the clipboard, we actually put an HTML fragment on the clipboard, which has RDF metadata embedded within it. So you can see, for instance, the title here and the license, together with the image and the source of it. I'll show a sketch of what such a fragment could look like at the end of this section.

Now, we implemented variations of this in quite a few tools: in GTK, GIMP, Inkscape, LibreOffice, the web-based Low High editor, MediaGoblin. And I'm actually quite proud, quite happy, that we were able to bring the copy-paste scenario to a close, to the point where we could find an image online, find it on Flickr, click copy on it, get it into LibreOffice, click paste, and it would insert the image together with the attribution. And then if you copied again from there and got it into the web-based editor and clicked paste, the attribution carried with it as well.

Now, the problem we had when implementing all of this is that in most cases, whenever you're talking about copy-paste as an operation, it involves changing the core of the applications. It's not possible to do this in the general case with just an add-on or an extension to a program. You need to actually change the core, or alternatively, you need to implement your own copy and paste functions, but that obviously very quickly gets messy. You also have a UI visibility issue, in that most applications don't really show the metadata; they don't care about it, so they just hide it away. And as well, we realized that there are significant clipboard differences: what worked for us on an X-based Linux system did not work on Windows and did not work on Mac OS X. So we were kind of stuck on that path, and that's why we went back and did the HTML approach as well, because that works on all platforms.

So we managed to do the copy-paste. We even got to the point where we got LibreOffice Impress to accept images that were pasted into it, and you could paste as many as you wanted, move them around, remove them or add new images. And then at the end of the presentation, you would insert a new slide and click Insert Credits, and it would give you a list of all the images that you used in your presentation. All of that code is up on our GitHub, so please feel free to check that out if you want. Doing this, however, is a very massive effort, because it involves changing every single application that we use, and that's quite a few. And it becomes very application-specific: whatever we did for LibreOffice was different from doing it for the web-based editor.
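Here is that sketch of a clipboard fragment. It's a minimal, hedged illustration of the approach, with the attribution embedded as RDFa using the Dublin Core and Creative Commons vocabularies; the exact markup the Commons Machinery prototypes produced may differ, and the image URL below is purely hypothetical.

```python
# A minimal sketch of an HTML clipboard fragment with RDFa attribution
# metadata. The vocabularies (Dublin Core, Creative Commons) are real;
# the exact markup Commons Machinery produced may differ, and the image
# URL below is purely illustrative.
FRAGMENT_TEMPLATE = """\
<div about="{source}"
     prefix="dc: http://purl.org/dc/elements/1.1/ cc: http://creativecommons.org/ns#">
  <img src="{source}" alt="{title}"/>
  <p><span property="dc:title">{title}</span> by
     <span property="dc:creator">{creator}</span>,
     <a rel="cc:license" href="{license}">license</a></p>
</div>
"""

fragment = FRAGMENT_TEMPLATE.format(
    source="https://example.org/alexanderplatz.jpg",  # hypothetical URL
    title="Alexanderplatz",
    creator="Unknown author",
    license="https://creativecommons.org/publicdomain/mark/1.0/",
)

# An application would place `fragment` on the clipboard under the
# text/html target. A paste target that understands RDFa can recover
# the machine-readable metadata; any other editor still pastes a
# visible image with a human-readable credit line.
print(fragment)
```

The nice property of this approach is graceful degradation: an application that knows nothing about metadata still pastes something sensible.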
Even if you can abstract parts of it away and make use of some common libraries, it was still quite a substantial effort to actually get this working at all. So we started thinking: what would be the Unix way of doing this? If the problem is to retain and manage metadata, why don't we solve that particular problem? Let's not solve the issue of making this work in an application; let's solve the simple problem of retaining and managing metadata.

So we started working on what became known as Elog.io. Elog.io is a distributed catalog of creative works. That's a glorified term. What we're really talking about, honestly, is a metadata database. It's a database that is specifically crafted to hold information about creative works. And it can look like this: you get the identifier of the work, which in this case points to our catalog, and you get a JSON structure back, which gives you a locator, for instance saying that this image is, well, in this case, Alexanderplatz in Berlin. It has a block hash, which I'll show in a little bit, and it has a particular license. In this case, it's not a license itself, it's just a public domain mark. Now, Elog.io uses W3C Media Annotations as its way of recording information about works, which is a fair enough metadata standard that most other metadata standards, like EXIF and IPTC, can be mapped into.

And it provides an API. For any image that is part of this catalog, you can easily look it up using the URL of that image, or, if it is indeed an image, the block hash of that image; I'll explain the block hash in a while. The way it works is that you have a work record, which describes the work itself, gives you the author, gives you the license, and then you have multiple media records. Because we realized quite early on that if someone takes an image and posts it to their own website, it will most likely get a different URL. So it will be the same work, but it will be different media; it might have a different resolution, it might have been changed in some way. So you can have multiple media connected to each creative work. And we've seeded the database with 22 million images from Wikimedia Commons. So essentially, for any image that is part of Wikimedia Commons, you can look it up in our database, and we'll get you back the metadata. I'll sketch in a moment what a lookup against this API could look like.

We also developed two browser plugins, one for Chrome, one for Firefox, that can interact with this API. And you're asking yourself: so what does it really do? Well, this is one of the things it does. If you're out browsing the web, and you've got the Elog.io plugin installed in Chrome or Firefox, and you see an image that you find interesting, you can open the Elog.io sidebar, identify the image and just click query. And if that image is part of Elog.io, meaning by extension, at the moment, that it is part of Wikimedia Commons, then it will get you the information about that image. It will show you the title of it, who authored it, and give you the appropriate license for it, if that is known. It will even mark in green the licenses that are free cultural licenses, because we love them. It also offers you the opportunity to copy this image as an HTML fragment, and you can just take that image and paste it into LibreOffice, as an example, and it will copy over not only the image, but also the attribution. And that works straight off without anything except the browser plugin, which is nice. Now, let's take a look at this.
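As promised, here is a hedged sketch of a client-side lookup against such a catalog API. The base URL, endpoint path and parameter name are assumptions for illustration, not the documented interface, and the example media URL is hypothetical.

```python
# A hedged sketch of looking up a work by the URL of one of its media.
# The base URL, endpoint path and parameter name are assumptions for
# illustration; consult the actual Elog.io API documentation.
import requests

CATALOG = "https://catalog.elog.io"  # assumed base URL

def lookup_by_uri(media_url):
    """Return the list of matching media/work records, possibly empty."""
    resp = requests.get(CATALOG + "/lookup/uri", params={"uri": media_url})
    resp.raise_for_status()
    return resp.json()

# Hypothetical media URL for a Wikimedia Commons file.
for record in lookup_by_uri(
        "https://commons.wikimedia.org/wiki/Special:FilePath/Example.jpg"):
    # Each record would carry attribution properties, per W3C Media
    # Annotations: creator, title, license and so on.
    print(record)
```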
Now, what's the catch of this? Well, there's obviously a catch, which is that identifying an image that has been resized, as an example, depends heavily on the algorithm that we use to do that matching. And for Elog.io, we wanted to have an algorithm that was very lightweight, that didn't take a lot of resources, that could be calculated quickly within the browser, and that would generate some kind of value for an image that would not change even if you resized the image. And ideally, it should generate as few false positives or false negatives as possible. So let me run you through the algorithm quickly, so that you see how it works, before I talk about where it does not work.

This is Alexanderplatz, Alexanderplatz in Berlin in the 1700s. And you'll see that I've taken this image and split it into 16 by 16 cells, so it's a matrix here. 16 by 16 happens to be 256, so that's the number of bits that our hashes actually generate. What we do with this image, after we've segmented it this way into this matrix, is that for each cell, we calculate the sum of all the pixels within that cell, and we do that for all of the cells. So we get something looking like this: a bunch of numbers across the board. We calculate the median of all those numbers, and then we go through each cell in turn and see: is the value within that cell above or below the median? And then we assign either a zero or a one to that cell. So then we get a hash looking like that, and then we just wrap that up and pack it as a hexadecimal number, and that's our hash. It's very simple, it's very efficient, and it takes almost no time to compute.

And you end up with hashes looking like this. The first one is a hash that I made of Alexanderplatz in a 640 by 326 pixel resolution, and the second one is the same image, but rescaled to 200 by 102 pixels. That's about one third the size. And you'll see that they do indeed look similar. They're not identical, because obviously some things might change when you're rescaling the JPEG, but they don't differ that much either. If you expand them into the bit fields, they differ in six positions. So six bits are the difference between the larger size and the smaller size when we apply the block hash algorithm. And we've come, from experience, to say that if the difference is around ten bits or lower, then we can be fairly confident that we're talking about the same image, even if it has been resized.

Unfortunately, however, reality comes and bites you in the ass. So this is my son, in Greece a few days ago. And it represents something that people love doing. They take pictures of kids; they take pictures of skylines. And all those pictures have as a common denominator that they have a very bright upper half, usually white or blue sky, and then a very dark, contrasting lower part. And what happens if you have a very bright part on top and a very dark part at the bottom? When you do the numbers, you end up with the upper half having very high values and the lower half very low values. If you take the median, it will be somewhere in the middle between those. But when you then check whether each cell is higher or lower than the median, you end up with a hash that is essentially a bunch of zeroes followed by a bunch of ones.
Because the contrast is so great between the upper half and the lower half, all the differences within those regions are simply lost; they're overpowered by this. So this was the original block hash algorithm, the way it worked when we just implemented it straight from the research literature. We changed this algorithm, and we changed it in a very easy way: we simply split this field up into four distinct horizontal bands, and we do the median calculation not for the entire image, but for each band by itself. Which means that even if the first band is only blue sky, there are still slight variations in it, and if we calculate the median on that band and then do the bit calculation, we get a lot more contrast, a lot more detail out of it. So that's the way the block hash algorithm works right now, and it gives us hashes like that: much more detail for essentially the same images.

Now, we're still getting collisions; that's unavoidable. We're getting collisions in about 1% of cases. We took about 100,000 images from the internet, ran our algorithm on them and compared them to each other in a crosswise manner, and there's 1% collisions. Collision here means that two or more images generate the same hash, an identical hash. However, in 84% of those collisions, we're talking about only two to three images generating the same hash, so we figured that this is fairly okay. These were also 100,000 random images, which means that clipart, maps and various other things which may differ only in very small details are also counted as collisions here. We also get a number of false positives, however. These are images that are recognized as being similar without actually being similar, because the algorithm generates close matches for them. If we set the maximum distance and allow up to 10 bits of variation between two images to classify them as the same, then we get about 1.8% false positives. We can get that down substantially by just lowering the distance that we allow: if we say three or five bits, we're down to less than 0.2%. So somewhere in there, we feel that we're doing quite well.

What about derivative works? What about clipart? Well, in one word: forget it. A derivative work meaning, for example, that you take an image and add a border to it, or crop it in some way. If you think about the algorithm, you'll quickly recognize that if you crop an image, it will generate a very different hash. So we set the bar, and we set the limit for ourselves, by saying that we will do our best to match verbatim copies of a work. You can resize it as much as you want, you can change the format from a TIFF to a JPEG to a GIF and back again if you want, and we'll do our best to match that. But if you make a derivative work, if you add a border or change the image around in some way, then all bets are off; we're not going to guarantee you a match on that one. The same goes for clipart or any other diagrams or graphs where you have large areas of white or black or some other color and just a few lines: it will do a rather bad job at those as well, because again, you have these high-contrast areas. But in exchange, we get something that's blindingly fast, with very small hashes and few false positives. And this is all up on blockhash.io if you feel like implementing this yourself, or having a look at what it does in practice.
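For a feel of how little code the idea needs, here is a compact sketch of the banded algorithm as described above. It assumes Pillow and NumPy, assumes the image dimensions divide evenly into 16, and treats above-median as a one bit; the reference implementation on blockhash.io handles the general case and is the authoritative definition.

```python
# A minimal sketch of the banded block hash described above, using
# Pillow and NumPy. Not the reference implementation from blockhash.io,
# which also handles dimensions that don't divide evenly into 16.
from PIL import Image
import numpy as np

BITS = 16   # 16 x 16 cells -> a 256-bit hash
BANDS = 4   # horizontal bands, each thresholded against its own median

def blockhash(path):
    # Greyscale pixel values as a 2D array of floats.
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    h, w = img.shape
    ch, cw = h // BITS, w // BITS  # pixels per cell (remainder cropped)
    # Sum the pixel values inside each of the 256 cells.
    sums = (img[:ch * BITS, :cw * BITS]
            .reshape(BITS, ch, BITS, cw)
            .sum(axis=(1, 3)))
    # Threshold each band of four cell rows against that band's own
    # median, so a bright sky can't drown out a dark foreground.
    rows = BITS // BANDS
    bits = []
    for b in range(BANDS):
        band = sums[b * rows:(b + 1) * rows]
        median = np.median(band)
        bits.extend("1" if v > median else "0" for v in band.flatten())
    # Pack the 256 bits into a 64-character hexadecimal string.
    return f"{int(''.join(bits), 2):0{BITS * BITS // 4}x}"

def distance(h1, h2):
    """Hamming distance between two hex-encoded hashes."""
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")

# Two verbatim copies at different resolutions should come out within
# roughly ten bits of each other:
# distance(blockhash("large.jpg"), blockhash("small.jpg")) <= 10
```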
Now, unfortunately, 22 million images is not nearly as much as we need to actually make this useful. It may sound like a lot, but it's a very small fraction of what we actually need. Creative Commons, about a month ago, released their sort of State of the Commons report, saying that there are about 800 million works out there which are openly licensed. That's the size of the commons at the moment. Now, not all of those are images, but a fair portion of them are. I'm estimating that there are probably about half a billion images out there that are openly licensed, that should be part of Elog.io, but which are not there today.

Now, scaling up to half a billion images is doable in terms of database size; it doesn't add as much as we would fear, so we can easily do that. However, we're talking about searching by a perceptual hash here. We're talking about searching for the hash value of an image where we allow up to 10 bits of difference. If we said that we're not going to allow any difference, that would be a very easy search; any database can search for exact values, that's not a problem. But if you're searching for something that is similar to something else, that becomes a very different problem. So we found again some research to help us along our way. This algorithm, perhaps not surprisingly, comes from Google. It's called HmSearch, and it partitions the hashes in a way that lets you avoid doing a search of all 22 million. For any hash you throw at this algorithm, it will give you back maybe a few thousand possible matches, and then you just need to sift through those to figure out which are real matches and which are not. That's again available on our GitHub, as hmsearch. I'll show a small sketch of the idea behind it at the end of this section.

Now, where are we going from here? Well, the first thing we want to do, beyond scaling to half a billion works, which we should do, is to flip the read-write bit. Because at the moment, while the API has provisions that would make it possible for someone to edit information within Elog.io as well, we haven't actually enabled that yet. So far, we have just taken information from Wikimedia Commons and put it into the database as a sort of read-only repository, and we rely on people updating the information on Wikimedia Commons, from where we then get it into Elog.io. But flipping that bit and making it read-write, that's what's going to change things. We also need to extend Elog.io to support non-images, so any other kind of creative work, which again scales quite massively, beyond half a billion even. And we want to implement support for the API directly in applications. So again, going back to the application side and figuring out: okay, now that we have solved retention and editing of metadata separately, how can we then make the link to the application?

Now, how the heck does this relate to copyright, as I promised in the beginning? Well, it's easy to think of Elog.io as a copyright registry, and I promise you that it is not. A copyright registry is something that I personally detest. A copyright registry is an attempt by someone to provide an authoritative database, authoritative information about who owns certain creative works. Elog.io is not an authoritative source of information. Elog.io is not a copyright registry. Elog.io is built as a community-curated repository, in this case starting with the Wikimedia community, with an implicit agreement of respect. This is something that we learn from Wikimedia as well: there's a reason that people keep contributing information to Wikipedia.
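Before going on, here is the promised sketch of the idea behind partition-based Hamming search: a toy illustration of the pigeonhole principle it builds on, not the published HmSearch algorithm, and all names in it are illustrative. If two 256-bit hashes differ in at most 10 bits and you cut each hash into 11 disjoint segments, at least one segment must match exactly, so exact-match buckets over segments yield a small candidate set to verify.

```python
# A toy sketch of pigeonhole-based Hamming search, a simplification of
# the HmSearch idea rather than the published algorithm.
from collections import defaultdict

BITS = 256
MAX_DIST = 10
SEGMENTS = MAX_DIST + 1       # with 11 disjoint segments and at most 10
                              # differing bits, one segment matches exactly
SEG_BITS = BITS // SEGMENTS   # 23 bits; the few leftover bits are simply
                              # not indexed in this sketch

def segment_keys(hash_hex):
    value = int(hash_hex, 16)
    mask = (1 << SEG_BITS) - 1
    return [(i, (value >> (i * SEG_BITS)) & mask) for i in range(SEGMENTS)]

class HashIndex:
    def __init__(self):
        self.buckets = defaultdict(set)  # (segment no, bits) -> hashes

    def add(self, hash_hex):
        for key in segment_keys(hash_hex):
            self.buckets[key].add(hash_hex)

    def query(self, hash_hex):
        # Collect every stored hash that shares at least one exact
        # segment with the query...
        candidates = set()
        for key in segment_keys(hash_hex):
            candidates |= self.buckets[key]
        # ...then sift the small candidate set with a full comparison.
        q = int(hash_hex, 16)
        return [c for c in candidates
                if bin(int(c, 16) ^ q).count("1") <= MAX_DIST]
```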
But back to the community aspect. There's a reason why people take painstaking efforts to actually keep the metadata on Wikimedia Commons up to date and reliable, and that's because there's an implicit agreement that we actually want to respect the author. We want to respect the author enough to give accurate credit where credit is due. We don't want to lose that.

Now, Elog.io in this way takes a slight sidestep away from an initiative like Creative Commons. Obviously, Creative Commons, by the licenses themselves, was an attempt to work within the existing copyright regime and show that, given the situation at the time in 2001: we have copyright, you want to share, and we can work within this system to give you the tools, the legal tools, that enable you to do that. I believe that we are coming to the end of copyright as we know it right now. A quite recent phrase within the software community has been POS, the post-open source society, where the guiding light is essentially the phrase: fuck licenses, put it on GitHub. And I think we're seeing the same in the creative sphere as well. Copyright is losing its importance day by day. And we're coming to a place in time where, within five or ten years, I'm very sure that the European Parliament and other parliaments around the world will take steps to make additional exceptions to copyright as it is today, to allow even more private use, as an example, without hindering people in their day-to-day activities.

So copyright is changing, and Elog.io is one of the tools that we need along the way. Because Elog.io is post-copyright: it doesn't really care about the license itself. It obviously implements support for it; the W3C Media Annotations standard gives you the tools you need if you want to record information about the license. But the license is not terribly important. The important part here is who actually created the work; it's the provenance of a particular work, the details about that work. Meaning that from Elog.io's side, we take very great care to respect the author, but not the institution of copyright. Because, just as the photographer friends that I was talking about before are saying: as long as we make sure to attribute accurately, we're good.

So we can take control of the provenance of creative works by using tools like Elog.io. It doesn't need to be Elog.io, but it can be tools like it. And we can show the world that we care about authors; we just don't care about copyright. But we care enough about authors to take, as a collective effort, control over that information: to control the provenance, keep a record of where creative works come from and what happens to them, and make sure that we attribute the authors fairly. It's my firm belief that if we respect authors, if we attribute the authors, if we record their contributions, and if we're honest about all this, it will make it easier to contribute to the commons. It will be much easier for someone to say: here is my image, I'm going to upload it here, do what you want with it, just make sure that I get credit for it. If we respect, attribute and record information about images, that will help raise the value and meaning of digital works. Just as I showed you in the beginning: knowing that the sketch I showed was by Randall Munroe changed the value and the meaning of that work for you, and it changes the value and meaning of other works as well.
And if we do this as a community, then copyright holders will eventually be deprived of their currently exclusive right to dictate, because that's what copyright registries do: they tell us that they are the owners of the culture that we have around us. With tools like Elog.io, we're coming together as a community and saying: we know who authored this, and we will take care to recognize that. You don't need to tell us; we'll keep track of that ourselves. Thank you very much, and thank you for listening.

Thank you, Jonas. So we have 15 more minutes for questions. Do we have any? So, microphone 4, and then we have one online as well.

Hello. Thank you very much for your talk. I'm very interested in this functionality, but I don't see how this functionality in desktop applications alone is useful. I cannot just use a LibreOffice plug-in; I need something in WordPress.com, I need it in Flickr, I need it in Facebook. Have you talked to these platforms?

Yeah, okay, you're right, and that's what I hinted at in the beginning as well: in order to make this truly useful, we really need the tool support, we need support for this in the applications that people use day to day. And that was one of the reasons why we decided to change our approach and pass information on the clipboard as an HTML fragment. We've actually shown that this works in LibreOffice, it works in WordPress, it works in Microsoft Office as well; it works in a whole range of tools by default, because most of the tools today can handle HTML, right? Now, that's not the whole story, though, because in order to actually make use of the metadata that gets passed along and do something intelligible with it, then again you do need application support. So we've started to have those discussions. We started talking with the LibreOffice community, as one example, and they're catching up, but unfortunately the awareness of metadata standards, the awareness of what could be possible, is low in the LibreOffice community today, so it will take quite a long time until we actually make something sensible out of it.

Okay, so the next question from the online world.

Thank you, there's one question: are there any plans to make images trackable after they have been cropped? And have you looked into how YouTube does it, because they seem to be very good at it?
Yes, and Google is as well. So, we looked at a number of different ways of doing the calculation so that we could potentially detect images which have been cropped or changed in other ways as well. Unfortunately, from our perspective, most of the algorithms that are available are either kept secret or they're patented, which means that implementing them in free and open source software is a no-go zone. It will get better; there is research underway to make this possible, and we're looking at changing the algorithm, updating it according to what we learn, but we're quite far from having something that could detect a derivative work as well.

Okay, thank you. Microphone number two.

Hello. Thank you for your work on the front-end side and the workflow side of things. I have a question: you talked about a distributed database, and you talked about a community-curated direction, but the focus seems to be on very specific projects and very specific things. So what could be scalable things to work on to get more sources involved, to curate those kinds of things? And would you be open to other contributors, like, I don't know, libraries, archives, European projects, whatever you can think of? What would be the long game in distributing and community-curating this?

So there are two communities, or two groups of repositories, that we're talking to, to get their information into Elog.io as well. One is Europeana, which obviously covers a lot of the galleries, libraries, archives and museums around Europe; that would be one. The other is Safe Creative, which is in fact part copyright registry, to get their information into Elog.io as well. But still, at that point we're only talking about specific collections, and we're talking about read-only information. So the logical next step is indeed to flip the read-write bit to make it editable. But we need to think it through, and we're honestly not quite sure what that would look like, because how do you deal with potential conflicts when people keep editing the same information? So we'll need to go again and see what Wikipedia is doing in this regard, see what policies they have in place and how that works, and see if we can replicate that on our side. And in terms of scaling beyond this, distributing this: we made very sure from the beginning that the identifier that we have for individual works within the Elog.io catalog is a URL, which means that anyone can essentially set up a catalog and have their own URL scheme for that catalog. As long as they don't change the API, if you have the URL, it doesn't matter in which catalog you actually look up the information; you'll get it anyway.

Microphone number four.

Yeah, there's one comment on the URLs: we probably want something which can survive like 100 to 200 years, and whether that's going to be solved by using URLs as we have them today might create some problems. I was also interested in whether all this can be applied not just to pictures, but to books, music, whatever. And then there was one technical comment: all hash functions should be one-pass, because if you go otherwise, you will go to DRAM twice. So that was the technical part of the comment.

Okay, let's go over those three things. On the hash bit: yes, we worked quite a lot with the specific algorithm during this, and now that we feel that we've settled on the way it works best in our environment, we've documented it as an RFC, which has been submitted to reserve the namespace for it. So it has a very specific definition, which makes sure that if you want to call something a block hash, it must follow a very specific
technical specification.

Okay, I'm sorry, could you go back to the first question? The URL? Yes, the URL, thank you. I skipped over a very important piece when I said that everything will be solved by URLs, because, as I said in the talk as well, we know that any kind of metadata gets stripped very easily from a work. And even if we say that all we need is an identifier, an embedded URL, that URL is going to be stripped as well, so that's not the final solution to anything. We need to work on different approaches to identifying works. The only thing I was saying there is that at least with a URL, we can make sure that this could potentially be distributed across different catalogs, not just be one single monolith.

Okay, and the second point was probably the survivability, if I remember correctly. Yes, other works, thank you: applying this to other, non-image works. One of the reasons why there are so many metadata standards is that there are so many different kinds of works. What is relevant for images in terms of metadata is not relevant for classical music, as an example. What is relevant for classical music, in terms of authors and who plays what instruments and what instruments are involved, is not relevant for pop music. So that's the reason, or at least one of the reasons, why all these standards have come up. We believe that using W3C Media Annotations allows us to cover different sorts of works, but it really needs to be thought through a bit more before we start working with it actively: what information is actually important to convey about different kinds of works, and what do the metadata standards look like for those kinds of works? So that's a larger piece of work.

So, there's one question from the internet. Yes, thank you. Why don't you interwork with data from other sources like Flickr, which also provides good metadata and contains a lot of free works and info about the authors?

We are, in fact. We don't have it in place yet, but we have a communication going with Flickr to figure out how we can get that information and how we can integrate it into our system. Depending on how things go, I'm quite confident that we'll be able to integrate it and make it available through the same API. Unfortunately for us, Flickr is a huge resource; it's more than 300 million images, which means that even if we took, you know, a year to do that, we're still talking about incorporating about a million works per day. At the moment we can do about 6 million works per day added to our database, which is a fairly large number, and we can probably scale a little bit beyond that, but still, we're talking about a number of months' work when we actually start working on that.

So, any more questions? There's an additional one from the internet. If you have questions, I'll be available up here for a little bit more after the talk. We also have some information about Commons Machinery and the work that we've done, which will be available down here from Lena, and I will have it up here as well, so feel free to grab it. Okay, thank you, Jonas. Thank you very much.