All right, everyone, this is Basile Simon for Hackers Trying to Stay Relevant. He works at BBC News Labs. Give it your full attention. Let's see if this works properly. So yeah, journalism is a waste of knowledge, which is in itself sort of ironic, since one of the most important functions of journalism is to create and enhance the collective knowledge we have and to inform people about, you know, what's going on and why it matters. Whether journalism has managed to do that, to actually inform people, or not is another question. But what I would argue is that the journalistic process in itself is spectacularly failing at making use of all the knowledge it creates. We write articles and we have to rewrite the whole thing every time something new happens. And we waste a lot of resources doing that. We publish hundreds of pieces online without any information about them, just hoping that people will read them, and we sort of call that online journalism. We make very little use of the technology we've got at our disposal, and I'd like to walk you through, in sort of twenty minutes, how and why we might want to change how we do the act of journalism. So yes, it's actually working. So my name is Basile. I work at BBC News Labs. I've got a journalistic background and sort of ended up doing computer things along the way, and that was sort of two years ago. In News Labs, our role is to spearhead and drive innovation in news, and that's for the BBC itself as well as for the wider industry. So we take the latest tech and we have a look at it through a lot of prototyping, following a sort of fail-fast methodology. And if it fails, it fails. We can learn from it. If it succeeds, or if there's some good stuff in there, we usually ping a note to our good friends in the production services to see if they want to have a look at transferring that. So in the beginning, there was an idea. People care about things. And you would say, well, that's relatively obvious.
Of course, people care about stuff. People like stuff. But I mean things slash concepts: people care about concepts, the information itself. Not the documents that contain the information; nobody cares about a web page. Not the servers that contain the documents; nobody knows where the server is. And certainly not the IP addresses that actually point to the servers containing the documents. People do care about the information, what they're reading, when they encounter it, and they're actually doing a good job of interacting with that information. They're much better than machines. For example, if you were to feed the entire Game of Thrones saga to a computer, it would very easily be able to tell you the full title of Daenerys Targaryen, which I can never remember. Or how many times somebody said "valar morghulis". But it would be completely unable to pick up on that obscure reference a friend just made. Our computer is not reminded of Great Britain when we talk about Westeros. To our computer, Aerys Targaryen, the Mad King, because he liked to set people on fire, doesn't look at all like the bloke on the right, Charles the Sixth. In fact, our computer doesn't even know who Charles the Sixth is. Whereas you probably guessed he must have been a man, and he must have been a king somewhere. And you'd be absolutely right. That ability to extract meaning from things we've just encountered for the first time is one of the main differences between us and machines. We are able to extract meaning from things we see and read, and machines are unable to do that. They can't take a guess. They have to rely on careful teaching and careful learning; note that it's called machine learning for a reason. They learn from us, and their reliance on probabilities sort of simulates that guessing. So my colleague Paul Rissen, who's an information architect at the BBC, said that we construct a web of meanings in our heads.
And he says a web because of its interlinked nature, comprised of references and properties through which we navigate when we think about stuff. And the size of that mental graph, and our ability to use it, can be called knowledge. And knowledge, as we know, is not a collection of isolated facts, not a random collection of stuff you just know. Knowledge is connections and associations. When you read that piece on BBC News, you were able to extract meaning, to deduce that the string "David Cameron" means the Prime Minister of the United Kingdom and not something else. And you can connect this information with others you've got in your head. You know that David Cameron is the Prime Minister, the leader of the Conservative Party, an organisation that happens to have had other leaders in the past, say, Margaret Thatcher. And there are other members of that organisation, the Conservative Party; that can be Theresa May from the Home Office, or other people. You associate stuff like that. And journalism is precisely about that knowledge. It is about these associations and links. And the whole point of writing an article about an event or a person is to convey knowledge, to provide context and present some insights you have gained from that knowledge. And we journalists in particular pride ourselves on the unbiased knowledge we possess about things, which is much more valuable than just opinions. Now I'd like to talk a little bit about linked data, because recently some clever people have been doing really good stuff starting with this idea of webs and connections between things. As early as 2012, it shaped how we delivered the London Olympics. We had one page per sport, one page per athlete, and this is Michael Thurbs' page.
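The "knowledge is connections" idea above can be sketched in code. Here is a minimal sketch of that mental graph as a set of subject-predicate-object triples, using the talk's own David Cameron example; the predicate names are invented for illustration, not taken from any BBC ontology.

```python
# A tiny concept graph: nodes are concepts, edges are typed relationships.
# The facts come from the talk's example; predicate names are made up.
knowledge = {
    ("David Cameron", "holds_office", "Prime Minister of the United Kingdom"),
    ("David Cameron", "leader_of", "Conservative Party"),
    ("Margaret Thatcher", "leader_of", "Conservative Party"),
    ("Theresa May", "member_of", "Conservative Party"),
    ("Theresa May", "heads", "Home Office"),
}

def related(concept):
    """Every concept directly linked to `concept`, in either direction."""
    return ({o for s, p, o in knowledge if s == concept}
            | {s for s, p, o in knowledge if o == concept})

print(sorted(related("Conservative Party")))
# → ['David Cameron', 'Margaret Thatcher', 'Theresa May']
```

The point of the triple representation is exactly the one the talk makes: once the facts are stored as connections rather than prose, any path through the graph (Cameron to the party, the party to Thatcher) can be traversed by a machine.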
We've got, as you can see on the left-hand side, basic biographical information, as well as a list and status of the competitions he entered, and won, most of them; a dynamically generated news timeline; and a medal box with the competitions he won. In 2015, another example, for the UK general election, that's the parliamentary election, we had one page for each of the 650 constituencies, automatically aggregating content about that constituency, and that was for our local audiences: they had a dedicated page to go to. All you had to do as a journalist was to tag your content properly, and we were taking care of the relationships between the concepts. Another example is the Juicer. It's a news aggregation and concept extraction API from us, BBC News Labs. Essentially it takes news from the BBC and roughly 600 other news sources around the world, automatically parses them, and, based on their content, tags them with related DBpedia entities. The entities are grouped in four categories: people, places, organisations, and things, which are the intangibles. Thanks to the Juicer, we can see how much, and in which terms, media sources from around the world talk about certain topics. We've looked at what concepts were associated with some of the best-known candidates standing in the European election in 2014. This is Nigel Farage, famous right-winger in Britain. This is another prototype based on the Juicer, called "Who's talking about what?", and it shows the concepts most often associated with Berlin according to different media sources. So, for example, the Guardian and France 24 mention Berlin alongside the concepts of refugees and Angela Merkel, while the Daily Mail, our beloved right-wing tabloid, associates Berlin with something else, for some reason.
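The "Who's talking about what?" prototype described above can be approximated in a few lines: given articles tagged with extracted concepts, count which concepts each source mentions alongside a topic. The article data and field names below are invented for illustration; they are in the style of the Juicer's output, not its actual schema.

```python
from collections import Counter

# Hypothetical tagged articles: each carries its source and the
# concepts extracted from it (in the spirit of the Juicer's DBpedia tags).
articles = [
    {"source": "The Guardian", "concepts": ["Berlin", "Refugees", "Angela Merkel"]},
    {"source": "France 24",    "concepts": ["Berlin", "Refugees"]},
    {"source": "Daily Mail",   "concepts": ["Berlin", "Angela Merkel"]},
    {"source": "The Guardian", "concepts": ["Paris", "Angela Merkel"]},
]

def co_mentions(topic):
    """For each source, count the concepts mentioned alongside `topic`."""
    counts = {}
    for a in articles:
        if topic in a["concepts"]:
            per_source = counts.setdefault(a["source"], Counter())
            per_source.update(c for c in a["concepts"] if c != topic)
    return counts

print(co_mentions("Berlin")["The Guardian"].most_common())
```

Note that the fourth article never mentions Berlin, so it contributes nothing: the co-occurrence view only surfaces concepts that share an article with the topic.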
Another example: we've got the quite amazing /programmes, which was built a while ago, and it's a massive repository that gives a unique URL and URI to every programme the BBC has ever broadcast, and there's a lot of them. You can see the URL of that one; all the genres, as well as the formats, also have their own URLs. And from there you can navigate to others; you can have these sort of onward journeys, made possible by the fact that all these concepts and works are linked together. Now this is all very jolly, but I'd like to get back to my original point, which was the waste of knowledge. The basic unit of journalism remains the article, and I'm going to leave aside print journalism and broadcast journalism because there's not much we can do about them here. Even in 2016, online, with all the live blogs, Snapchat, explainers, interactives, we still write articles. And by articles I mean the massive walls of text that range from 300 to a couple of thousand words, which we just put online with some basic metadata: time and date, sometimes the name of the author, and, if you're lucky, a category name as vague as World, Health or Entertainment. We put them online and we hope that somehow people will find them and read them. And we trust that the article is the best way to contain and deliver the knowledge we produce. But is it? I'm not sure, really, because there's a big issue with articles: the cost of producing them. There's a lot of research, framing, writing, editing, rewriting, and then publishing. And that takes a long time, and sometimes some degree of expertise about the topic. The problem is that stories develop, they evolve. They are in fact composed of series of events, and oftentimes news organisations follow up on a story. These developments are followed by new articles, and you see that the costs multiply and add up as the story develops.
Because each of these articles is going to follow the same inverted pyramid structure: the most recent and newsworthy bit at the top; details, context, and background information further down. And this background context is in fact repeated, if only slightly differently, from article to article. And you probably see what I'm getting at here. DRY, Don't Repeat Yourself, the fundamental programming principle brought to us by Hunt and Thomas's bible The Pragmatic Programmer, an essential read. And when I see how much we repeat ourselves in news, just by the very act of writing articles, I cringe a little bit. Because the costs multiply and add up every time you repeat yourself. And this is not even beneficial to our cherished audiences. Because the accumulation of articles and pieces about a topic or a story only offers a new reader the contemplation of a completely unstructured chaos. A chaos in reverse chronological order, where live blogs, columns, videos, social media all mix together. "So where do I start?" asks the reader. "How can I inform myself and make sense of that story I just heard about?" And if the reader has to ask that, well, we've failed at doing our job; we've failed at informing the public. And this is all because every time we publish something, we let all the knowledge we've invested in that piece's production go to waste. So what if we started saving a bit of that knowledge? We've seen that linked data has proven to be quite an amazing tool, and we've seen that very closely. Now the point I want to make is that we could, and probably should, go a bit further. We could produce knowledge and facts only once and make them reusable the next time. We could treat events, for example, as the basic units of reporting. Because journalism is, after all, reporting about events. Events implicate exactly what we need: they're about people, places, organisations.
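The DRY point above, applied to news, can be sketched as: store the shared background of a story once, and have each new development reference it instead of rewriting it. This is a toy illustration, not any BBC system; all names and strings are invented.

```python
# DRY for news: background context is written once per story and reused.
# Each new article carries only the fresh top of the inverted pyramid.
background = {
    "election-2015": "The UK goes to the polls to elect MPs "
                     "for all 650 constituencies.",
}

def article(update, story_id):
    """Assemble a piece from a fresh update plus the shared background."""
    return update + "\n\n" + background[story_id]

first = article("Polls open across the country.", "election-2015")
second = article("First results are declared overnight.", "election-2015")
```

If the background changes, it changes in one place, and every assembled piece picks it up; that is exactly the cost saving the talk is after.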
And if you think about it, it becomes clear that a piece of journalism, a story, contains a mini-graph in itself: a piece is about something, it mentions people and companies and places, and see, we've got a web. We've got a little graph here. The events we describe can be big events or small events. Some ontologies are going to nest events under each other, as if some events could contain, or rather be composed of, smaller units. Take an election, for example, to visualise it. An election is an event in itself that's going to be treated as one massive big story. But the election is composed of many small events that make up the campaign: a candidate saying something on the radio, press conferences, the results coming in. A candidate touring a factory somewhere during their campaign is an event that humans would see as clearly connected to the election. But that event is also a story for the factory, and for the local constituency, or for the mayor who's there. So we've been playing at the BBC with the concept of storylines for a little bit. Storyline is supported by an ontology, like this one. This one is Creative Commons licensed and was done by the BBC with the Press Association and the Guardian. And the ontology is no more and no less than a way to represent an event-based narrative. Events and information are inputted by the journalist in a structured way, and can be ordered into a narrative by other journalists, or by the same people. Some events can be part of different stories; that's built in: events can be part of several narratives. They can also have several interpretations. For example, a candidate's visit to a factory is, according to this ontology, an event; at the top, the election itself and how successful the candidate is being would be called the story. The story is composed of these little events.
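The storyline idea above, events inputted once and ordered into possibly several narratives, can be sketched in a few lines. This is only an illustration of the event-based model; the field names below are invented and are not the terms of the actual Storyline ontology.

```python
# A sketch of the storyline model: one pool of events, several narratives.
# The same event can appear in different stories with different slants.
events = {
    "ev1": "Candidate speaks on the radio",
    "ev2": "Candidate visits a factory",
    "ev3": "Results come in",
}

# Two interpretations of the same election: a national story and the
# local story of the constituency where the factory visit happened.
narratives = {
    "national": ["ev1", "ev3"],
    "local":    ["ev2", "ev3"],
}

def narrative(view):
    """Render one narrative as an ordered list of event labels."""
    return [events[eid] for eid in narratives[view]]

print(narrative("local"))
# → ['Candidate visits a factory', 'Results come in']
```

Note that "ev3", the results coming in, sits in both narratives but was entered only once, which is the talk's don't-repeat-yourself point applied to reporting.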
The reporter's task is then to input some facts and events into the database, as well as to connect them and explain to the machine the relationships these events have with each other. This all looks tidy, provided we can input things in a structured enough way. But there's more, because once this information and content is tidily structured, then with just a bit of magic you can put that behind an API, and you've got good stuff coming, because then it's just about the presentation. The content is very accessible, with the right endpoints and the right connections, and can be delivered, for example, in many languages: you just need to plug some translation pipeline like Google Translate or IBM Watson in the middle. Well, "just". You can make your content snack-sized for people on their smartphones, or fully-fledged features for people on desktop. And you can make it queryable by a fancy new Facebook bot. On that, I want to show you a great example from the people at Structured Stories. This is a storyline about abuse in New York prisons, and I personally don't know anything about that story, but it is all presented simply as a bullet list of events, or articles if I want to read a bit more, with more or less detail. Or it can be shown as a timeline if I have a visual mind, or as a structured story that surfaces the concepts and relationships we've got in that data set, which allows us to create that view of cause and effect between these events. And that's really simple in a way: this is a CMS on top of a triple store. BBC R&D have been doing something relatively similar that's called the Elastic News project, and it's a way to deliver the same piece of content with variable depth, to strengthen our audience's understanding of the question. Essentially what it means is that the user will be able to dive into the piece by clicking on some stuff and get more information out of the same piece of content. Variable depth, variable length.
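The variable-depth delivery described above can be sketched very simply: once the content is structured, the same event can be rendered snack-sized or in full depending on the client. This is a toy sketch of the idea, not the Elastic News implementation; all field names are invented.

```python
# One structured event, several renderings: the presentation layer
# chooses how deep to go, the content itself is stored only once.
event = {
    "headline": "Candidate visits factory",
    "summary": "A campaign stop in the constituency.",
    "body": "Full background, quotes and analysis of the visit.",
    "concepts": ["Election", "Factory", "Candidate"],
}

DEPTHS = [
    ["headline"],                                  # 1: smartphone snack
    ["headline", "summary"],                       # 2: a bit more
    ["headline", "summary", "body", "concepts"],   # 3: full feature
]

def render(event, depth):
    """Return only the fields appropriate for the requested depth (1-3)."""
    return {k: event[k] for k in DEPTHS[depth - 1]}

print(render(event, 1))
# → {'headline': 'Candidate visits factory'}
```

A real API would hang this behind an endpoint and let the client pass the depth, but the principle is the same: content and presentation are decoupled.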
Similarly, we at BBC News Labs have been looking at enriching content with some inline, immediate explanation, thanks to some information we already have structured: those people, places and organisations I mentioned earlier. So you can see, this is "what will happen next": this is a storyline with some timeline information. This is biographical information we've got for Patrick McLoughlin, and this is in DBpedia; it's just structured already. Now there are lots of things that can be done with content that's made available through an API. R&D have a large work stream called object-based broadcasting, and the idea is that the content is a set of individual assets whose relationships are described by metadata, and that's not too dissimilar to what I showed you about Structured Stories in that little video. And content built this way can be readjusted, re-versioned, made longer or shorter, and even made explorable by the audiences. So we've separated the content from its delivery and consumption, and we've structured this ideal content around the idea of events. So the possibilities are almost infinite, and certainly appealing. But if you think about it, it also makes sense; it's not just a tech fantasy. It makes sense because news never stops. It happens all the time. It's a continuous flow and an accumulation of events and facts, and, as I said, stories themselves develop and evolve, and we have to follow this. The only structure that makes sense is the narrative structure: a spoken or written account of connected events, a story. And indeed that's Wikipedia's definition, and it says "connected events" as well. We can then invest our efforts as reporters and journalists into the creation and curation of such narratives, because we don't have to repeat ourselves in the writing of articles. These narratives will make sense to somebody who stumbles upon them, because we think and understand the world in narrative form. "But what led to this event?" asks the reader.
And a click later, with such content, the narrative is expanded to include another causal and helpful event. That makes sense now. I get it. I get why things happened. Now these narratives are not frozen. They're constantly evolving as stories develop. They can even represent the different views of the world we have. For example, Chancellor Merkel thought it was only human and decent to open Germany's doors to Iraqi and Syrian refugees earlier this year. Across the Channel, the Daily Mail, again, thinks it's madness. Same facts, different narratives, and different stories. Now, behind the scenes, I would like to quote Jacqui Maher and Paul Rissen. They've written a manifesto of structured journalism: "Such a database of knowledge, which already exists in the collective knowledge of our staff, could be used to provide context at scale across all of our output." And, important bit: "Structured journalism is a way of preserving a reporter's expertise so that it isn't lost once a piece is published, but can be surfaced in related coverage." Now, one more thing: pushing linked data was a successful and bold move in 2010. Now almost all news websites use some sort of linking and linked data in their stuff, if just to power their own categories or recommendation engines. And ontologies are also widely used to put some structure into the blobs of content we have, which are basically a headline, a summary and then a full piece. And that's on us: we have in this room the principles of good data. Let's stop publishing walls of text for a while and just have a conversation around the structure we want our content to take, so it actually fits a digital world and the internet. So thanks so much for sitting through this session. I hope it makes sense, and I hope you see some potential in what is realistically the foreseeable future of the news industry. I think we can take some... I don't know how much time we have. Go for it. Go for it, Jo.
Are you interested in working on your linked data thing in the context of a much wider linked data community? I have an example. At Crossref we assign DOIs to articles. And the URL for an article can change in different places, so a DOI is a kind of official identifier for something, which redirects. And while not all publishers are perfect, they're able to deposit canonical links, or whatever identifier, saying this is the URL we should use. We don't get articles cited a huge amount on BBC News, but it would be good if BBC News linked to the DOI, so that the link will always work. Is that something you're interested in? Well, I can definitely put you in touch with a lot of people. I think it's important to think about how we do URLs. I thought we had a fairly stable solution, so links don't expire and disappear after a while. But we can definitely look at that, in the context of doing things with a wider linked data community. So, I described the Juicer, and the Juicer is based on DBpedia, but we're doing some other stuff behind it. This is fascinating stuff. It's absolutely awesome. What kind of tools do your journalists or researchers, the people on the user side of this data you're storing, have? For example, there has to be some process of evaluating the data. Somebody has to say how reliable it is, and whether we should revisit it, and all sorts of qualifications. And I want to pick up these 5, these 10, these 20 events in my story and bring them in and link them in. What kind of tools do you offer these people? So, we have a whole team in charge of maintaining the linked data platform and its structure, and that's journalists as well as editors and information architects. That's the stuff I described from 2010-2012 up to 2015. We publish new stories every day, so we make sure journalists tag things correctly. It can be quite a challenge, and that's why it's still a very human-centred process, let's say.
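The Crossref point in the exchange above comes down to this: a DOI resolves through the doi.org proxy, so a citation link built on it keeps working even when the article's own URL changes. A minimal sketch, using an illustrative DOI rather than a real article's:

```python
# A DOI-based link points at the doi.org resolver, which redirects to
# the publisher's current URL for the article. The DOI here is a
# placeholder for illustration, not a real registered one.
def doi_link(doi):
    """Build the canonical, always-resolving link for a DOI."""
    return "https://doi.org/" + doi

print(doi_link("10.1000/xyz123"))
# → https://doi.org/10.1000/xyz123
```

This is why the questioner suggests BBC News cite DOIs rather than publisher URLs: the indirection absorbs link rot.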
For the idea and concept of structured journalism, we're not at the point yet where we actually have tools to use; I think the most advanced product is the one from Structured Stories that I showed. I think this is a big challenge in terms of UX and design, because it will need to make sense to people who produce 400-word articles for a living. So it will need to make sense to them. We'll need to make sure we can actually validate and evaluate that data set, as you rightly say. And then, how do we exploit it? How do you dive through that trove of data? That will be quite a challenge. Thankfully, we'll have relationships between properties and concepts, so that should hopefully help. And ontologies will sort of suggest a bunch of concepts you might want to have there. Do you want to go for something? So, the first thing I was wondering, because I am contributing to the Wikidata thing: do you use Wikidata at all? Or is it all on DBpedia? So, I will need to get back to you on this one. I think the Juicer is only on DBpedia now, but I can't remember why the choice was made not to use Wikidata. Is there a way of seeing at least parts of the source code you were doing for the article generation at the BBC? So, for the article generation, which bit do you... At the moment we're focusing our efforts on the CMS and the sort of backend itself, so that's creating an ontology that can be handled by a CMS with good principles of UX, as we were talking about just before. I don't think we'll see automatically written stories on the BBC any time soon. The assembly of these storylines, and the pages that contain these things, can be a bit different, and that will be done at the source, I reckon.