And for our first talk we have a very experienced speaker, Ben Nuttall, with us. So welcome, Ben. Hello. Sorry, I've only just got audio back, I didn't hear that. Oh, that's all right. So you're calling in from the UK, right, if I remember correctly? Yeah, based in Cambridgeshire. It's great that you're here to tell us about the news. I know you work for BBC News Labs, and I've always wanted to know how it operates and how people get to know about the news. So when you're ready, take us away.

Okay, great, thank you. Can you see my slides okay now? Yeah. Okay, so thank you for that, and welcome everyone. My name is Ben, I'm a software engineer at BBC News Labs. I used to work for the Raspberry Pi Foundation, so you may have seen me speak about Raspberry Pi before. I'm based in Cambridgeshire in the UK, and you can find me on the web, on Twitter and on GitHub.

A bit about BBC News Labs: we're an innovation team within BBC News and BBC R&D. We build prototypes of new audience experiences, we come up with solutions to help journalists, and we do research; we try out ideas and run projects on that kind of basis. We write up our projects at bbcnewslabs.co.uk, and we're also on Twitter, so you can follow us there.

I'm going to be talking about a project we did with the Radio 4 Today program, one of the BBC's flagship news programs. Cast your mind back to 1957, the year the Today program launched. Back then, I imagine, it would have been recorded on a microphone that looked something like this, and listened to on radio sets that looked something like this. And it was just broadcast on the radio; there were no options beyond listening live.
If we come forward to 2021, this is what the world looks like: people can listen to the program in a whole manner of different ways. They can consume the content we provide on a smartphone, on a smart TV or on smart speakers, and obviously on digital radio as well. But the main difference is that they now have the option between listening live and listening on catch-up. There are various catch-up services: people can listen to the whole program after it's gone out, or listen to clips of it, or little bits, whatever they want. And a separate thing again is the Twitter conversations that happen around stories on programs like Today. If some MP or minister comes on the show as a guest and says something outrageous and everybody's talking about it on Twitter, then even people who aren't listening to the program are seeing all the fallout from what's going on; programs like this can get very heated. So there are lots of different ways people can get involved in the programs we put out, very different to how it was 50 or more years ago.

Let's take a look at the production workflow for how the producing teams put these programs together. They use a tool called OpenMedia, which looks like this, to plan their running orders. A running order contains the list of stories that are going to be in the program, with the order of the stories, all the timing information, the script, the guest names, the media objects used within the program and that kind of thing. They create a draft in this tool and then modify it as plans change, as they confirm guests, as they bring stories further up the running order.
They might chop and change things, and even when they're on air they're still making adjustments and moving things around: if one guest takes longer to answer a question than expected, everything moves around accordingly. This is a three-hour program, so they plan ahead, but things keep moving right up to and during broadcast. It goes out on air from 7 a.m. to 10 a.m. every weekday. They're in the studio producing it, it's going out live, and at 10 o'clock it's all over and they can finally put the running order away, because it's finished.

Shortly after 10 o'clock, the program becomes available on catch-up: on the web, in the BBC Sounds app and in various other places, for people who want to listen after the fact. And here's the problem: this is a three-hour program, and the way we've provided it on catch-up in recent years is just a three-hour dump of the audio. Here is what was on the radio earlier, three hours of it. There's no information about what's in the program: nothing about the presenters, the guests that were on, or the stories that were covered. It has the same generic description every day. And if you hear about an interesting debate that was on the program, good luck finding it. If you've seen on Twitter that there was this really good bit where a certain guest said something, you're unlikely to find it, because you've got to scrub along a bar three hours long and try to find the thing you heard about.

In News Labs we tend to ask a lot of "what if" questions.
So we were thinking: what if we could actually take all that running order data we started with, because we had all that data to begin with? What if we could use it to enrich the digital offering we put out, by harvesting the data from the running order? And what else could we do once we'd reapplied that data to the content?

There's a concept, particularly within BBC R&D, called object-based media. The idea is that you take programs that were made for linear broadcasting, chop them up and provide them to people in different ways, allowing people to consume small pieces of programs and have customized, personalized views of all these different bits of content. So there's lots to play with once you've surfaced the data, chopped the program up and made it available.

And as I said, there are lots of personalization options. A really bog-standard, simple example: rather than recommending whole programs to people (have you thought about watching The Andrew Marr Show, or listening to the Today program, or watching Newsnight, or listening to The World at One?), you might give them more personalized choices in what you're presenting. Instead of The Andrew Marr Show, you might say: here is a clip of your MP on The Andrew Marr Show. Here is a section from an interview about Joe Biden that you might be interested in. Here is a clip from Nicholas Watt, who wrote some articles you've read. Here are our insights on the Dominic Cummings story that was covered on The World at One. Instead of recommending programs, you choose more personalized sections to recommend, which might be more fruitful.
So how do we extract that running order data? Unfortunately there's no export function in this program, so it isn't a trivial problem to start with. The way it works is through something called the MOS protocol, the Media Object Server protocol, which is an industry standard. It's about sending XML messages back and forth between what are called newsroom computer systems and the media servers, cameras, automation, autocue and all the other things in the studio that rely on quick communication of changes within the running order. So, since it's based on this industry-standard protocol, we started to look at that.

For a given program there might be several hundred of these MOS messages. Each is an XML file that describes a change. When the team creates a new running order, a roCreate message is sent. Then for every change they make to the running order (if they swap stories around, add a new story, add a media item, or change one of the details of something), every little change is emitted as a separate XML message according to the standard. It might be that one of the cameras needs to be aware of a particular change, so it receives the message and deals with it accordingly, that kind of thing. So we thought we might be able to piece the running order together from all of these messages.

If we take a look inside a couple of these MOS files, this is the top of the first file you get, the roCreate. You can see it has a roCreate tag, a message ID, a roID (the running order ID) and a roSlug, plus details about when the program is due to start and that kind of thing. It's got all this data in XML format.
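As a rough illustration of what these messages look like to a program, here is a heavily simplified roCreate being read with Python's standard library. The values are invented and only a handful of the tags from the MOS specification are shown:

```python
import xml.etree.ElementTree as ET

# A heavily simplified roCreate message (invented values; real MOS
# messages carry many more fields, per the MOS specification).
RO_CREATE = """\
<mos>
  <roCreate>
    <roID>RO12345</roID>
    <roSlug>RADIO 4 TODAY; Monday</roSlug>
    <roEdStart>2021-03-01T07:00:00</roEdStart>
  </roCreate>
</mos>
"""

root = ET.fromstring(RO_CREATE)
ro = root.find('roCreate')
print(ro.find('roID').text)    # RO12345
print(ro.find('roSlug').text)  # RADIO 4 TODAY; Monday
```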
And then the next message sent might be this one, the roStorySend. This is where they add the body to a story: they've written the script, and they're saying, here is the script for the story we started with. Every single message has a particular structure like this, and you've got to be able to understand how that structure works. So we investigated parsing and merging these XML messages based on the details in the MOS specification, and it really seemed to work.

We wanted to put this together into a solution that would take a whole batch of hundreds of messages from a program and condense it down into one big, complete, merged running order. We were able to do this, and we developed a library we called mosromgr, the MOS Running Order Manager. It's an open-source Python library, Apache 2.0 licensed, available on the BBC GitHub, with docs on Read the Docs.

So what can it do? The library merges MOS files into a machine-readable running order: if you've got a collection of files, it will merge them all together and produce one that you can use. It gives easy access to the contents of each MOS file, or of the complete running order. It's provided as a library, so you can import it and do custom things with it, and there's also a command-line interface for the basic, general tasks you might want to do. And because MOS is an industry-wide standard, it's useful that it's open source, because other broadcasters might be able to make use of it as well.

The way it works is that we've got MOS type classes: every MOS file type is provided as a class.
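To make the merging idea concrete, here is a toy sketch (standard library only, with invented tag values, and nothing like the library's real implementation) of folding a roCreate and a roStorySend into a single structure:

```python
import xml.etree.ElementTree as ET

# Two simplified MOS messages: a roCreate that starts the running
# order, and a roStorySend that fills in a story's script. Tag names
# loosely follow the MOS spec; values are invented for illustration.
RO_CREATE = """<mos><roCreate><roID>RO1</roID>
  <story><storyID>S1</storyID><storySlug>WEATHER</storySlug></story>
</roCreate></mos>"""

RO_STORY_SEND = """<mos><roStorySend><roID>RO1</roID>
  <storyID>S1</storyID><storyBody>Rain expected later today.</storyBody>
</roStorySend></mos>"""

def merge(messages):
    """Fold a batch of MOS messages into one running order dict."""
    ro = {}
    for msg in messages:
        body = ET.fromstring(msg)[0]  # first child of <mos> names the type
        if body.tag == 'roCreate':
            ro = {s.find('storyID').text: {'slug': s.find('storySlug').text}
                  for s in body.iter('story')}
        elif body.tag == 'roStorySend':
            ro[body.find('storyID').text]['body'] = body.find('storyBody').text
    return ro

ro = merge([RO_CREATE, RO_STORY_SEND])
print(ro['S1'])  # {'slug': 'WEATHER', 'body': 'Rain expected later today.'}
```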
You can see there's a hierarchy: the MosFile base class, and then subclasses like StorySend and StoryMove, which provide the specific implementation details of how each type works and how they interact with each other. So you can create a RunningOrder object from a roCreate file, and then you've got a running order object you can play with; similarly, you can create a StorySend object from a file like this, and you can manipulate that object and access its contents. Alternatively, instead of using the specific subclasses, you can construct objects from the base class, which will detect and classify the type for you, so you can much more easily automate these processes. And as an alternative to creating from a file, you can also create an object from a string: if you've already got an XML string from somewhere, you can create an object from that. What we use quite a lot at the BBC is the from_s3 classmethod, which lets you construct one of these objects from a file in an S3 bucket.

The way that works is that the base class has an __init__, but you don't generally use it directly; you use the from_* classmethods. They take different parameters according to what they are (the filename, the XML string or whatever), they read the XML however they need to (from the file, from the string, from the S3 bucket), run it through the classifier, and return an instance of the appropriate subclass, which is quite neat.

Then there's property access. Once you've got an object representing a MOS file, you can easily read the data within it without needing to know exactly where it sits in the XML. For instance, ro_slug is a property here which retrieves the string from the roSlug tag within the roCreate tag. In other cases a property might return something that isn't a string.
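The detect-and-classify pattern described here can be sketched with a classmethod on the base class. The class and tag names below are illustrative, not mosromgr's actual internals:

```python
import xml.etree.ElementTree as ET

class MosFile:
    """Base class: a hypothetical sketch of the detect-and-classify pattern."""
    def __init__(self, xml):
        self.xml = xml

    @classmethod
    def from_string(cls, xml_string):
        xml = ET.fromstring(xml_string)
        tag = xml[0].tag  # e.g. 'roCreate' or 'roStorySend'
        # find the subclass whose name matches the message type
        for subclass in cls.__subclasses__():
            if subclass.__name__.lower() == tag.lower():
                return subclass(xml)
        raise ValueError(f'Unknown MOS message type: {tag}')

class RoCreate(MosFile):
    pass

class RoStorySend(MosFile):
    pass

mos_file = MosFile.from_string('<mos><roStorySend/></mos>')
print(type(mos_file).__name__)  # RoStorySend
```

Constructing via the base class means calling code doesn't need to know in advance which message type a file contains.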
So the message_id is an integer, and the start time is parsed from the XML document as a datetime object, naturally. Having things like that can be really useful. The duration is a float, but it isn't just read from the file: it's the sum of all the durations within the running order, so it goes another level deeper. And ro.stories here is actually a list of Story objects, which is another abstraction I'll come to next.

There's a concept called escape hatches and ejector seats, written about by Anvil on their blog, which is a really good read. Basically, when you provide an abstraction over something, you have a choice: you can give people an escape hatch so they can still access the advanced stuff ("if you need anything else, it's in here"), because otherwise they'd have to ditch the library and go back to parsing the XML themselves, in this example. So what we do is provide ro.stories and things like that, but if you need anything else, you can just poke inside ro.xml, which is the ElementTree we're abstracting anyway.

Then, within a running order, you can access the stories as a list and pull out the first one. And again we've got further abstraction: within a Story you've got its slug, its duration, its script and lots of other things. It's defined as a property on the running order which goes and finds all of the story tags within the XML and constructs a list of Story objects, providing that further abstraction underneath. So whenever you access ro.stories, it reads the XML but presents it in this nice, accessible way. And again, with the escape hatch, you can access the XML within a story even though we've wrapped it up in the Story class.
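Property-based access plus the escape hatch might be sketched like this; the property names and XML layout are a simplified illustration, not the library's real schema:

```python
import xml.etree.ElementTree as ET

class RunningOrder:
    """Sketch of property-based access with an escape hatch (.xml)."""
    def __init__(self, xml_string):
        self.xml = ET.fromstring(xml_string)  # escape hatch: raw ElementTree

    @property
    def ro_slug(self):
        # callers don't need to know where the slug lives in the XML
        return self.xml.find('roCreate/roSlug').text

    @property
    def duration(self):
        # a derived value: the sum of the story durations, not one field
        return sum(float(d.text) for d in self.xml.iter('storyDuration'))

RO = """<mos><roCreate><roSlug>R4 TODAY</roSlug>
  <story><storyDuration>120.0</storyDuration></story>
  <story><storyDuration>240.0</storyDuration></story>
</roCreate></mos>"""

ro = RunningOrder(RO)
print(ro.ro_slug)   # R4 TODAY
print(ro.duration)  # 360.0
# escape hatch: drop down to the raw XML for anything not wrapped
print(len(ro.xml.findall('.//story')))  # 2
```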
And so one of the main things I was talking about was merging messages together: as well as poking inside files and looking at their contents, you want to be able to merge them. If we take, for instance, a running order file and a StoryInsert file (a StoryInsert inserts a new story into the running order), we start with a running order that has 10 stories in it. Using the plus operator here (+= to add the message to the existing running order) applies the merge: it takes the StoryInsert, merges it into the running order, and now we've got a running order that is 11 stories long rather than 10.

The way that works is that we have an __add__ magic method on the RunningOrder class, and then a merge method on each of the other subclasses. Each subclass's merge method determines how that particular type applies its merge, and it gets called by running the addition on the running order.

There's comprehensive documentation for this available on Read the Docs. We also follow the Diátaxis framework, created by Daniele Procida. This is a concept where you separate your docs pages into tutorials, how-to guides, explanation and reference, and it's a really good way of organizing and structuring your documentation, so it's well worth a look.
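The __add__-dispatches-to-merge arrangement described above can be sketched minimally like this (a toy model where stories are just strings; real merges honour story position and much more):

```python
class RunningOrder:
    """Sketch: ro += msg delegates to the message's own merge() method."""
    def __init__(self, stories):
        self.stories = list(stories)

    def __add__(self, mos_file):
        # each message subclass knows how to apply itself to the order
        return mos_file.merge(self)

class StoryInsert:
    def __init__(self, story):
        self.story = story

    def merge(self, ro):
        ro.stories.append(self.story)  # simplified: real inserts use a position
        return ro

ro = RunningOrder(['story-%d' % n for n in range(10)])
ro += StoryInsert('breaking-news')
print(len(ro.stories))  # 11
```

Putting the merge logic on each message class keeps the RunningOrder class small: adding support for a new message type means writing one new merge method, not touching the running order.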
So how do we utilize this within News Labs? We have a suite of AWS tools we've written, which comprises several Lambdas, DynamoDB tables and so on, and we use the mosromgr library within those Lambdas to process the MOS files for the programs we have access to, once they arrive in S3. Something sends the files into S3, that triggers jobs to be run, and we store things like the list of stories we've extracted, the episodes we've processed, and the different things we've pulled out. All that information is available both in the completed running order, which we've saved and passed on to a document store, and in our DynamoDB tables. And once processing is complete and we've sent the file over to the document store, other people can subscribe to that and trigger their own events: once we've finished processing something, they can run their own jobs based on it, so anyone within the BBC might be able to make use of this as well.

If we take a look inside a Lambda: we've got an internal library called mospipe, which is specific to the way we do this within the AWS suite. It gives us shared utilities, for things like talking to the DynamoDB tables, across all of the Lambdas; they all carry a copy of this internal library. The processing Lambda, for instance, looks something like this: we import the database client and the processing class from mospipe, and we have two entry points which do roughly the same thing. The lambda_handler is what's executed when it's running in the Lambda context, and there's the "if __name__ == '__main__'" block. Both of these create a processing object from some parameters, and then you call that object to execute the processing task. Those parameters have either come
from the event context in which the Lambda was invoked, or from command-line arguments, and this allows us to run the Lambda equivalently both locally and within AWS.

Then, if we take a look at the mospipe library, this is how you make an object callable, like I showed there: we've got the processor with brackets after it, and calling that object is made possible by the __call__ magic method. Essentially, in this example it provides a sequence of method calls which you could otherwise run separately; it's just a way of providing them all in one method and making the object callable, which is quite nice.

Part of the suite of tools populates a status dashboard, so we can monitor programs being processed, catch any processing errors, and check when programs are stuck in pending, if they haven't had the roDelete file saying they've finished, that kind of thing. This was crucial for ironing out the edge cases in the merge implementations, and it gives us a high-level view of what's going on. We also have a programs directory, a web app that lets us browse all the programs we've processed, view all the episodes we have for a program, and view the stories within the running order, the timing information, the script and all that kind of thing. You can see here we've got an episode of Newsnight, with a list of the stories and the point in the program at which each story happens, and you can click through to view the script for the whole thing as well.

So last month we ran a trial with Radio 4 Today where we added chapter data to the BBC Sounds web player. You can see along here, just for the final hour, we've decorated the timeline with chapter points for each of the main stories. This allowed listeners using the catch-up service on the web to see what
was in the program and select the sections they wanted to listen to. Below the player there was a list of all the chapters, and you could choose a story to start from, skip over one, or whatever you wanted to do; you had full access and control to jump around wherever you wanted.

The final step in getting it to that point was a tool called Slicer. Once the files have been through the mosromgr processing, this tool allows somebody from the Radio 4 Today program team to correct the story offsets and move everything slightly if needed, add custom synopses to the stories, and replace things like the technical slugs used in the running order with more human-friendly titles. From there it went for publication.

So the whole pipeline was: preprocess the files we get from the newsroom computer system; pass them into the AWS pipeline with mosromgr; send the final document to the document store; that triggers a machine alignment tool called Hopfuzz, which takes the transcript and makes sure it's aligned properly based on the order of the stories; then it's passed into the Slicer UI tool I showed, for manual confirmation and editing. Then it goes for editorial approval, because the program team have to be responsible for what goes out, the wording and all the details around it. Once they're happy with it, they publish it, and we'd get it out as soon as possible that morning. Anyone listening to the Today program over the last month or so would have had access to this, with all the chapter points available.

So what's next for this?
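Before the wrap-up, the callable-object and dual entry point patterns from the mospipe Lambdas described earlier can be sketched roughly like this; every name here is illustrative, not the internal mospipe API:

```python
import argparse

class Processor:
    """Callable processing object: a sketch of the pattern, not mospipe."""
    def __init__(self, bucket, key):
        self.bucket, self.key = bucket, key
        self.steps = []

    def __call__(self):
        # calling the object runs the whole sequence of method calls
        self.fetch()
        self.process()
        self.store()
        return self.steps

    def fetch(self):
        self.steps.append(f'fetched s3://{self.bucket}/{self.key}')

    def process(self):
        self.steps.append('merged running order')

    def store(self):
        self.steps.append('saved to document store')

def lambda_handler(event, context):
    # entry point used when running inside AWS Lambda
    return Processor(event['bucket'], event['key'])()

def main(argv=None):
    # equivalent entry point for running the same task locally
    parser = argparse.ArgumentParser()
    parser.add_argument('bucket')
    parser.add_argument('key')
    args = parser.parse_args(argv)
    return Processor(args.bucket, args.key)()

result = lambda_handler({'bucket': 'my-bucket', 'key': 'roCreate.mos.xml'}, None)
print(result[-1])  # saved to document store
```

Both entry points build the same object and simply call it, which is what makes local runs and Lambda runs equivalent.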
One of the things we want to do with the library in particular is gather feedback from other broadcasters, to ensure it's compatible with other systems and not just "this is how the BBC does it"; we want to make sure it works across the board, and handle any other edge cases that crop up. We want to roll out this kind of automated chapterization more widely; including TV programs like Newsnight would be good as well. We want to provide accessible program data from these running orders as a service within the BBC, so other people can make use of it, because a lot of people we speak to say they want to do these things but don't have access to the data and it's too hard to get. This whole process of having the library, extracting everything and processing it all provides that data and makes it accessible. And I'm planning to scope out more object-based media projects, demonstrating opportunities in this area like further personalization and that kind of thing.

That's all from me. I think there's just a little bit of time for questions.

Yeah, so thank you so much, that was a really good talk. I don't see any questions in the Matrix chat, but I do have questions I want to ask, because this is very interesting. I want to know what makes Python the tool you chose to analyze this data; what makes Python the best language for that?

That's a good question. My team's skills are mostly Python, plus JavaScript and TypeScript and things like that as well. There are bits of the project all along the chain that use different languages: some of the preprocessing is older stuff that's been around for a while, so there are bits of PHP floating around; there are bits of Go that deal with some of the MOS messages as they come in; and the Slicer UI is TypeScript. But when I started on the MOS processing, we were just looking at doing some file I/O and parsing XML and stuff
like that, and with the tools and libraries available, Python is very well suited to that type of general-purpose programming. It's probably one of the less exciting bits of the whole project, but it was definitely the right tool for the job. And I was able to do all the stuff with the class hierarchies: essentially the main thing about this library is the nice abstraction and the accessible properties and things like that, so I think that's the key thing Python brings to this. It makes it a really nice library to use.

That's great, and it's good to hear that Python was so useful in this project. So thank you so much, Ben. If people are interested and too shy to ask now, I think they can find you in Matrix anyway, so don't be shy about asking questions there. Thank you so much, Ben. Thank you very much.