We introduced Sanjeev Kumar Biswas. So Sanjeev, are you here, all mic'ed up? I'll give a brief introduction: Sanjeev has over 19 years of experience in the software industry. He was at Adobe Systems for 13 years, where he worked as a lead architect on many products and initiatives. He is an Adobe Distinguished Inventor and an Adobe Founder Award winner, with 23 patents granted in the US and more than 30 filed with the US Patent Office. Sanjeev, who is representing Singapore Press Holdings, asks how AI can help automate the manual work in journalism and publishing, and how AI can help automatically translate articles from one language to another. He'll cover these initiatives from his portfolio. Thank you, Sanjeev.

Thank you. Thanks for inviting me; this is really exciting. I'm here to show and share what we are doing at Singapore Press Holdings. But before that, by a show of hands: how many of you actually read the physical newspapers? One, two, three... no, SPH employees are not allowed. So one, two, three, only three or four. That's why, you see... And how many of you actually read news on mobile phones? See, oh my God, everybody. So that's known to everybody, and we need to see how SPH can transition from being a print-only newsroom to a digital newsroom. That's where this whole digital transformation at SPH comes from, and I'm here to share the things we are trying to do.

I think I have already been introduced. The one thing you don't know: I have no background in the media business. No background, zero. I was at Adobe for 13 years, mostly in R&D, research and development. So I'll quickly go through an introduction to Singapore Press Holdings. As you can see here, we have a lot of publications: English, Tamil, Malay, and Chinese.
Our flagship publications are The Straits Times, The Business Times, and Lianhe Zaobao; Zaobao is very popular in China. Apart from news publications, we are also into magazines, radio, outdoor media advertising, and of course property. Don't ask me why; that's what keeps the revenue coming in. So we are in multiple vertical businesses, but our core strength has always been news.

I will share how open source is helping us go digital: what we have done already and what we are planning to do in the coming months and years. Before that, I just want to give a very brief highlight of the landscape, of where we use open source. Drupal is the core of it; I will tell you how. Around Drupal we have frameworks and tools, mainly for machine learning and AI, and I will go through them one by one. I will show you three use cases we have done already, and two open ones that we are pursuing.

So let's talk about Drupal. As most of you know, it is open source and there is a community around it. At SPH, Drupal handles two different types of content. One is where the editors go and write stories for web publishing. But as you know, SPH is also into print, and the Drupal system we use also handles the content written for print. We have a different CMS for print, and that's where all the stories start: the journalists and editors write their stories there, they flow into the Drupal system, and they are prepared for the online version. That doesn't mean the content written for print goes out as is. A lot of edits happen, a lot of changes happen; we insert images and videos where appropriate, which we cannot do on the print side. So that's where Drupal integrates with the print subsystem. And as you can see in the list here, we are using Drupal for both EMTM
and CMB. EMTM stands for English, Malay, and Tamil Media; CMB stands for Chinese Media. We are using Drupal for both groups. In Drupal we use some custom modules, for example for handling print; I just explained how we handle print content. Among the contributed modules, we use Views, Panels, and Media. We also try to contribute back to the community: we write components, customize them, and share them with the community.

All right. The next one is the article recommendation system. As you know, if you are reading an article about, let's say, a delegation under Donald Trump, then as you go through the story you would like to know more about the background, and you would like to see that recommended up front. You don't want to go to Google and search for related articles; that's not the right way to do it. So what we have done is build a system where we collect a lot of content. There is no shortage of content, because we create content. We take all the content and, using the open-source frameworks we have, we first clean up the text, remove all the stop words, do lemmatization, and then build a model. There are different ways of building models. Some models focus on the entity: who is the main entity being talked about in this article. Some models focus on a theme, where we use LDA, Latent Dirichlet Allocation. It basically takes all the articles together, does clustering, and for each cluster the related articles are grouped together. Each cluster is specific to a theme, so it works at a thematic level. That's the model we have built.
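The thematic pipeline described above (clean the text, drop stop words, fit an LDA topic model, group articles by dominant topic) can be sketched in a few lines. This is a minimal illustration with scikit-learn, not the production system; the talk mentions lemmatization and other tooling that is omitted here, and the sample articles are invented:

```python
# Illustrative sketch of theme-level clustering with LDA.
# Sample articles are invented; the real system trains on SPH content.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = [
    "The election campaign focused on trade policy and tariffs.",
    "Voters head to the polls as the election race tightens.",
    "The football team won the championship after a late goal.",
    "The striker scored twice as the team clinched the title.",
]

# Tokenise, lowercase, and remove English stop words.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(articles)

# Fit a small LDA model; each topic acts as a theme-level cluster.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)

# Assign each article to its dominant topic (its cluster).
clusters = doc_topics.argmax(axis=1)
print(clusters)
```

Articles sharing a dominant topic end up in the same cluster, which is the "thematic level" grouping the speaker describes.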
Once the model is built, we use it in the production system: if you go to straitstimes.com and read an article, we feed that information to the system and it returns the recommendations. As you can see, there are so many articles; how do you decide which ones to recommend? The system works at two levels. The bottom level gives you the entire cluster, which has all kinds of articles. From that cluster you want to extract a subset, and we don't know in advance which subset makes sense, so we run a lot of A/B tests and test a lot of hypotheses around the trendiness of articles. From the cluster we pick out the articles that are trending; we may also pick some articles at random. Then we surface them to the user. So it works at two levels: first you get the whole cluster, and second you randomize or apply heuristics to pick what to surface.

So let me show you a demo. It's all in production. I'm just opening this article; try to ignore all the ads and images at the top and on the right side. Oh, you don't see anything? How do I... Thank you. This is straitstimes.com, and I'll just open an article. If you can see the headline, it's something about women entering politics. Ignore all these adverts; don't ask me why there are so many ads. In The Straits Times we have allocated a section at the bottom for premium content; ignore that here too. So this is the one: here you see it surfacing articles closely related to the article you're reading. Even the image looks the same; it's the same person, actually. So that's the recommendation system. Now we'll go to the second one. Next slide. What's next?
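The two-level selection just described (level one yields the whole cluster; level two picks a small subset, preferring trending articles and filling the rest at random) could be sketched like this. All names and data here are hypothetical, and the real system chooses its heuristics through A/B testing:

```python
# Hypothetical sketch of the two-level recommendation pick.
import random

def pick_recommendations(cluster, trending_ids, k=3, seed=42):
    """Return up to k articles: trending ones first, then random fill."""
    trending = [a for a in cluster if a["id"] in trending_ids]
    rest = [a for a in cluster if a["id"] not in trending_ids]
    picked = trending[:k]
    if len(picked) < k:
        # Fall back to a seeded random draw from the remaining articles.
        rng = random.Random(seed)
        picked += rng.sample(rest, min(k - len(picked), len(rest)))
    return picked

# Level 1: a cluster of related articles (invented for the example).
cluster = [{"id": i, "title": f"story {i}"} for i in range(6)]
# Level 2: pick a subset, preferring the trending ones.
recs = pick_recommendations(cluster, trending_ids={2, 4})
print([a["id"] for a in recs])
```

The seeded random fill stands in for the "random distribution" hypothesis the speaker mentions; in production the subset choice is what the A/B tests evaluate.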
So, the AutoTagger. What does it mean? Imagine you are an editor or a journalist writing a story, and you want to make sure that your article is searchable and surfaces quickly to the consumer. For that, what happens in the newsroom is that when the editor is writing the story, he has to append some tags, like metadata. The tags are contextual, and earlier it was decided entirely by the editor which keywords he wanted to attach to each article. That is very manual and, in some fashion, inconsistent, because each individual thinks differently, so the keywords being attached led to a lot of instability. So we built an AutoTagger using Magpie, TensorFlow, and spaCy, also all open source. As I said, we have no shortage of content, and we also have the historical keywords attached to it. We use that information to train the model; it's a multi-label classification model. Given an article, based on the context and entities in what you have written, it will suggest keywords and entities. Now it's easier for the editor to select or deselect, drag and drop whichever keywords he wants to use, and then publish. Those then work as the search keywords that people use on the web.

So let me show you a demo. This is a standalone application we have. All I need to do is copy an article, any article I see. Let me pick this one; it's something I found today, on the impeachment of Donald Trump or something like that. Let me copy it. That's all I have to do. There's some more here; just ignore that, I don't want to copy all of it. So imagine that the editor or journalist has written this article.
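The multi-label classifier trained on historical article/keyword pairs, as described above, can be sketched minimally. The production system uses Magpie and spaCy; this illustration uses scikit-learn instead, and the training articles and tags are invented:

```python
# Illustrative multi-label tag suggester (stand-in for the real AutoTagger).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Invented historical articles with their editor-assigned tags.
train_texts = [
    "Parliament debated the new transport bill today.",
    "The transport ministry raised bus and train fares.",
    "The central bank held interest rates steady.",
    "Banks reported higher profits as interest rates rose.",
]
train_tags = [
    ["politics", "transport"],
    ["transport"],
    ["finance"],
    ["finance"],
]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(train_tags)          # tags -> binary indicator matrix
vec = TfidfVectorizer()
X = vec.fit_transform(train_texts)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)

def suggest_tags(text, top_k=2):
    """Suggest the top_k most probable tags for an article draft."""
    scores = clf.predict_proba(vec.transform([text]))[0]
    ranked = sorted(zip(mlb.classes_, scores), key=lambda t: -t[1])
    return [tag for tag, _ in ranked[:top_k]]

print(suggest_tags("New rail line and bus routes announced by ministry"))
```

Returning the top-scoring tags rather than a hard yes/no matches the workflow in the talk: the editor sees suggestions and can still select or deselect before publishing.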
Now, before anything gets published, before this content gets stored in my CMS system, I need to tag it. So I just hit this Tag button, and it surfaces or suggests some tags: Donald Trump, which of course the article is all about, something about Twitter, elections, 2016. But there is something missing; if you have read this article... let me come back to that. So here it shows you the entities and also the keywords. Now the editor can just drag and drop to select them and add the tags here at the bottom. But there is something wrong. If you have read this article, something is missing. Any guesses? You don't see the word "impeachment" here. What does that mean? It means we have to work even harder to build a good model. It's an ever-learning process where we keep building our models and making them more accurate. All right, so that's the AutoTagger.

The next two items I'm going to share are things we are trying to solve; we haven't started yet. But if you are interested, if you want to join forces with us, you are all welcome. What does robot journalism mean? You may have heard about it: robots writing stories based on some raw data. We don't want to do anything fancy. The pain point is that we have this COE data. The COE is for vehicles: every week there is bidding on vehicle certificates in Singapore, and that's structured data. There is also the SGX data, the Singapore Exchange data, which comes both structured and unstructured; the unstructured data is free, and you have to pay extra for the structured data. So can we use this structured numerical raw data? Something like this: it shows you the month, the category of vehicle, the amount, how many bidders, and so on.
Can we use this numerical data and write a story that looks like this? Because right now, whenever this report comes into SPH, one journalist from the English desk spends time studying the data and writing the article. Not only that: someone from the Chinese department looks into the same raw data and writes the Chinese article. So you can see two people spending time on the same data and coming up with two different articles. Can we do something, using all the open source we have, and build new models to convert raw data into readable text? And beyond that, can we use a model to convert the English text into Chinese, and make the lives of our journalists much easier?

The last one is fake news. I was debating whether to put it here or not, because we create news; why should we be interested in fake news? Fake news matters for two reasons. First, at SPH we don't just create articles; we also consume articles from other sources, like The New York Times and The Washington Post, and publish them on our site. Second, our readers: they also want some confidence, when they read our stories, about whether they're true or fake. So how can we address this problem? There are some open-source projects available; some are dubious, and some are exciting based on the results they have shared to date. Can we do something about it and build some kind of system where we can show and convince our readers that the articles they're reading are actually genuine? That they come from a genuine source, the content is all real, and there is nothing fake in it?

So that's all I have to share. Open to questions. Can we thank our speaker? We are, of course, well behind schedule, so let's take one question while the next setup happens. Any questions? Oh, come on. Good. So the question is: why Drupal, why not WordPress or something else? So, we tried WordPress.
We used WordPress in the beginning, but we faced some issues: it did not scale up to our expectations. So we moved to Drupal, and we also got support from the Drupal community, and that's what we have been using rigorously; it's been working well. But in the media industry we have this problem of print and digital: there is no single CMS that does both at the same time, so everybody has separate systems, one for print and one for digital. We are thinking and debating whether to build something and have one CMS that does both. All right. Well, thank you very much. Thank you. First time I've seen what goes on inside a newsroom.