And I have great pleasure in introducing Matt, who will be speaking about nousveillance. Yeah, I guess that's it. Take it away.

Cool. Right, let me just set my timer going. There we go. Right, cool: nousveillance, an underview. So what is nousveillance? Basically, it talks a lot about the idea of surveillance, but more about how you can do it back. There's a concept called sousveillance. Both words are from French: "sur" is above, "sous" is from below, and sousveillance is the idea of surveying as a participant. So if you're wearing a body cam or something like that, you could be said to be sousveilling. Nousveillance, because "nous" is French for "we", and it sounds like "news". Also, you know, jokes aren't funny if you have to explain them.

So, moving on, related to that: who's this clown? I'm Matt, aka Grimwear. I've worked in systems and infrastructure for about eight years, and this year I started working in infrastructure security, so I'm quite interested in data security and privacy. Other than that, I'm absolutely not qualified to be telling people about media and news and things like that, so take all of this with a pinch of salt. I'm also a consummate professional at wasting my own time, so if I only live to serve as an example to others of how not to do it, that's fine too.

This talk is about aggregating news and being able to analyse it; it's a personal struggle to stay well informed. To start off with, I'm going to talk about some of my influences in taking this on, because I think that gives good context for what exactly I was trying to achieve. One of the big influences for this is Hunter S.
Thompson, and I'll start with a quote here. He said: "I don't get any satisfaction out of the old traditional journalist's view: 'I just covered the story. I just gave it a balanced view.' Objective journalism is one of the main reasons American politics has been allowed to be so corrupt for so long. You can't be objective about Nixon." And I think we can probably think of other American presidents you can't be objective about.

Basically, in the way he reported stories, he was fundamentally involved in them. His opinion was very much on his sleeve, and there was also a questionable amount of actual fact. The interesting thing is that he wrote a book called Fear and Loathing: On the Campaign Trail '72, about the McGovern campaign, and on the back of the book there's a quote from the campaign's strategist saying it was the most accurate and least factual account of that campaign. I really like that, because it's talking about narrative: it's not so much about the individual things that happened as about the overall flavour. I also really like the idea of coming to know a journalist or reporter through their work, so you can assess: do I believe this bit? Do I believe that bit?

Moving on, another influence here was Spider Jerusalem, from the comic Transmetropolitan. He's heavily based on Hunter S.
Thompson, but with more of a conscience. One very small part of this comic is a news feed called The Hole, run by two people, and one of the characters says to another: "It's like news, but with all the bias boiled out." I thought about this for a while and thought, hey, that's really cool. Then I thought about it a bit more: two people editing all the news and actually managing to remove the bias seems really nice in a sense, but I think it also removes a lot of information.

Quite a while ago now there was a BBC News article about why Russians watch TV news they don't trust, and I was blown away that the journalist was so surprised people would watch a news source they don't necessarily believe. The point is that a lot can be inferred when somebody lies to you: lies show intent. If a news outlet just reports facts, you can't infer intent from that. But if somebody is in the news one week and suddenly we've stopped talking about them the next, you can infer that somebody wants you not to be looking in that direction.

So I was thinking: how do you aggregate this sort of information? And again, there's this idea of surveillance. The NSA had programs like PRISM, and tools like XKeyscore. This is that whole idea of, well, why do I want to have a view of the world? Well,
I think it's because I want to be able to see what that intent is, who's doing what. It's that idea of surveilling back. XKeyscore is basically heuristic scoring of people under surveillance based on criteria, and I thought it would be really cool to do something similar with news: do natural language processing and sentiment analysis of the different news outlets, actually gather some statistics, and have alerting on particular kinds of stories.

It turns out that's really hard. My initial attempts really sucked, to be honest, because I was attacking the problem with "I'm just going to scrape all of these sites and then process them", and it was just really difficult. But that got me thinking about why it's so difficult, and I think that's a really big part of this.

So, looking at what news even is. Caveat here: a free press is good. I'm criticising, but it's the best system we've got at the moment. So what is news? It's some events that purportedly happened that someone has decided you should know about, and that's pretty much as far as it goes. The premise is obviously to keep people informed. Why do people need to be informed? It's an important function of democracy: if you're going to vote for a representative, you need some understanding of the policies and everything, or you can just vote for the worst person in the world, as tends to happen. But it can't be done by the government in a real democracy, because if they're shaping your view of the world, that turns into a terrible feedback loop. So it's left to the private sector, to be done for profit, which doesn't necessarily
lend itself to having the right motives and the right incentives. And it leads to a lot of people, like Butternut Hitler here, saying "fake news" a lot. I'm really pleased the screen was on that side, actually; it was a bit of a gamble, 50/50.

This whole idea of fake news is a really interesting one. There's real fake news, like saying that Democrats are running a paedophile ring out of a pizzeria in DC, which is just totally false. Then there's kind of real fake news, which is largely selective truth with heavy bias and non-sequitur interpretation, and this stuff is really difficult to nail down. And then there's fake fake news, which is basically things you don't like other people saying. We run into real problems of interpretation here, especially with different news outlets having different financial backing and different financial motivations.

But one thing to take away from this is that all of it is data: you can use all of it to infer some idea of motive. There's also a lot to fall afoul of in the way we share news, the way we communicate and talk about it. How can you avoid being part of propagating the wrong stuff? One of the lessons I learned in all of this is that if you don't care enough to do your research, you don't care. If you want to share an opinion on the internet, make sure you actually know the subject and know your sources, because the internet doesn't need more uninformed opinions, and if you can't be bothered to look around, you probably just don't care that much.

One of the ways I try to be well informed is that I've got particular people I follow, and the grugq is one of them.
He's really excellent; he does a lot of stuff on operational security. This is me tweeting about MalwareTech when he got indicted by the FBI in the US. I was saying that rather than spreading my hot takes, I'm going to wait for the grugq to say something so I can parrot his opinion as my own, and he replied to me: "I'm waiting for Orin Kerr to do his post", Orin Kerr being a law professor. So that's another thing I learned: to be considered well informed, stand on the shoulders of giants. Having a set of sources for the things you're interested in is really great.

But selecting your sources is hard, because human beings are really terrible at perceiving things as they are. Absolute objectivity and rationality are lies, in the sense that we're more likely to believe things that confirm our own beliefs, confirm our own tribalism and identity, or are just convenient. We're also extremely resistant to cognitive load. We like tweets because they're pithy and they sound true, and you don't want to dig into them because you don't really care that much. You like to take on those quick, easy ideas: why aren't there any jobs? Must be immigrants. Or it could be that the government's not working hard to create jobs. If something becomes nuanced, our minds just kind of slip off it; we don't want to deal with it.

There's something that makes this so much worse, and it's social media. The idea is pretty rife that engaging with an online platform is a way to change the world; I don't know if anybody remembers Kony 2012 as the counterpoint to that. But basically, don't argue with people on the internet and don't share news on the internet. It's not actually very helpful. It doesn't convince people.
Take the whole Cambridge Analytica thing: they converted a lot of people who already believed the things Donald Trump was saying, or already believed the things Leave.EU was saying, into voters. They didn't change anybody's mind. All engaging with these platforms does is generate advertising revenue.

And advertising is a really fundamental piece of this puzzle, especially of why it's so difficult to aggregate news. Advertising is basically the trade of attention, treating attention as if it's currency. The old-school justification has always been introducing products to people who may want them, and that's absolutely fine. But a lot of the time it's more about generating desire for products in people who are trying to give their attention to something else. This is especially obvious in content industries, and especially in legacy media. On the radio, you're listening to some music and suddenly someone's telling you about double glazing. On TV, you're just trying to watch something nice and it stops so that somebody can tell you that you're fat, you're ugly and you smell, so that you'll buy things. It's pretty insidious in that sense, and it's especially obvious when you watch things on Netflix that were on broadcast TV and you think, why does it stop and start every now and then?
Oh, that's where the advert breaks used to be. All of this content is created to get you to watch the advertisements. And this gets even more interesting online, because you can prove to some degree that a particular individual has looked at an ad or clicked on an ad, and it can be done without user consent: tracking people around the internet, mostly offloaded onto ad delivery networks, using off-domain content and scripts and building usage patterns to sell on. You can start to see why this is all about driving up engagement, and the problem with doing things just to drive up engagement is that it's not about keeping people informed anymore. It's literally about making sure people stay on your website.

So that's the background as to why it was a difficult problem to start with, and you can see it's all pretty loaded and pretty difficult to solve.

In terms of taking back control, I've got a few mantras that I used in all of my approaches, so I'm going to list them off before I do a few demos. The first one is: nobody's going to call the internet police. I'm not advocating that anybody do anything illegal, just things that are annoying for ad networks. Additionally, you might have to live with Google CAPTCHAs for a while if you do some of this wrong. Secondly, people are very good at producing metadata, so never do it for free. It effectively has an equivalent cash value: people buy this stuff and sell this stuff. So when you give data to someone, make sure you're actually getting something worthwhile in return. Failing that, mantra three is "add more piss". This is from a friend of mine, who also coined the term Butternut Hitler. Somebody said to her: well, what about when somebody already has your data?
You know, how do you get the piss back out of the swimming pool? And my friend said: just add more piss. Just obfuscate. I'll show you what I mean.

Demo time. Right, I'm going to have to try and do this. This particular demo exists because people kept sending me links to thetimes.co.uk, and I kept getting very annoyed that they had a data paywall. I'm probably going to need both hands for this bit. You can see William Martinez has decided to sign up for thetimes.co.uk and is just about to click "Get access". I'm not doing this for real, so you can see this has failed, because it says the user picked an invalid email address; that's because I don't want the network to start getting CAPTCHAs. But I was using this before they had any validation there: every time somebody sent me a thetimes.co.uk link, I would just run this script, and it would load up a fresh incognito browser and sign up as someone totally fake. That's the idea of the "add more piss" philosophy: if somebody wants to put up a data paywall, give them data. Data is not information. Just give them data; it doesn't have to be real.

Where have I gone? So one of the things I learned from this is: it might be your website, but when it's in my browser,
It's my code. Here's how I went about that particular thing. All of these websites want to keep you engaged, so they've got lots of... this GIF gets me every time. Yeah, they want to keep you engaged, so they want to make it pretty, and the way you make a website pretty is by applying Cascading Style Sheets to the Document Object Model. The consequence is that everything on the website is effectively tagged with what it is. And to keep good development velocity, you try not to change that structure too much, because if it's not consistent, it's hell for developers. That's an advantage for us: we want to aggregate news, and we don't care whether it's pretty or not, so it means we can find the content.

So let's see, I can follow this link here. Wrong screen. Over here. All right, cool. What we've got here is a BBC page. It wants me to agree to cookies, and I don't want to look at that either, so we can just... whoop, delete that. I used to do this with ad-blocker pop-ups, but they don't tend to do them anymore: you just delete the overlay, sort out the scrolling, and then you can read it anyway. We want that to go away as well; we just want to pick out the actual news. So what we can do is select... where is it, where is it... story-body, here we go. This is the section of the website we're interested in, and you can see it highlights the whole thing: the main header image and all of the text. So we can take this, copy it over (no, not that one, this one), and fetch that website from the command line, right?
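The selection described here boils down to: find the element tagged story-body, then keep only the text of the paragraphs inside it. Below is a minimal sketch of that idea using Python's standard-library `html.parser`; it is not the talk's own tool (which was written in Clojure and used real CSS selectors). The per-domain table `SELECTORS`, the class name entry, and the helper names are illustrative assumptions, and the parser assumes reasonably well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-domain table: class name of the element wrapping the article.
# "story-body" is the class picked out in the BBC demo; treat it as a placeholder.
SELECTORS = {"www.bbc.co.uk": "story-body"}

# Void elements never get an end tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleExtractor(HTMLParser):
    """Collect the text of <p> elements nested inside the first element
    whose class attribute contains `wrapper_class`."""

    def __init__(self, wrapper_class):
        super().__init__()
        self.wrapper_class = wrapper_class
        self.depth = 0        # nesting depth inside the wrapper (0 = outside)
        self.in_p = False     # currently inside a <p> within the wrapper
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.wrapper_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
        if self.depth and tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if tag == "p":
            self.in_p = False
        self.depth -= 1

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data   # includes text of nested <a>, <em>, ...


def extract_article(url, html):
    """Return the article paragraphs joined by blank lines, or None when no
    selector is configured for the URL's host."""
    wrapper = SELECTORS.get(urlparse(url).hostname)
    if wrapper is None:
        return None
    parser = ArticleExtractor(wrapper)
    parser.feed(html)
    parser.close()
    return "\n\n".join(p.strip() for p in parser.paragraphs if p.strip())
```

Cookie banners, navigation and footers are skipped simply because they sit outside the wrapper element, which is the same effect as deleting them in the browser inspector.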
And then what we can do... in fact, we can cheat and use our command history. What I'm doing here is telling it to select the bits of the Document Object Model that are inside story-body, and we can go even further and say: I just want everything in the paragraph tags, and all the text inside them. And what we've got is just the article. That's pretty cool; it helps us get our news out of our news websites.

One of the things I learned doing all of this is that terms of service hinge on their being able to withdraw service. The first thing anybody will serve you is the actual content; they'll let the ads load later. So you can use that to your advantage and request only the main part of the content. When people say, oh, you have to agree to this, you have to agree to that, well, I've already got the text, so I don't really care.

So we can get the actual news out of the website, but how do we get a feed of it? There's a format called RSS, and also Atom. That's the RSS schema right there: you can see there's a channel described, and then multiple items within that channel, which can be links and news items. It's easy to generate, and Atom is a bit different from that, but it's the same sort of thing. So what does this look like? No... sorry, I can't type when people are watching, and there are quite a lot of people watching. Okay, so this is my news feed reader, and I'm like, okay, we've got this feed of news. Well, what's this guy talking about? We don't have any news at all. This is actually all just links and truncated content, because people want you to go to their websites.
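The channel-and-items structure described here, plus the idea of re-emitting a feed whose items carry the full article text, can be sketched with Python's `xml.etree.ElementTree`. This is an illustrative sketch, not the talk's Clojure RSS library: it handles RSS 2.0 only (not Atom), keeps just `title`, `link` and `description`, and the names `read_items`, `write_feed`, `reconstitute` and the `fetch_body` callback are hypothetical. In practice `fetch_body` would fetch the linked page and apply a per-domain selector.

```python
import xml.etree.ElementTree as ET

FIELDS = ("title", "link", "description")


def read_items(rss_xml):
    """Parse an RSS 2.0 document into (channel_title, channel_link, items),
    where each item is a dict of the fields in FIELDS ("" if missing)."""
    channel = ET.fromstring(rss_xml).find("channel")
    items = [{f: item.findtext(f, default="") for f in FIELDS}
             for item in channel.findall("item")]
    return (channel.findtext("title", default=""),
            channel.findtext("link", default=""), items)


def write_feed(title, link, items):
    """Serialise a channel and its items back into an RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Full-text copy of {title}"
    for it in items:
        node = ET.SubElement(channel, "item")
        for f in FIELDS:
            ET.SubElement(node, f).text = it[f]
    return ET.tostring(rss, encoding="unicode")


def reconstitute(rss_xml, fetch_body):
    """Replace each item's (usually truncated) description with the full
    article text returned by fetch_body(url); keep the original if it
    returns None or an empty string."""
    title, link, items = read_items(rss_xml)
    for it in items:
        body = fetch_body(it["link"])
        if body:
            it["description"] = body
    return write_feed(title, link, items)
```

Because the output is plain RSS again, the same feed reader can consume it. The talk points out that real-world feeds often ignore the spec, so a production version would need more lenient parsing than `ElementTree` offers, and for XML from untrusted sources the Python documentation recommends a hardened parser such as `defusedxml`.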
So it's pretty useless. Most people just provide click-throughs so they can get ad metrics and get you into their clickhole. There's also a lot of misuse of the RSS schema. When I ended up having to write my own RSS library, I didn't actually look at the specification at all, because, sure, there's a spec, but that doesn't mean the people producing RSS feeds have ever read it. So I just went on the basis of the actual data I had.

But hey, now we've got a bunch of stuff we can put together. We can get a feed of articles, we can extract the article body from a given web page, and, as an added bonus, we can dump it all back into RSS format and use the same reader. So that's what I did: I made Better Feed, and it's written in Clojure. I'm very sorry. I'm not that sorry.

Demo again. This is really begging for trouble. Which window, which one... not that one, not that one, that one. Oh no, that didn't work. Yes, this is it. Okay, I'm really sorry: the interface for it at the moment is just Emacs, because I'm awful. What I've got here, on the left-hand side, is a list of feeds. I just run it, it runs in the background and spits out all of those feeds: it goes through each feed, takes the article content, puts it back into the feed and dumps it out to disk. So what this actually looks like in practice
is... nope, no, that's not good. So we've got exactly the same feeds as before, and that hasn't loaded properly. Two seconds. Sorry, this is very shonky. Okay, that one hasn't loaded, never mind. So it's the same feed as before, but now we've got all of the article content in there. This particular feed is actually a list of different websites, so I've got a huge list of different CSS selectors for that.

So what does it do? (Sorry, slide management.) It's got a few different features. It has per-domain selectors, so if you visit bbc.co.uk or wherever, you can say: for the BBC, pick out this bit. You can configure domains that do redirects, like aggregate news feeds that just redirect to a different site. There's a configurable user agent; that's pretty much the only thing you really have to lie about: "Yeah, I'm actually a Chrome browser, don't block me." And you can also use Chrome in headless mode for sites that do DOM manipulation in JavaScript, which is really funny when you accidentally leave it going in the background and your laptop barks at you because of autoplay videos.

Batteries aren't included, because if I were to distribute the CSS selectors, people would probably get a little bit salty about it, so I shouldn't be putting all of that online.

What this doesn't address, though, is being able to look at patterns. I'm not very far in yet, but I have made some progress. I've got this thing, Fedex, which takes any RSS feed and dumps it into Elasticsearch, which means we can search through the news articles we've already looked at. I can run it on my reconstituted feeds as well, and it can additionally spit out more RSS feeds, so you can actually have a standing search for something. Right, there's absolutely a non-negligible chance this one will fail. make... what was it?
search. Womp. Okay, reload those. You can see I've got search results here for Apple: this is all news I've been reading and have indexed, so it can now search for anything that has Apple in it. We can also search for EMF, and there was a Hackster.io post about the badge. It's pretty awesome. So that's pretty much how that works.

I would love to do loads more with it. I'd like to start doing sentiment analysis, to be able to score different news sources: this news source speaks negatively about immigrants 90% more than all the other ones, that sort of thing. So, statistics per domain, and maybe some natural language processing, so that instead of just doing a text-based search you can build a narrative of what this person or this organisation has been doing. Also better relevance scoring, because it's just a basic text search at the moment. And an actually usable interface, because not everybody wants to poke about in Emacs and Clojure.

I want to leave you with the final lesson I learned: if you don't control your information diet, someone else will, and the content industries are pretty insidious about this. So it's worth bearing in mind: if it's not really important and you're not going to do your research, don't just take it at face value. Treat it with some scepticism, and control what you let into your brain the same way you control what you let into your body. Thanks for watching.
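The per-source scoring described above as future work ("this news source speaks negatively about X more than the others") could start very simply. The sketch below is not the talk's Elasticsearch setup and not a real sentiment model: it scans an in-memory list of `(url, text)` pairs, keeps articles that mention a given word, scores each with a tiny hand-picked word lexicon (a placeholder assumption), and averages the score per domain. The names `POSITIVE`, `NEGATIVE`, `sentiment` and `per_domain` are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Toy lexicon for illustration only; a real analysis would use a proper
# sentiment model or a published lexicon instead of these hand-picked words.
POSITIVE = {"good", "great", "welcome", "success", "benefit"}
NEGATIVE = {"bad", "crisis", "threat", "failure", "burden"}

WORD = re.compile(r"[a-z']+")


def sentiment(text):
    """Net score in [-1, 1]: (positive - negative) / (positive + negative),
    or 0.0 when no lexicon word appears."""
    words = WORD.findall(text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def per_domain(articles, term):
    """Average sentiment per domain over articles containing the word `term`.
    `articles` is an iterable of (url, text) pairs."""
    scores = defaultdict(list)
    for url, text in articles:
        if term.lower() in WORD.findall(text.lower()):
            scores[urlparse(url).hostname].append(sentiment(text))
    return {host: sum(s) / len(s) for host, s in scores.items()}
```

Comparing the per-domain averages for the same term gives the kind of "this outlet is more negative about X" statistic mentioned above. Lexicon counting ignores negation and context ("not a crisis" still counts as negative), which is one reason the wish list also includes proper natural language processing.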