First, a content warning. It's not so much about my content as it is about some of the Twitter accounts I'll be featuring. Some of them have offensive language and sexual references, some show alarming levels of intolerance, pseudoscience or science denialism, and some of them also contain Australian politics.

So, Twitter is a magical place. It is a wonderful, wonderful place. You find movie stars on there, like Mark Hamill, Emma Watson, Kristen Bell. There are pop stars like Taylor Swift. There are complete randos. There's hard-hitting journalism. There are business leaders. There are politicians, former presidents. There are conference organizers. Conference organizers on Twitter. There are people who live-tweet talks. And there are a great, great number of bots.

Bots are the reason I got into this, because one day I decided to try and make my own. There's a whole bunch of fun you can have with this, and it's a great little toy project. So before I get into my bot, I thought I'd have a look at some of the other bots you can find on Twitter.

There are some that react to other people's content. Pentametron is one that does that: it looks for pairs of tweets that form rhyming couplets in iambic pentameter, and it will retweet them. Accidental 575 looks for a similar kind of thing, except it looks for haikus, although it does get tripped up by hashtags occasionally. You get other ones that are a bit more reply-oriented. For example, Stealth Mountain tells you when you said "sneak peak" instead of "sneak peek". There used to be one that would correct you if you said "less" instead of "fewer", but apparently that's worse than being a Nazi.

And then you get one of the most interesting bots I know on Twitter, which is God Tributes. God Tributes will pick words out of what you've said and respond by dedicating it to a particular entity. In this case it's just picking words out, but it can also deal with emoji. It can also look at images and work out, to some extent, what's in them, which is kind of spooky when it happens, but it's pretty cool. And in a lot of cases it can work out the singularising and pluralising side of things, for the most part.

But this gets to the other kind of bot, which is one that just generates things, because God Tributes will occasionally just throw something out there. Magic Realism Bot does similar kinds of things. We also get Ice-T from Law & Order: SVU warning us about various drugs. But one that I wanted to highlight, because it's a useful one for this subject, is Midsomer Plots. You may not be familiar with Midsomer Murders: it's a British TV show set in the rather murder-prone county of Midsomer in the UK. It's been going for 20 seasons, and it is delightfully twee, as you can see from these. The reason I wanted to highlight this one is that you can actually find the code for it, and it's an example of a templated kind of bot. You have a bunch of murdered persons, you have some causes of death, and then you have a bunch of village groups that are angry at something that threatens something. Then you just call random a bunch of times and glue it all together.

If you want to get into these, you don't actually have to write even that much code. There's a service called Cheap Bots, Done Quick, which uses a kind of meta-language called Tracery.
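To give a flavour of how that templated style looks in code, here is a minimal sketch using the Python port of Tracery (pip install tracery); the rules below are ones I've invented in the spirit of Midsomer Plots, not the bot's actual grammar, and Cheap Bots, Done Quick accepts the same kind of grammar written as JSON.

    import tracery  # Python port of the Tracery grammar language

    # Invented rules in the spirit of the Midsomer Plots bot
    rules = {
        "origin": "#victim# is found #cause#, and the #group# are furious.",
        "victim": ["a beekeeper", "the vicar", "a retired major"],
        "cause": ["drowned in the duck pond", "crushed under a prize marrow"],
        "group": ["bell-ringers", "flower show committee", "cricket club"],
    }

    grammar = tracery.Grammar(rules)
    print(grammar.flatten("#origin#"))  # rolls the dice and fills in the template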
You can see with this that you have the templated part up the top, where you can manipulate various things, and then you have the lists of random things that get slotted in. It will just roll the dice for you and generate things. So that's a very simple way of generating content for your Twitter bot.

Some of the more interesting ones use more complex models, like Markov chains and the like. For some reason these are all called "ebooks" bots, which is funny because Horse_ebooks, where they get the name from, was not actually a Markov bot. It was originally a spam thing, and then a performance artist took it over and it became more delightfully strange. But these single-source, or even multi-source, Markov bots tend to end up being called ebooks bots. The grugq is an information security person, and he has an ebooks bot. You can see on the left here something that he actually said, and on the right is the kind of word salad that sounds a bit like him. When you combine multiple content sources together, you can get something even better. Erowid is a site which, among other things, contains descriptions of people's experiences on certain psychoactive substances; Erowid Recruiter combines that with technical recruiter spam.

Which brings me to my bot, which is called Wint Nation. Wint Nation may not mean too much to many people here. It's the combination of a bunch of Australian political figures and one of the "weird Twitter" accounts (weird Twitter being a particular genre of Twitter): dril, also known as wint. One Nation, as fronted by the lovely Pauline Hanson, is an Australian political party that is exactly the kind of reactionary right-wing chuckleheads you think they are. Part of the reason this came to be is that they got a bunch of people elected at the last major election in Australia, and a bunch of them have a spectacular capacity to self-own, particularly on Twitter. There was one, who is unfortunately no longer in parliament, Malcolm Roberts, who would regularly just face-plant brilliantly on Twitter, which would lead to him being sent this tweet quite a lot. And so I figured that, given dril seemed to exhibit a whole bunch of the same old-man-yells-at-cloud tendencies that One Nation did, they would make a good match, and we should try and mix their content together.

Which brings us to the actual serious bits. To do this kind of thing, you need a corpus: a body of data that you can feed through something in order to generate tweets. So how do we get that? Well, Twitter's got an API, and it's actually pretty easy to use. The quick version is: you click that "create new app" button, it asks you for a bunch of details, you fill them in, and it creates an application and gives you a set of consumer keys you can use. You also need some access tokens, which you get by scrolling down and clicking that button. And then you call the Twitter API. It really is that easy. The nice thing is that if you've created a new Twitter account to run the bot, posting is equally easy.
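As a sketch of what that looks like from Python: the talk doesn't say which client library was used, but the tweepy library is one common option, and the credentials and account name below are placeholders for the keys and tokens described above.

    import tweepy

    # Placeholder credentials from the app created in the Twitter developer console
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth)

    # Pull down recent tweets from an account to feed into the corpus
    for status in api.user_timeline(screen_name="some_account", count=200,
                                    tweet_mode="extended"):
        print(status.full_text)

    # And if the credentials belong to the bot's own account, posting is one call
    api.update_status("hello from the bot")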
I use three main sources of content for Wint Nation. One is the Twitter accounts of every One Nation politician I could find. The other was their Facebook accounts. With Facebook, if you sign up for their developer program, you can go to the Graph API Explorer, which has two wonderful properties. One, you can actually play with the API there. Secondly, you can copy out that access token that I've blurred out and feed it into your own script. It only lasts for about half an hour, but that's all you need to download some stuff. The last source I use is OpenAustralia, where you can get Hansard, which is the transcript of everything said in the Australian Parliament. Using the Facebook API from Python is also pretty easy.

Some tips if you are doing this. First, remove extraneous content: things like retweets will confuse your model and make your word salad more confusing and less funny, because you really want to ride that line between delightfully strange and "what the hell did it just say?". The other thing you can do, which I do with Wint Nation but which you don't have to do, is keep timestamps. That's useful because if you want to decay the relevance of various things so that your bot remains topical, you need that information there. And the last one is to play with your content mix. What I found when I started doing Wint Nation is that while One Nation and dril seemed an obvious match made in heaven, they didn't quite cross over enough, and by adding a couple of other accounts that I'm going to keep secret, I managed to bridge that and get the output to actually move between one and the other a bit better.

The actual process of generating this stuff generally has two stages: modelling and generation. When you're modelling, you've got some kind of process, you feed stuff into it, and you get a model out. When you're generating, you take your model, run it through a bunch of randomness, and get tweets out.

The way Markov chains work is that you've got some kind of state. A state is some number of words or tokens, where a token could be something like a begin-of-content or end-of-content marker. You've got a lookup table that, for each possible state, gives you a set of weighted possibilities for the next word or token. You roll your dice and select one of those, which then becomes part of your state, and the oldest part of your state gets dropped, so the state size stays the same. You just repeat this until you hit an end token.

To give you a quick demonstration: here's a couple of begin tokens, which form our state. We then have a bunch of possible words we could select from that, and we pick one at random. That becomes part of our state and the oldest token is removed, at which point we have more choices to pick from. That becomes part of our state, and then we've got another choice to make. We make our choice, and at that point we've only got one possible choice left; we end on that one, and we have our result. That last single-choice step points at one of the interesting properties of Markov chains, which is that in a lot of cases you're going to have either a very highly probable choice or only one choice to make at a given point.
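Here's a minimal from-scratch sketch of that process, just to make the state / lookup-table / dice-roll idea concrete; it's not the code Wint Nation actually uses, and the begin/end markers are my own.

    import random

    BEGIN, END = "<s>", "</s>"

    def build_model(sentences, state_size=2):
        # Lookup table mapping a state (tuple of tokens) to possible next tokens.
        # Repeats in the list act as the weights.
        model = {}
        for sentence in sentences:
            words = [BEGIN] * state_size + sentence.split() + [END]
            for i in range(len(words) - state_size):
                state = tuple(words[i:i + state_size])
                model.setdefault(state, []).append(words[i + state_size])
        return model

    def generate(model, state_size=2):
        state = (BEGIN,) * state_size
        out = []
        while True:
            word = random.choice(model[state])  # roll the dice
            if word == END:
                break
            out.append(word)
            state = state[1:] + (word,)         # oldest token drops off the state
        return " ".join(out)

Feed it a few hundred tweets and it will wander through them, hopping from one source to another wherever their two-word windows happen to overlap.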
To give you an idea of what that ends up looking like, this is one of Wint Nation's tweets. To show how this one gets constructed: the first bit comes from our sadly, sadly ex-Senator Malcolm Roberts. I should have picked a darker color for that, but you can sort of see that it's taken the first "I met with the best of" section from that tweet. The next bit came from the West Australian One Nation representative Colin Tincknell, and the last bit came from dril.

So, talking again about this process: with a Markov chain, the modelling stage is effectively word frequency analysis, and the generation stage is a random walk over the model. To give this a bit of a demonstration, I downloaded the entire works of Jane Austen from Project Gutenberg and a really badly OCRed version of The Lord of the Rings from the Internet Archive. The tool of choice in this case is a Python library called markovify. It's really useful, it's nicely structured, it's really easy to override certain parts of the way it works, and it's got a couple of other interesting properties. One of them you can see in this code here, because this is effectively what you do to generate your Markov chain: I'm processing the two corpora separately and then writing them out separately, because markovify will let you generate based on models that you combine. So you can see that I can combine these Markov models and generate stuff based on the combination. That will generate me five sentences of output from these corpora, which will look a bit like that.

The trouble with this is that it's a bit more word salad than you'd like. It's not so much funny as it just... is. You can try and fix this by increasing your state size. The only change in this code is that I've added state_size=3, which means that instead of tracking two words it will track three. The result is a little bit better, but the problem is that as you increase the state size, you also decrease the chance that you're going to switch from one corpus to the other, which is really where the humor tends to come from.

One of the other tricks you can pull is part-of-speech tagging, and this is actually an example that they give in the markovify examples. In this case we override parts of the Markov model itself so that we're splitting up the words and then tagging them based on their part of speech. NLTK can do this. There's another library called spaCy that markovify says is faster, but it really wasn't in my testing. Either way, it works pretty well. You then just use that instead of the markovify.Text model, so I've replaced that with the POSified text class, and generation is largely the same. And you start to get... this is with state size two, and these ones were state size three. The top one there was my favorite one it came up with. So that tends to get you to a point where it tends to work.

And that gets you to this notion of quality control. The nice thing about the templated bots is that you can generally just run them on a timer and they'll throw something out that's at best hilarious and at worst kind of meh. With the Markov stuff, you've got a range from hilarious to "what?". So what ended up happening with Wint Nation is that it became a kind of semi-manual process. I would sit there once in a while, when I got bored, generate a whole bunch of things and put them in a queue, and then the posting process would pick one at random and post it. That can get tedious, and the whole point of this is to be fun. I have yet to come up with a way to get it to a point where I would just happily stick the output straight up, but there are a bunch of Markov bots that do exactly that.
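To pull the markovify pieces from the last few slides together in one place, here is roughly what that separate-then-combine code looks like; the file names are placeholders and this is a sketch of the pipeline described above, not Wint Nation's actual script.

    import markovify

    # Build a model per corpus (file names here are placeholders)
    with open("austen.txt") as f:
        austen = markovify.Text(f.read())   # default state size is two words
    with open("lotr_ocr.txt") as f:
        lotr = markovify.Text(f.read())

    # Combine the models; the second argument weights one corpus against the other
    combined = markovify.combine([austen, lotr], [1, 1])

    for _ in range(5):
        print(combined.make_sentence())     # may return None if it gives up

    # The state-size experiment is just: markovify.Text(text, state_size=3)

The part-of-speech variant from the markovify examples is a small subclass of markovify.Text that overrides word_split and word_join to staple an NLTK part-of-speech tag onto each word, so the model can distinguish, say, "saw" the verb from "saw" the noun.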
So I figured the next thing to do was obviously to try something a bit more complicated: machine learning. Let's look at our modelling slide again. In this case, the modelling is training a neural network, and the generation is something-something-random-numbers, except the numbers get really big.

So the next question is obviously: what is a neural network? I'm glad you asked, because I have no idea. Well, I do sort of: a neural network has neurons, each of which has an activation function, an input and an output, and then you train it and you back-propagate and all kinds of buzzwords. I did a bunch of really heavy Google... I mean, research... and discovered that the one you really should use for language modelling is this long short-term memory (LSTM) one. And I believe everything Google tells me. So, having believed everything that Google told me, I asked it how I should actually go about doing that, and took the most carefully researched answer there, as you can tell, and followed it.

PyTorch is a machine learning framework for Python. It's got CUDA support; it's got everything you could need to do this kind of thing. It's even got a word language model example, and it had instructions, which I followed. I ran it and it did a bunch of things. Terms I do understand: milliseconds per batch; loss, which is a measure of how bad it is; and perplexity, which is a measure of how confusing it is. PPL is perplexity, and it's roughly a measure of how well we can generate something that actually represents what's in the validation set, based on what's in the training set. Once that finished, I could generate some stuff out of it, and the results were a little bit underwhelming. Especially given the timings I got. This is running on my laptop, which is a 2013 MacBook Pro, and this is running my Markov stuff; that took a while. But of course these days we have other ways to do things, and that p3.2xlarge is the latest-generation GPU-compute-oriented instance from Amazon, which will set you back about $3 an hour. A bit over 10 minutes is not too bad for that, but I did do a bunch of playing around trying to get something that actually worked. I tried: well, part-of-speech tagging worked for Markov, it must work for machine learning, right? Not really.

And at this point I rapidly concluded that, really, what it comes down to is that the entire point of this is to have fun. Twitter bots are a wonderful toy for playing with this kind of stuff; you can have a lot of fun, and you can make fun of people who deserve it. But there comes a point where it starts to feel like work, and when it starts to feel like work I lose interest in a personal-time project.

So, in conclusion: have fun with Twitter bots, they're great. If you've got a simple sort of grammar that you can write out in Tracery, use Cheap Bots, Done Quick. If you want something more complicated, use markovify. If you want to use machine learning, go for it, because I reckon you could probably come up with something that is just beyond my understanding of the field at the moment. And so with that, I thank you very much.

Thanks, Benno. Thank you. Thank you. Speaker's gift. Thank you very much.
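For anyone who wants to poke at the machine-learning side themselves, the network that PyTorch's word language model example trains is, very roughly, an embedding layer feeding an LSTM feeding a linear decoder over the vocabulary. A minimal sketch of that shape follows; the class name and layer sizes here are my own assumptions, not the example's actual code.

    import torch.nn as nn

    class WordLSTM(nn.Module):
        def __init__(self, vocab_size, embed_dim=200, hidden_dim=200, num_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers)
            self.decode = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens, hidden=None):
            # tokens: (sequence length, batch) of word indices
            output, hidden = self.lstm(self.embed(tokens), hidden)
            return self.decode(output), hidden

    # Generation is then: feed in a token, softmax the output into a distribution
    # over the vocabulary, sample a word, and feed that word back in.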