So, I'm Anusha, and I'm a machine learning enthusiast; I'm constantly learning every day, and today I'm here to share one of my hobby projects, which was text summarization.

What is text summarization? It is passing in some text, which could be an article, something from social media, anything, and outputting a shorter sequence of text. We are so overwhelmed with information these days that all of us could do with some summarization to get the gist of a large body of text, and that's basically why I got interested in this area.

Okay, we're back on. I'll go a little into the different kinds of text summarization. The two main kinds are extractive and abstractive text summarization. Today we're focusing on abstractive, but just to understand the difference: if you look over here, the extractive summary is a bit lengthier than the abstractive one. Extractive text summarization extracts words and phrases from the input text to form a summary. Abstractive text summarization is more human-like in nature: the aim of the model is to learn an internal language representation and paraphrase the intent of the text, rather than extracting words from it.

So, how do we get started with abstractive text summarization? This is something Dr. Martin Andrews covered earlier, but any project like this needs some data processing, on whatever dataset you want. Here I used a dataset of short articles, and a pretty small one, because I don't have a GPU and I needed everything to be as efficient as possible. The first thing I did was filter out unnecessary special characters, and remove sentences that are too long or too short. Next is to tokenize the articles into words. At this point we don't care about the order of the words; we just have a bag of words, and those words are the input to our model.

From there we create word embeddings, so that the words are represented numerically. As we've mentioned already, two ways to do that are word2vec and GloVe. Over here I've used TensorFlow, and TensorFlow can learn the word embeddings internally, so I didn't actually need to run word2vec or GloVe myself. Here's an example of why embeddings matter: a human knows that there is some relationship between Spain and Madrid, but without embeddings a computer would never know that Spain and Madrid are related in any way. That's where word embeddings come in, and that's what TensorFlow will be doing for us.

Now we'll move on to the architecture we're going to use. For text summarization we use a sequence-to-sequence model, and a sequence-to-sequence model is basically made up of recurrent neural networks. This is again something we've covered, but this is what a basic RNN looks like: the hidden layer uses both the inputs and its own previous outputs to come up with the next state. In our RNN we're using stacked LSTM cells, which have input gates, output gates, and forget gates to control the state of the LSTM.
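The embedding and stacked-LSTM pieces just described can be sketched in a few lines. This is my own minimal illustration in modern tf.keras, not the talk's code, and the vocabulary size, embedding dimension, and layer widths are invented for illustration:

```python
import tensorflow as tf

VOCAB_SIZE = 10000   # invented vocabulary size
EMBED_DIM = 128      # invented embedding dimension
HIDDEN_UNITS = 256   # invented LSTM state size

# Trainable embedding layer: TensorFlow learns the word vectors during
# training, so no separate word2vec/GloVe step is needed.
embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)

# Two stacked LSTM layers; each cell's input, output, and forget gates
# control how much state is kept or discarded at each step.
stacked_lstm = tf.keras.Sequential([
    tf.keras.layers.LSTM(HIDDEN_UNITS, return_sequences=True),
    tf.keras.layers.LSTM(HIDDEN_UNITS),
])

token_ids = tf.constant([[5, 42, 7, 0, 0]])  # one zero-padded example
vectors = embedding(token_ids)               # shape (1, 5, 128)
final_state = stacked_lstm(vectors)          # shape (1, 256)
```

Related words like "Spain" and "Madrid" end up with nearby vectors in this learned space, which is exactly the relationship the raw word indices cannot express.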
Now we look at the sequence-to-sequence model. What it does is take in a sequence and output a sequence, which is why it's called sequence-to-sequence, and to do that we use two recurrent neural networks. The encoder over here is one RNN, and the decoder is another. The encoder takes in an input sequence and produces an encoded intermediate representation; the decoder then takes that intermediate representation and decodes it back into words. We'll look at the code later on, but that's the idea behind a sequence-to-sequence model.

Opening up the code, we'll start off with data processing; I'm just going to go through this real quick. We load whatever data we have and tokenize it into words. Here we do the filtering to make sure we've got exactly the kind of data we want, for example sentences that are not too long or too short. Then we index the data: we create index-to-word and word-to-index dictionaries and build the vocabulary set.

One extra step is to pad the sequences. As you know, the input sequences won't all be exactly the same length, and for them to work in a sequence-to-sequence model you need to pad them with zeros. So that's what we've done over here: pad the input sequences and the labels with zeros, and then save the final version of the data.

TensorFlow has a sequence-to-sequence module, and for any project of this kind you can use something like it. Over here I've built a wrapper over those sequence-to-sequence functions. We're basically building a graph, which is the model: it has the LSTM cells, and it uses dropout, a regularization method that prevents overfitting. We train in batches, and that's what all of this here does: it runs all of the training data through in batches. And here's a function to predict an output sequence for a new sequence the model has not seen. Let's go back to the slides.

Okay, back to the slides. Now that we've seen what the code looks like, we can look at how the model can be trained. If you have a GPU this is easy, but I didn't have one locally, so I trained on an Amazon P2 instance, and some of the results I got are down here. "Leaders begin arriving for Commonwealth summit" was the headline of a short article I tested with, and the model predicted it quite well: it was able to form the headline. But if you look at the other sequence, "Croatians vote in three constituencies", the model wasn't able to form much; it came up with just "constituencies and constituencies". So the results are variable: for certain kinds of articles the model predicted the summaries pretty well, but for others it couldn't come up with the headlines properly. All of this boils down to the parameters you set, and a lot of other factors; I've only constructed a very basic model here, which is probably why you're seeing these kinds of results.
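The zero-padding step from the walkthrough above is easy to show compactly. This is a minimal sketch using a real Keras utility (`pad_sequences`); the token indices are made up for illustration:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Articles tokenized to word indices; the lengths differ.
encoded_articles = [[4, 17, 9, 3], [8, 2], [6, 11, 5, 19, 7]]

# Zero-pad every sequence to the length of the longest one so that a
# whole batch can be fed through the model at once.
padded = pad_sequences(encoded_articles, padding="post", value=0)
print(padded)
# [[ 4 17  9  3  0]
#  [ 8  2  0  0  0]
#  [ 6 11  5 19  7]]
```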
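The speaker's wrapper sits on TensorFlow's (now legacy) sequence-to-sequence module; as a rough sketch of the same encoder-decoder idea in present-day tf.keras, here is a minimal model. This is my own illustration under stated assumptions, not the talk's code, and every size and name in it is invented:

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, UNITS = 10000, 128, 256  # invented sizes

# Encoder RNN: reads the padded article and compresses it into a
# final hidden state (the "intermediate representation").
enc_in = tf.keras.Input(shape=(None,), name="article_tokens")
enc_emb = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(enc_in)
_, state_h, state_c = tf.keras.layers.LSTM(
    UNITS, return_state=True, dropout=0.2)(enc_emb)  # dropout regularizes

# Decoder RNN: starts from the encoder's state and emits the headline
# one word at a time.
dec_in = tf.keras.Input(shape=(None,), name="headline_tokens")
dec_emb = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(dec_in)
dec_out = tf.keras.layers.LSTM(UNITS, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(VOCAB_SIZE)(dec_out)

model = tf.keras.Model([enc_in, dec_in], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit([articles, headlines_in], headlines_out, batch_size=32, ...)
```

Training pairs each padded article with its headline; the decoder input is the headline shifted one step right (teacher forcing), which matches the article-description-to-headline setup described in the Q&A below.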
There is definitely a lot of room for improvement, though. For example, if you're going to try this yourself, you could try weight decay, or you could try training the word embeddings on your own instead of relying on what TensorFlow does for you. And apart from all of this, you don't have to stick to articles and headlines: you could try language translation (English to French, French to English), social media posts and their responses, recipes, anything. The sequence-to-sequence model need not be used only for articles; it works for a lot of different kinds of data. With that, I would like to wrap up.

[Q: What was the hardest part?] I would have to say it was the sequence-to-sequence bit. The math behind it isn't all that straightforward, because underneath it there are recurrent neural networks and LSTMs. I wasn't satisfied with just using the modules straight out of TensorFlow; I wanted to go a little deeper and understand the math behind it, which is why I found it difficult to grasp fully. That part took me quite a bit of time. Thank you.

[Q: What was the input sequence?] Sure, so the sequence that went in was the article description itself. If you look at the GitHub, I've got the text file there, and you'd be able to understand the input and the output a little better. The input was a short article, something like this, in JSON format, and the output was the headline of that article, which was something like this. So the data used for training was the article description, the text, and the headline of that article; in prediction, the description was used, and the output sequence was supposed to be the headline of that text.

[Q: Why filter out the long sentences?] Yeah, for me it was taking a really long time to train when I kept all the long sentences, plus the results weren't as good, because some sentences were really short and some were really long, and I think that discrepancy caused the results to be bad. So, after some research, the way I could make it better was to shorten those sentences, to make the differences between the lengths of the sentences a little smaller. Yes?

[Audience question] No, actually, I have not tried that, but it would definitely be something interesting to play with. Anything else? Any other questions? It's not...
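One way to try the weight-decay suggestion from the wrap-up above, as a minimal sketch with made-up coefficients (note that `tf.keras.optimizers.AdamW` only exists in recent TensorFlow releases):

```python
import tensorflow as tf

# Option 1: L2 regularization on a layer's weights; Keras adds the
# penalty term to the training loss automatically.
dense = tf.keras.layers.Dense(
    256, kernel_regularizer=tf.keras.regularizers.l2(1e-4))

# Option 2: an optimizer with decoupled weight decay
# (available as tf.keras.optimizers.AdamW in TensorFlow 2.11+).
optimizer = tf.keras.optimizers.AdamW(weight_decay=1e-4)
```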