So, ParallelDots is an artificial intelligence company building innovative products around AI, and right now what we are doing is offering deep learning as a service: we train on user data and give these APIs to developers to power their applications. We have built four different algorithms. These are semantic similarity between two pieces of text; a text classifier which is more or less unsupervised, that is, you just give it a sentence and a tag and it can detect whether the sentence belongs to that tag or not, so you don't need a tagged data set; an entity extraction API, where you can put in a sentence and we directly detect the named entities within it; and a sentiment analysis API, where you can analyze the sentiment of a text. So what we did for one of our clients, which is a news company, was to build a contextual recommendation engine for their website. These guys already had recommendation engines, but the point was that they often failed: they used TF-IDF based tag search, and it turns out that for most articles the recommendations either fail or are not really good. So we decided to build a solution for that. The aim of the new solution was that it should be more accurate than the usual TF-IDF based related-post plugins you see on websites. By the way, we make more than one widget: there is a related-post widget, we also make a timeline widget, and things like that. But the main aim was to keep accuracy high while keeping cost low. The thing is, it's just a related-post plugin, so the client won't want to spend a lot on it. We started with traditional topic modeling algorithms in our MVP, and it turned out to be very hard for us to scale them up.
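For context, the TF-IDF style related-post matching the client originally relied on can be sketched roughly like this. This is a toy, pure-Python illustration of the general technique, not the client's actual plugin; the example documents and tokenization are made up:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute a TF-IDF weight vector (as a dict) for each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def related(doc_index, vectors):
    """Rank all other articles by cosine similarity to the given one."""
    sims = [(cosine(vectors[doc_index], v), i)
            for i, v in enumerate(vectors) if i != doc_index]
    return [i for _, i in sorted(sims, reverse=True)]

docs = [
    "government announces new annual budget".split(),
    "budget session government tax policy".split(),
    "cricket world cup final highlights".split(),
]
print(related(0, tfidf_vectors(docs)))  # article 1 shares terms with article 0
```

When two articles share no surface terms, cosine similarity is zero even if they discuss the same topic, which is exactly the failure mode that motivates the embedding-based approach described next.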
So we shifted to a deep learning based approach, and here I'll describe the approaches we took to build this product. Let's start with an overview: I'll talk about the different kinds of technology we apply to build up a recommendation engine. Think of it this way: we take in a lot of text, and at the most basic level what we are doing is modeling each of the words. We look at the corpus and, for each word, we get a vector out. This vector is a dense, low-dimensional representation, and it behaves like this: if you have vectors for "queen", "king" and "man", and you subtract the vector for "man" from "king" and add the vector for "woman", you get something really close to "queen". So that is the most basic level. After we model all the words, we need something to combine these word embeddings into sentences and phrases. For that we use neural networks and a few heuristics. That is the second layer, which takes us from word-level modeling to sentence-, phrase- or document-level modeling. After this we build a search tree, which is a space-partitioning based structure, and on it we can query for related articles very fast. The last layer is a web service we wrote to handle the insane spikes we see in publisher traffic. The point being, the client I'm talking about can have hundreds of millions of impressions per month, and scaling up infrastructure to that level needed some special effort. I'll now describe each of these steps one by one. The basic thing is word embeddings, the word-level vectors I talked about. A very popular implementation is word2vec, which many of you might already have used. It works very well, but the point is that Google patented it a while back.
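The vector-arithmetic property can be illustrated with made-up toy embeddings. Real word2vec-style vectors are learned from a large corpus and have hundreds of dimensions; the 2-d numbers here are hypothetical, chosen so the analogy works exactly:

```python
# Toy 2-d "embeddings"; real embeddings are learned, not hand-written.
emb = {
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [3.0, 0.0],
    "queen": [3.0, 1.0],
}

def add(u, v):  return [a + b for a, b in zip(u, v)]
def sub(u, v):  return [a - b for a, b in zip(u, v)]
def dist(u, v): return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# king - man + woman lands closest to queen
target = add(sub(emb["king"], emb["man"]), emb["woman"])
closest = min(emb, key=lambda w: dist(emb[w], target))
print(closest)  # queen
```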
Google has a very good record with respect to patents, but since we were a bit anxious, we decided to implement our own version of this kind of algorithm. It comes from some other researchers' work; we implemented it and have open-sourced it, so you can actually try it on our GitHub. That is the algorithm we use to model word-level features. Then we go ahead and model documents, phrases or sentences. I'll also mention some other models we use at our startup which I am not able to cover in this talk, because they're not part of the recommendation engine. Once we have a document, we use a recursive neural network to parse it and gather a document vector. You can think of it as going from vectors for all the words to a vector for the document. We typically use the recursive neural network when we care about semantic proximity, that is, what kind of thing is being talked about; and there are some heuristics we have found, plus some research we incorporated, for when we want to match with respect to entities. So there are two approaches we use here to make a document vector. Apart from this, there are two other algorithms we have: a convolutional neural network based sentiment analysis, which convolves over n-grams of word embeddings to get the sentiment out, and a recursive neural network based entity extraction that we offer as an API. So now that we have document vectors, we need to actually search over them. The traditional approach would have been to put up a Hadoop cluster, divide things into n parts and query each of them separately. But as I told you, we weren't allowed to put a lot of resources into this. So what we decided was to build a search structure.
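The talk's actual composition step is a recursive neural network plus heuristics. For intuition only, the simplest common heuristic for collapsing word vectors into one fixed-size document vector is averaging; this is a crude stand-in, not the method the talk describes:

```python
def average_embedding(tokens, emb, dim):
    """Collapse a document's word vectors into one fixed-size vector by
    averaging. A crude stand-in for learned composition: it keeps the
    dimension fixed regardless of document length."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Hypothetical 2-d embeddings for illustration.
emb = {"good": [0.9, 0.1], "movie": [0.2, 0.8], "bad": [-0.9, 0.1]}
print(average_embedding("good movie".split(), emb, 2))  # roughly [0.55, 0.45]
```

The key property, which the recursive network shares, is that a document of any length maps to a vector of the same dimension as a single word, so words and documents live in one comparable space.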
So a space-partitioning tree is a special class of data structures where you recursively partition the search space and narrow down to the specific region where all your relevant items are located. The VP (vantage point) tree I'm talking about is one implementation of such space-partitioning trees. We implemented it, and now if we give it a new document, it just recursively partitions and searches for the nearest documents in near enough O(log n) time; it's not exactly O(log n). We already have a parallel implementation where we divide the documents into n buckets, put them on different cores and build different VP trees, so right now we have that kind of concurrency, but we're slowly moving towards a shared-memory kind of parallelism that will take us closer to true O(log n). So now that we had this infrastructure ready for querying and searching, the main remaining point was handling traffic. We started with a simple Python and Redis based setup. We thought, okay, a document will come, we'll create a set of recommendations, we'll cache it and just keep serving the same recommendations again and again. But what happens is, as soon as the publisher put a tweet online about one of his articles, the traffic was just insane, and the Python server we had put up was simply overwhelmed. But there was a specific characteristic to this traffic: although there could be up to 10,000 concurrent users at a time, the number of unique articles they were querying was well within a few hundred. So we wanted to utilize this property.
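A minimal vantage-point tree can be sketched as follows. This is a simplified illustration of the idea, not the production implementation; it uses plain Euclidean distance on small random vectors standing in for document vectors:

```python
import math
import random

class VPNode:
    """One node: a vantage point, a split radius, an 'inside the ball'
    child and an 'outside the ball' child."""
    def __init__(self, point, radius=0.0, inside=None, outside=None):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def build(points):
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage)
    rest.sort(key=lambda p: math.dist(vantage, p))
    mid = len(rest) // 2
    radius = math.dist(vantage, rest[mid])     # median distance splits the set
    return VPNode(vantage, radius, build(rest[:mid]), build(rest[mid:]))

def nearest(node, query, best=None):
    """Descend the more promising side first; prune the other side with the
    triangle inequality. `best` is a (distance, point) pair."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d < node.radius:                        # query falls inside the ball
        best = nearest(node.inside, query, best)
        if d + best[0] >= node.radius:         # candidate ball crosses the split
            best = nearest(node.outside, query, best)
    else:
        best = nearest(node.outside, query, best)
        if d - best[0] <= node.radius:
            best = nearest(node.inside, query, best)
    return best

random.seed(0)
docs = [[random.random() for _ in range(5)] for _ in range(500)]
tree = build(list(docs))
query = [0.5] * 5
d, p = nearest(tree, query)
print(p == min(docs, key=lambda x: math.dist(query, x)))  # True: matches brute force
```

Because whole subtrees are pruned whenever the triangle inequality rules them out, a query typically touches only a logarithmic fraction of the points, which is the "near enough O(log n)" behavior described above.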
So what we did was make a PubSub kind of web server: you receive all the requests, deduplicate them to reduce the load on the machine learning server, and then, when you get the output, you send it across to all the subscribers. We used Go for this; its channel mechanism for handling concurrency is really easy and pleasant to use. None of us was a systems programmer, but we went and wrote this in about a month. So that's basically all the algorithms we have put in there. Now, since I know most of you would want to hear more about the deep learning part, I've put up some slides on the basics of deep learning. All the algorithms you saw are neural networks, which have multiple layers of weights, with each layer of weights separated by an activation function. This activation function is what brings nonlinearity into a neural network; if you remove it, the network is basically just a composition of linear transforms. So that is how a neural network is built; there are different architectures, which I'll describe on the next slides. Then there are ways to train it: you typically use backpropagation, and most of the neural networks you see are trained using gradient descent. There are other approaches too, but all the algorithms we have are trained using gradient descent. Okay, so these are some common architectures you see floating around. A convolutional neural network loosely models the human visual cortex: convolutional matrices extract higher and higher level abstractions as the layers go up, and the point of training such a network is to find the optimal convolutional matrix for each layer.
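The deduplicate-and-fan-out pattern described earlier (the talk's version was written in Go with channels) can be sketched as a hypothetical, simplified single-process Python equivalent. Concurrent requests for the same article are coalesced so the slow ML backend is hit once, and every waiting subscriber gets the same result:

```python
import threading
import time

class Coalescer:
    """Collapse concurrent identical requests: the first caller ('leader')
    computes; later callers for the same key subscribe and wait."""
    def __init__(self, compute):
        self.compute = compute          # expensive backend call, e.g. the ML server
        self.lock = threading.Lock()
        self.pending = {}               # article_id -> (Event, one-slot result list)

    def get(self, article_id):
        with self.lock:
            entry = self.pending.get(article_id)
            leader = entry is None
            if leader:
                entry = (threading.Event(), [None])
                self.pending[article_id] = entry
        event, slot = entry
        if leader:
            slot[0] = self.compute(article_id)
            with self.lock:
                del self.pending[article_id]
            event.set()                 # fan the result out to all subscribers
        else:
            event.wait()
        return slot[0]

calls = []
def recommend(article_id):
    calls.append(article_id)            # count backend hits
    time.sleep(0.2)                     # simulate a slow ML call
    return ["rec-for-" + article_id]

c = Coalescer(recommend)
barrier = threading.Barrier(50)
results = []
def worker():
    barrier.wait()                      # fire all 50 requests at once
    results.append(c.get("story-42"))

threads = [threading.Thread(target=worker) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(len(calls), len(results))  # backend hit once or a few times for 50 requests
```

This captures the traffic property noted above: thousands of concurrent readers, but only a few hundred unique articles, so coalescing plus caching removes almost all backend load.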
Then there's the Boltzmann machine, which is basically a stochastic neural network implementation of a probabilistic graphical model. And then there are recursive and recurrent networks; they both model arbitrary-length sequences. The recursive network models the sequence as a tree, and the recurrent network models it as a chain. The recursive network basically keeps applying the same layer of weights to combine nodes and finds the most optimal combination as a tree. The recurrent network, although it looks simpler, has had its accuracy increased by putting in more innovative units like LSTM, and it has its own memory which it carries from one step to the next. These two networks are kind of the state of the art in NLP now. Since we work on recursive networks, most of our implementations are recursive neural networks. Since this is a short talk, I cannot go through each of these, but if you've seen the neural network literature, there are a lot of buzzwords, so I've tried to gather them: some of these words are neural net units, some are optimization strategies, and some are architectures. I can come back to any of them if you want to ask questions. Okay, then, how do we implement these models? For us, it's Theano. Theano is a Python library: you write your neural network in almost NumPy-like syntax, and it compiles it to CUDA code; CUDA is basically C for GPUs, so it can run on any NVIDIA GPU you have. That's where we implement all our algorithms. For lightweight experimentation with our algorithms, we use a library called Kayak. It's very NumPy friendly and it can run even on plain CPUs. Then there are other libraries like Pylearn2 and Lasagne; you can think of them this way: what SciPy is to NumPy, these libraries are to Theano. They've got some algorithms already implemented, and things like that.
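On the point that the activation function supplies the nonlinearity: without it, stacked weight layers collapse into a single linear map. A small NumPy sketch (illustrative only; the talk's production code is in Theano, which uses a very similar tensor syntax):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 8))   # first layer of weights
W2 = rng.standard_normal((8, 3))   # second layer of weights
x = rng.standard_normal(4)         # a toy input vector

# With a nonlinearity between the layers: a genuine two-layer network.
h = np.tanh(x @ W1)
y = h @ W2

# Without it, the two layers are equivalent to ONE matrix, W1 @ W2:
y_linear = (x @ W1) @ W2
print(np.allclose(y_linear, x @ (W1 @ W2)))   # True: collapses to a single linear map

# The tanh version does not collapse this way:
print(np.allclose(y, x @ (W1 @ W2)))          # False
```

This is why every architecture listed above, convolutional, recursive or recurrent, interleaves its weight layers with activation functions.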
There are implementations in other programming languages too: Torch is in Lua, Caffe is in C++ and CUDA, and so on. Okay, so that's all. If you go to our website, you can check out our demos and see how our algorithms work. We have put up a demo based on the general Indian context, so for general blogs and sports articles you can actually see how this thing works out. If you have any questions, I would be glad to take them.

Q: You talked about moving from word vectors to phrase vectors using an RNN. Can you explain the process at this stage?

A: Okay, yeah. So what happens is a recursive neural network will try to parse any arbitrary sequence into a tree, and it tries to, so it will see a lot of...

Q: Do you assume a fixed length for every vector? Do you assume all the documents to be a fixed length, or do you allow variable length?

A: All the document vectors are a fixed length, exactly the same dimension as a word vector, so you can even compare words and documents with each other. For us, everything is a 100-dimensional vector, be it a document or a word.

Q: What I'm asking is, how do you reach this phrase vector?

A: Yeah, yeah, I'll go ahead with that. Basically, you give it a sentence; for each of the words in the sentence there is a word embedding. Now, when you give it this array, it will try to make a tree out of it.

Q: Is it a simple addition of the word vectors?

A: No, it tries to make a tree out of it. In a recursive network, anything you give it, it will try to make an optimized tree out of it. You give it a hundred sentences; it will find the most optimal rules, that is, the weights of the neural network, that help build those trees.
And those weights will be the output of the neural network. So basically you are saying: given these hundred documents, with these word vectors, how do these things combine into a single document vector?

Q: What I'm asking is: I have this 100-dimensional word embedding for every word, and I want to come up with a document vector using a recurrent neural network. How would you train it to come up with this document vector?

A: What I'm talking about is not a recurrent network; it's a recursive neural network that we use here. A recurrent neural network is one where you put in LSTMs and such in the middle, and it processes the sequence as a chain. We use recursive neural networks; they actually make a tree out of the sentence automatically. The point is, you learn the rules of the language, how you see these words forming a sentence and so on. I like to call them grammar rules, but since there are more scientific people here, I won't insist on that; it's a set of rules for combining words into documents, and that's what we get as the weights of the trained neural network. Does that answer your question? We'll take it offline. Okay.
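To make the Q&A concrete: a recursive network repeatedly applies one shared weight matrix to merge two child vectors into a parent vector of the same dimension, so the root of the tree is a document vector with the same dimensionality as a single word vector. A toy sketch with assumptions stated up front: the weights here are random rather than trained, and the "tree" is just left-to-right pairing, whereas the real model learns both the weights and the tree structure:

```python
import numpy as np

DIM = 100                                       # word/document dimension from the talk
rng = np.random.default_rng(1)
W = rng.standard_normal((DIM, 2 * DIM)) * 0.01  # one SHARED composition matrix

def compose(left, right):
    """Merge two DIM-sized child vectors into one DIM-sized parent vector."""
    return np.tanh(W @ np.concatenate([left, right]))

def doc_vector(word_vectors):
    """Fold a sequence of word vectors into a single vector by repeated merging.
    (A trained recursive net would pick the best tree; we merge left to right.)"""
    node = word_vectors[0]
    for vec in word_vectors[1:]:
        node = compose(node, vec)
    return node

words = [rng.standard_normal(DIM) for _ in range(7)]  # stand-in word embeddings
doc = doc_vector(words)
print(doc.shape)  # (100,) -- same dimension as a single word vector
```

Because `compose` maps 2·DIM inputs back to DIM outputs and is reused at every merge, any number of words collapses to one 100-dimensional vector, which is why words and documents can be compared directly, as stated in the answer above.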