So my name is Evgeny, I'm a data scientist, or more of a software-engineer-flavored data scientist kind of guy. My talk today will be about the history of Google, the very beginning of Google as a search engine, and the history of search engines in general, but also about the intellectual inquiry that was behind the search engine as an idea. But before I go into this, I want to ask a question: how many software engineers are here today? Could you raise your hands? Okay, okay, let's see how this will work. And also, if you think I'm trying to flatter Google into giving me a job, you're absolutely right. So, "to google": it was added to the Oxford English Dictionary on June 15, 2006, meaning to search the internet for information with the Google search engine, but it's also used as just a synonym for searching the internet in general. Now, do you know any of these brands? Yeah, probably. Which ones? All of these are search engines, and they are search engines that existed before Google. So Google wasn't the very first one. Of these, only Yahoo still exists, and MSN was rebranded as Bing; maybe you know it. So how did this happen? How did Google get so much of an advantage? To answer this question, we first need to look at things in perspective: the state of search in 1998. To give you a feeling for how it was at the time, I'll give you two quotes from the paper written by the founders of Google, in which they describe the state of the art in that year. The first quote: people are still only willing to look at the first few tens of results. So it was taken for granted that you had to look through a few tens of results before finding what you wanted. The second: only one of the top four commercial search engines finds itself. It means that if you typed a search engine's own name into it, its own page wouldn't come up in the results. So you can get some idea of the quality of these products at the time.
But before I go into all the complications and problems that explain why search quality was what it was at the time, I'd like to answer a question I got asked while preparing this presentation. If it seems strange to you, I'm sorry, but I was really fascinated by how to answer it, actually. So let's ask another question: how much time would one request take if you went over all the sites on the internet, at today's scale? There are approximately 4.6 billion pages on the internet nowadays. Taking 10 milliseconds per page, which is very, very optimistic, that gives us 46 million seconds, which is equal to 532 days for a single request. And Google is serving around 30,000 requests per second. So now we can understand that if we want something called a search engine, we need to pre-process the data; we cannot just go and look through the internet every time someone asks. If you look at this problem abstractly, you can think of the internet as a collection of pages, and on these pages you have text. In a very primitive version, you just want to retrieve all the pages that contain the text you put in the query, right? And all of you, whether you're a software engineer or not, know this concept: it's called an index. Here is a book index, which is just a mapping from a word to the pages on which you can find that word, and it works fine. Okay, so following this principle, let's see how we would do it on the internet. We just take one page, give it a number, say index one, and then we go sequentially through all the words, all the terms on the page, and store them as a mapping between each term and the page ID. And so we go over every single word. I'm not doing it all now, but you're getting the intuition.
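The index construction just described can be sketched in a few lines. This is only an illustration of the idea; the page contents here are made up:

```python
# A minimal inverted index: map each term to the set of page IDs
# on which it occurs. Page contents are invented for illustration.
pages = {
    1: "olympic games held in paris",
    2: "history of the olympic movement",
    3: "paris travel guide",
}

def build_index(pages):
    index = {}
    for page_id, text in pages.items():
        for term in text.split():
            index.setdefault(term, set()).add(page_id)
    return index

index = build_index(pages)
print(sorted(index["olympic"]))  # → [1, 2]
print(sorted(index["paris"]))    # → [1, 3]
```

Answering a query for a single term is then just one dictionary lookup, which is why pre-processing makes search feasible at all.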
So we go to the next page, extract the terms, and we get a mapping from each term to a set of pages. Now we can retrieve all these pages, but then we run into another problem. You can see it's 56 million pages for our search query, so that alone doesn't give us anything. If we look at this number, it's actually a bit less, because this is only what Google has indexed. But looking at this problem, we understand it's now what is called the problem of ranking: how to find the most relevant results among all the pages retrieved for the query. And now I want to ask you a question. For the query "Olympic Games", which sites would you consider relevant? Which ones would you expect to see on top? Yeah, yeah, that's one idea. I'm looking at it more generally: the Wikipedia page, the official site of the Olympic movement, something like this. It's a very limited number, certainly not 56 million pages. So how can we approach this problem, and how was it approached in the very beginning of search engines? First, a very simple idea: we can just count the number of occurrences of the term on a page, on the assumption that if a page talks a lot about the term, it's probably more about it. But you can immediately see a flaw in this logic. The owner of a site has total control over what's posted there, so someone can post "Olympic Games" one million times, and that's not a problem: you make it an invisible font, and the user will never see it, while from the point of view of the search engine it's still there. There are other ideas that were implemented in the beginning. Is the term in the title of the page? How specific are the words of the query? That one is a little more tricky, and I'm not going into details, but you can consider how frequently a word occurs across other sites, and based on that conclude how useful this word is for narrowing down the search.
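The two term-based signals just mentioned, raw occurrence counts and the specificity of a word, can be combined in a small sketch. This is essentially the classic tf-idf weighting, which I'm naming here as background rather than quoting from the talk, and the toy pages are invented:

```python
import math

# Toy corpus; contents invented for illustration.
pages = {
    1: "olympic games olympic games olympic games results",
    2: "the olympic games and the history of the olympic movement",
    3: "cooking games for kids",
}

def rank(query_terms, pages):
    n = len(pages)
    # Document frequency: on how many pages each query term occurs.
    df = {t: sum(t in text.split() for text in pages.values())
          for t in query_terms}
    scores = {}
    for page_id, text in pages.items():
        words = text.split()
        score = 0.0
        for t in query_terms:
            tf = words.count(t)  # occurrences of the term on this page
            if tf and df[t]:
                # Rare (more specific) terms get a higher weight;
                # a term on every page contributes log(n/n) = 0.
                score += tf * math.log(n / df[t])
        scores[page_id] = score
    return sorted(pages, key=lambda p: scores[p], reverse=True)

print(rank(["olympic", "games"], pages))  # → [1, 2, 3]
```

Note that page 1, which just repeats the term most often, wins. That is exactly the keyword-stuffing weakness described above: any signal computed purely from the page's own text can be gamed by the page's owner.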
But if you look at these methods that were used in the very beginning, you notice one flaw: it's all about what the site is telling about itself. There is no voice of the crowd here in any form. So that's how it was approached, but let's just remember that the internet is a network, and it's a network on different levels. One level is the physical level: the internet is a network of machines. But another level is the level of links: all the sites on the internet are interconnected by links, which brings us to the concept of the hyperlink. This is very simple. It's just text which is marked up in a certain way, and by clicking on it you land on another page. So now a question to consider: this anchor text points to the site of the Olympic Games. Do you think it should be indexed as part of the Olympic Games site? Logically thinking, yes, because this information describes the landing site. And that is exactly the idea I found that Google implemented: they indexed the text of hyperlinks as part of the landing page, and it already gave a huge improvement, simply because this is information that other people are telling about the landing site. But then we can take one more step forward. Let's consider all the hyperlinks and build what's called a graph, a directed graph, where each dot is a site on the internet and the arrows are hyperlinks between the sites. Just by looking at this, we can already make some observations. For example, we can see that some of the sites have more incoming connections, which we can use as a kind of heuristic: this site is probably more important, simply because other people are making links to it. But let's look at this problem from a slightly different perspective. When we talk about the relevance of a site, we can say that the relevant sites are those that are more likely to be visited by people.
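The in-degree heuristic from this graph picture can be sketched directly; the four-page link structure below is a made-up example:

```python
from collections import Counter

# Made-up directed link graph: links[page] = set of pages it links to.
links = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": {"a"},
    "d": {"c"},
}

# Count incoming links for every page by walking all the arrows.
in_degree = Counter(target for targets in links.values() for target in targets)
print(in_degree["c"])            # → 3
print(in_degree.most_common(1))  # → [('c', 3)]
```

Here "c" is linked to by three other pages, so under this heuristic it would be ranked as the most important one.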
And the sites which have more incoming connections are also more likely to be visited by people. So, yeah, that gives us some information on how to approach it, how to rank the sites. Let's take a very simple approach. Imagine we start from a random place as a user, and then randomly move to one of the sites the current one links to, and then to the next one, like this. And every time we're at a site, we count it: we just record the number of visits. As you can see, there is already a site on which a user is more likely to land, and it's totally intuitive that sites with more incoming connections get a higher score. So let's do it again. By running this process over and over, we can statistically estimate how likely a user is to land on each site. This whole approach is called the PageRank algorithm, which is the algorithm Google used from the very beginning, and through which Google got its huge advantage over the competitors. And as you may have noticed, I didn't mention any math. I didn't put up any formula or any concept no one can understand. It was mostly about intuition. This finding is more about imagination than about knowledge, and it led to the great success of this company. So this is the point of my presentation: take a different angle on a known problem, and you can succeed. And apparently it's not that difficult for some people. Thank you.