Thank you all for coming. I have been creating search engine technology since 2004, and I lived through all the time when these old pages were around. So I thought it would be a good idea to show you what has been there, and what might be there in the future. I get the microphone. So that's that.
If I talk about search engines, it is good to know what was there to put into a search engine. In the beginning of the web, there were a lot of private pages, and the first hosting service which came up and went really big was GeoCities. It was full of strange things: blinking icons, images of weddings, cats, and so on. So this was pretty big, and it became a good reason to have a search engine to find all the stuff that was out there.
One of the first places where someone tried to organize information was a page called Jerry and Dave's WWW Interface. It was just a published bookmark list. And this became Yahoo; it's just a change in the headline. It was an ontology of topics, and it went into the Yahoo web page, which also had a search bar. But the original idea was to create something like a telephone book, a book you can browse in and where you have categories. That was the way people organized their information. So it was the first step to bring order to everything in a way that was similar to things people were used to, like the yellow pages.
The next step was Alta Vista. Alta Vista concentrated only on full-text indexing of web pages, and it became a kind of standard search engine for many years. Alta Vista was like Google: where today you would Google things, you always went to Alta Vista and searched there. Alta Vista was completely natural. It was the thing you had to use to find information. And it was running on small machines with 4 gigabyte disks and 160 megabytes of memory. That was already very large at this time, the end of the 90s. But yeah, it worked.
Zip2 is not really considered a search engine itself, but it's another good example of how search engines were there to answer people's questions. Zip2 was founded by Elon Musk, who then sold Zip2 to Yahoo, and later on created PayPal with this money, and PayPal was sold. Elon Musk then founded SpaceX and Tesla. So this web page may be the root cause of people settling on Mars and humanity becoming a multi-planetary civilization. This is the beginning. And it was just yellow pages, yellow pages on the web. He had the idea to put all the information from the yellow pages into a search engine, so that you don't browse the yellow pages but search them. Yeah, it was bought by Alta Vista, not Yahoo.
In the late 90s, Google came, and what they changed was the way to do ranking. Alta Vista had a very traditional and scientific way to rank pages in a full-text index: term frequency. Term frequency just counts how often the search term occurs in each document that was found, and then orders the results by the number of hits in the page. This became a strange playground for search engine optimizers, who put their keywords into the web pages many times. Web pages became very strange. They put the words in white color on a white background at the end of the page. It was completely messed up because of what search engine optimizers recommended.
Google cleaned this up by using something which was common in the scientific world: counting the references between scientific publications. It was called citation rank. But Larry Page made a patent, called it PageRank, applied it to web pages, and counted the references web pages had to each other. When Google started with this kind of search, they had 100 million documents in their search index. And the server was enclosed in Lego pieces, Lego bricks. It was a strange thing.
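The two ranking ideas just described can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Alta Vista or Google were actually implemented: the function names, the tiny example documents, the link graph, and the damping value of 0.85 are all made up for the example. The first function orders pages by raw term count, which is what keyword stuffing exploits. The second gives a page a high score when other pages link to it, which a page cannot fake by repeating words.

```python
from collections import Counter


def term_frequency_rank(docs, term):
    """Order document ids by how often `term` occurs, most hits first."""
    term = term.lower()
    hits = {doc_id: Counter(text.lower().split())[term] for doc_id, text in docs.items()}
    found = [doc_id for doc_id in hits if hits[doc_id] > 0]
    return sorted(found, key=lambda doc_id: (-hits[doc_id], doc_id))


def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-based rank: `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it references.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical data: "spam" repeats the keyword, "honest" is the page others link to.
docs = {
    "honest": "a page about cats and more cats",
    "spam": "cats cats cats cats cats cats",
}
links = {"a": ["honest"], "b": ["honest"], "honest": ["a"], "spam": ["honest"]}

by_term_count = term_frequency_rank(docs, "cats")
by_links = pagerank(links)
```

With this toy data the stuffed page wins on term count, while the page that others reference gets the higher link-based score.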
But it was so successful because it cleaned up all the mess the search engine optimizers had made before. The search results were so good compared to Alta Vista that everybody passed on the word that Google had such an interesting ranking. And it created the best search experience at this time.
Around 2000, the dot-com bubble burst. So many strange services had come up, like the Million Dollar Homepage. Do you know the Million Dollar Homepage? Someone put up a one-megapixel picture. That was everything. It was empty, and he sold every pixel for $1. It became a real mess. It sounds like a strange idea, but in the end it filled up completely very fast. There was no point in going to this page, but everybody wanted to have a pixel on it. So there had been many strange ideas until 2000. Finally, people found out that it's all rubbish and it's all strange things.
Then they started with user-generated content. Before this time, people didn't generate their content themselves, like putting something in a wiki or putting a comment in a forum, because these systems hadn't been there. That also started alternatives to Google, but nobody at this time was more successful.
Wikipedia started in 2001, and it's another example of the attempt to answer questions. Before Wikipedia, we got answers, a shallow view of all the scientific topics and all the things in the world, from dictionaries and encyclopedias. By now, they have all stopped publishing their books, and Wikipedia is the main reference for knowledge, in a shallow way, on the web.
So what I want to tell you is: search is always about answering questions. And there are services in between which also answer questions. This is the idea I use to explain what the search engine of the future could be. And maybe you're here just to hear my idea of what the future search engine could look like.
Facebook, 2004, is something which gives you answers about social connections. And YouTube was really a strange thing. It just collected all the many funny movies which had been around. In the last three years, it became a way of replacing television. There's so much entertainment and education and good stuff on YouTube.
Twitter is another social community with a search window. The search window in Twitter works completely differently from what you know from Google, because Google applies a kind of ranking which is based on popularity. Search results on Twitter are based on a timeline. So it's another search portal connected to a specific service, just like the search portal of Wikipedia is connected to a specific service. But they will all get together in some way, because they are all about answering questions for which you previously didn't get an answer.
Somewhere in the middle of the 2000s, something strange came up which was called the Semantic Web. Everybody believed this is the future. The Semantic Web would be something which is more capable of understanding what is on a web page. It would require that you use an RDF schema, which is a syntax to describe things. If there is a name of a person on a web page, you enclose it in the syntax of the RDF schema and say: this is the name of a person. So every search engine would be able to collect context information about a web page and not only index its full text.
But I believe that even if people had enough discipline to do this, and a search engine came up that used this kind of information, then in the next step the search engine optimizers would come and destroy it again. So one way to create search engines using metadata is to create search facets. Until now, Google didn't develop search facets in the way that would be possible using rich metadata on web pages. It may be because they tried it and found out that it's easy to destroy.
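As a rough illustration of the annotation idea above, the following Python sketch reads a page in which a person's name is enclosed in markup that says "this is the name of a person". The `typeof` and `property` attributes follow RDFa-style markup, but the sample page, the class name `PropertyExtractor`, and the property names are assumptions made for this example. It is not a complete RDFa processor; it only shows how a search engine could collect context information in addition to the full text.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collects (property, text) pairs from elements that carry a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._open = None  # (tag, property) of the annotated element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop and self._open is None:
            self._open = (tag, prop)
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self.found.append((self._open[1], "".join(self._text).strip()))
            self._open = None


# Hypothetical annotated page: the markup states which words are a person's name.
page = (
    '<p typeof="Person">Written by <span property="name">Ada Lovelace</span>, '
    'born in <span property="birthPlace">London</span>.</p>'
)

extractor = PropertyExtractor()
extractor.feed(page)
facts = extractor.found
```

Pairs like these are what a search facet could be built on, for example a filter on persons by birthplace. They are also exactly the kind of field a search engine optimizer could fill with misleading values.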
So Tim Berners-Lee came up with HTML and HTTP. And he created the idea of the Semantic Web, which is basically a good idea, but people didn't have enough discipline to do it correctly. Therefore, he formulated something which is called the Giant Global Graph, which is exactly the same as the Semantic Web, but together with some rules for how to create semantic annotations. This caused, for example, Wikipedia to start collecting information in the syntax of RDF schemas according to the Giant Global Graph. Wikipedia is a nice attempt, and it's really working. It's really nice to create metadata out of Wikipedia. You should try a link like this, and you get the Wikipedia page about an open data institute in RDF annotation, as metadata. So this could be a really nice basis to create something like a knowledge engine, which can answer questions based on knowledge in a formal way.
And OK, then came Bing, very late. Very late, in 2009. Until now, they didn't catch up. They tried to be successful by making it the default search service in some components they sell as well. And the big difference between Bing and Google is mostly that they don't apply so much censoring. So if you're looking for a good porn search engine, you must go to Bing. Yeah.
A search engine based on knowledge, not searching on full text but on knowledge, was Wolfram Alpha. People mixed it up and said: no, this is not a good search engine, it doesn't find pages on the topic of my interest, it gives me some strange facts, and that's not what I want. But this is the wrong application of it. This is just another way to answer people's questions, with a different kind of answer. So it's another search engine. It's about answering questions.
Facebook created Graph Search. It was active for only two years, because people found out that it's a really strange thing, because you can ask things like:
Tell me someone, a friend of a friend, who lives in my city and is female and is single. And you can try to make combinations of what you would like to know about people. For some strange reason, they stopped it, and the graph search is not there anymore. But the technique is still there; only the parser is switched off, so you can hack together some URLs which provide the same function. But there is no longer the parser for human language to get this information.
The Semantic Web goes on. Google introduced schema.org, where they use the RDF schema of the Giant Global Graph to annotate specific content on web pages which is related, for example, to shopping. So eBay is using this. They have a vocabulary with terms like price currency, and they put it into their HTML. So the setup is there. Google is using schema.org to introduce annotations on web pages which can't be hacked in the way a search engine optimizer would do it, and which would be useful to get some more information into search. But you can't see it by now. Maybe in the future, Google will use all the information which people put into their web pages, if enough people adopt this and price search engines, for example, use this information. So this is another step forward, but it's not so visible. But it's in the web page.
You can ask what kind of free software is there. At the end of the 90s, Lucene started. And Lucene was packaged into Solr and Elasticsearch. And you can build appliances on top of these. And these are the two projects I made, so I recommend you test them. This one is a web search engine. This one is a Twitter search engine. And it's nice to see that you can run a search engine yourself.
Hardware was sold by Google: there was the Google Search Appliance. There was also a commercial appliance from a university startup, FAST ESP. But by now, this has been bought by Microsoft and destroyed.
And Google stopped selling or renting their Google Search Appliance and made a cloud service out of it. So you cannot put it into your company anymore.
And this is the future. It's a view of the future from the past. It's like when Scotty was trying to get information from a computer. He tried to speak with the computer and said, "Hello, computer", into the mouse, which was wrong, but anyway. So this is the idea: what will the future of search be? Will it be that you talk to your electronic device, like your mobile phone? There are products which compete now: Siri, Cortana, and Google Search on the phone. It's an attempt to bring this kind of future vision into your mobile devices. But when people try this out and show, for example in YouTube videos, what they can do with it, they always ask silly questions. "Do you love me?" Like this. Only silly things. So people don't really know what to ask these devices. Is anybody of you using these kinds of devices in a serious way? Really? You must tell me. The weather. Yeah, weather.
So search is always about answering questions. Every question you have should be answered. And the future search engine will fill in the gap of the unanswered questions. That's my guess. I cannot say what it will look like, or which device it will use. But it will maybe answer the previously unanswered questions. These are the common questions, like the weather, the stock exchange, and also: what should I buy? There are recommendation engines which have strange recommendations, like welding goggles and a yellow collar. Why should anybody buy these together? Because it's for a costume. But I did call out about the gray bin. So maybe for mad scientists.
The most unanswered question of humanity is one nobody speaks about. It's a question everybody answers for themselves and never speaks about. And you never get an answer. Do you get an answer from Google? Who am I?
I don't think search engines will answer these questions. But you can try to find out what the most common questions are by playing around with who, where, what, how, or maybe with places. And if you find that these questions have been unanswered, you have found a possible search engine of the future. So that's my guess: search engines of the future will answer the unanswered questions. And you can create these. If you find something which has been unanswered, create a search engine.
So this is my talk. These are my software pieces, which you can try in order to create your search engine yourself and play around with. And because this is an ask-me-anything, I have given you some ideas about topics you could ask about. Thank you for being here.
And yeah, thank you. It is officially over. Yeah, but we have some time. There's not another talk after this one. So if you want to ask questions, feel free. There are no time constraints. So any questions for Misha? Yes?
Can you give a quick demo or a quick one on the AC?
No, this will require more time. I can raise this topic. Yeah, this is for me to set it up.
Oh, OK.
I don't have an internet connection. This doesn't work.
Oh, OK. Someone may have a problem. OK. More questions?
Yeah, I have a question.
It has a built-in Solr, yes. You can use it externally as well.
So I don't know, maybe you can come to Solr and the internet is so real and planned. Don't make it a real search. I've made it a good plastic search and I think it's great, but it would be a lot of work.
Yeah, it would be fun, but a lot of work. Yes?
For social researchers, it's a big topic how the results of a search engine can manipulate your media topics.
Yeah, if you're the operator of the search engine, it's easy to manipulate the answers by just changing the order.
Yeah, changing the order, or even leaving out certain things.
Yeah, this is censoring. And you must have a good reason to do this, because it could be discovered.
But if you are a monopoly, nobody else can find out. So it was always good to have the alternative of running a search engine yourself, to find out whether there is censoring or not.
So right now, there is a monopoly. It's called Google. So do you think this is a real threat that we face as a society?
It depends on your point of view and what you expect. Google creates a ranking based on popularity, and it's somewhat good. Google takes out a lot of dirty content which is not compatible with the public opinion of what is good and what is bad. Maybe Google takes out a lot more than is necessary. So I don't say that there's a lot of censoring, but if you provided a search engine yourself, you would apply your own point of view on what you want to show and what you don't want to show. There should be a discussion about how to control the content and not obfuscate it. But for sure it's necessary to do moderation, because otherwise you can present things which are unlawful under any law of the world, not only under specific, strange laws of single countries. There's a common opinion about what is really bad, and it's really good to take that out of the search engine. Yes?
So what are my options for scaling up search engines?
The options?
Yeah.
If you scale up your personal search engine and you're using Elasticsearch or Solr, you can use SolrCloud, which is a scaling option that has been there for some years, or Elasticsearch, which has a clustering function which is very easy to set up. You must learn a bit how it works. From my point of view, it's easier with Elasticsearch. And then you can scale up horizontally by adding more machines. More machines equals better scaling. Yes?
Do you have thoughts on how the Google suggestions work in the back end? Just now, on the screen, you typed Singapore, and it came up with a list of different questions.
Yes?
How does Google know, how does it predict this in the back end?
It's based on the number of queries. So it just looks at what kind of queries have been there. And maybe it matches on the first word. And then it says the most common queries are like this. It's just by the number of queries.
Do they also use something else, like another component which fits in, like a graph, on the inference side?
Probably, because you can combine this into a mixed strategy. At this point, it's possible to misspell something. Maybe there is a spelling technology which also corrects your writing. And it can distinguish whether you have finished writing the first word or are still writing it. So I believe there's a mixed strategy, but it's not public.
But how do you think they mix it?
I don't know. I think it's a mixed strategy of spell-checking and the number of queries.
More questions? That's it for the questions. Thanks again, Misha.
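The answer given above, suggestions based on the number of past queries that match what has been typed so far, can be sketched as follows. This is a guess at the general principle only, since the speaker notes that the real strategy is not public. The function name `suggest`, the query log, and the `limit` parameter are hypothetical. A trailing space in the typed text is kept on purpose, so a finished first word is treated differently from a word that is still being typed.

```python
from collections import Counter


def suggest(query_log, typed, limit=3):
    """Return the most frequent logged queries that start with the typed text."""
    counts = Counter(query.strip().lower() for query in query_log)
    prefix = typed.lower().lstrip()  # keep a trailing space: it marks a finished word
    matches = [(query, n) for query, n in counts.items() if query.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically to keep the order stable.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in matches[:limit]]


# Hypothetical query log.
query_log = [
    "singapore weather", "singapore weather", "singapore weather",
    "singapore airlines", "singapore airlines",
    "singapore zoo",
    "single malt",
]

while_typing = suggest(query_log, "sing")
word_finished = suggest(query_log, "singapore ")
other_word = suggest(query_log, "singl")
```

A spell-checking step, as mentioned in the answer, would sit in front of this lookup and correct the typed text before it is matched against the log.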