Hi everyone. My name is Son. I'm a cloud technical evangelist at IBM. Today I'd like to share about Apache Solr. As the moderator mentioned, Solr has a lot of features, and in 15-20 minutes I can't cover everything. In the past I worked with Lucene, which is an open-source search engine library, and Solr is the next generation of search engine built on top of the Lucene core. Many of you have probably heard of Lucene or Apache Solr.

So here's the agenda. I will talk about Apache Solr and its key features. Next we'll talk about indexing and the data model in Solr. Then I'll demo the Solr web admin interface, and wrap up with how Solr is used in IBM Watson's Retrieve and Rank service.

Let's start with the big data landscape. In 2010, the world generated about one zettabyte of data in a year, and each year the rate of data generation has increased extremely fast; in 2014, about seven zettabytes were generated. With the boom of big data, social, mobile, and other technologies, the volume of data has become huge. This is the challenge modern web application architectures face: zettabytes of content are generated by a global user base, and more and more devices are connected to the backend. Collectively, a family of technologies called NoSQL emerged to solve this, and Solr sits in that landscape.

In a nutshell, Solr is a Java web application. These are its main components. At the bottom you see the storage containing the index. On top is the servlet container, which can be Jetty or JBoss. At the core is Lucene: it handles the logic of indexing documents, analyzing text, and building the index used for searching. On top of that, Solr adds many different services, and the beauty of Solr compared to Lucene is that it exposes these facilities via a REST-like API.
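To make that concrete: a query is just an ordinary HTTP GET against a collection's /select handler. Here's a minimal Python sketch that only builds such a request URL — the host, port, and the collection name "gettingstarted" are assumptions for illustration:

```python
from urllib.parse import urlencode

def solr_select_url(base_url, query, rows=10, wt="json"):
    """Build a Solr /select URL; q is the query, wt the response format."""
    params = urlencode({"q": query, "rows": rows, "wt": wt})
    return f"{base_url}/select?{params}"

url = solr_select_url("http://localhost:8983/solr/gettingstarted", "name:thrones")
print(url)
# http://localhost:8983/solr/gettingstarted/select?q=name%3Athrones&rows=10&wt=json
```

Any HTTP client in any language can issue this request, which is why Solr clients exist for so many languages.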
In fact, it doesn't strictly adhere to REST principles. In pure REST you would use the DELETE verb to delete a document, but in Solr you actually POST a delete command. And it supports REST clients for most popular languages, like Java, Ruby, and Python.

Now, let's go a bit deeper into the architecture of Solr. Everything in Solr is organized around the concept of a core. A core is, collectively, an index plus its configuration files. You can have multiple cores in one server, so it supports scalability very well: because Solr can distribute an index across multiple cores, you can run queries against them in parallel.

Solr is ready to deploy. It's open source; you can download it, and within five minutes you can have a search engine up and running. And Solr is extremely fast: queries return in milliseconds, and Solr can handle millions of documents easily. It's designed as a text-centric search engine, so it supports different document types like emails, web pages, rich-text formats, even tweets or blogs. Results are sorted by relevancy. Solr powers search behind some of the most popular websites in the world, like Zappos and Best Buy.

Any questions?

[Audience] It says it handles complex queries. What capabilities does the search actually support — what complex queries can it handle?

Because Solr uses Lucene at its core, you can run fuzzy queries, and you can put wildcard characters into the search input, so you don't have to do an exact match. In fact, Solr's architecture is very extensible. Its designers intended Solr to complement any existing architecture. For example, if you want to extend the text analysis, you can provide your own custom analysis logic.
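As a toy illustration of what such analysis logic does — this is not Solr API (real analyzers are Java classes configured in the schema), just a Python sketch of the tokenize-then-filter idea:

```python
STOP_WORDS = {"the", "a", "an", "of", "and"}  # tiny stand-in for stopwords.txt

def lowercase_filter(tokens):
    # Normalize case so "Concert" and "concert" index to the same term.
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    # Drop common words that carry little search value.
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text):
    """Run a whitespace tokenizer, then each filter, in order."""
    tokens = text.split()
    for f in (lowercase_filter, stop_filter):
        tokens = f(tokens)
    return tokens

print(analyze("The Release OF the Concert"))  # ['release', 'concert']
```

A custom analyzer in Solr is conceptually just another stage added to this chain.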
This is an overview of the indexing process. The input document is handed to an update handler, which analyzes the text based on a chain of tokenizers and filters, and the result is stored in the Lucene index.

The document is the atomic unit of content in Solr. It is a set of fields and values belonging to a given entity of your domain model, like a car, a book, or a person. Here, for example, you can have two different book documents: the first might not have a publisher, while the second has a publisher field. So it's very dynamic; there is no fixed set of fields required across all documents in one index.

In a search engine, the speed of retrieval is very important — results must come back in a few milliseconds. That's possible because the index is already built before the query runs. Say we have three documents like these. The analysis process extracts the keywords — in these sentences, "birthday", "concert", "release", and so forth; we call them terms. For each term there is a map of which documents that term appears in, and how many times. So when you search for one of those keywords, Solr looks it up in that map and finds, say, that it appears in documents two and three.

The top-level entities are declared in Solr's schema, and field types are declared using fieldType elements. Earlier I mentioned analyzers. In the config file you can chain many different analyzer classes. For example, a lower-case filter lowercases the keywords of a query, or a stop filter removes stop words — the common English words. The stop words are defined in a plain text file called stopwords.txt; similarly, synonyms can be defined in a separate file like synonyms.txt. What a tokenizer does is break a sentence into a stream of tokens.
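Two of those tokenizers can be mimicked in a few lines of Python (rough analogues for illustration only, not the actual Solr classes):

```python
def whitespace_tokenize(text):
    """Like Solr's whitespace tokenizer: split on runs of whitespace."""
    return text.split()

def keyword_tokenize(text):
    """Like the keyword tokenizer: the whole input becomes one token."""
    return [text]

sentence = "I'm writing a simple text"
print(whitespace_tokenize(sentence))  # ["I'm", 'writing', 'a', 'simple', 'text']
print(keyword_tokenize(sentence))     # ["I'm writing a simple text"]
```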
For example, take the sentence "I'm writing a simple text" as the input. The whitespace tokenizer breaks it down by whitespace. If we use the keyword tokenizer, it doesn't split anything: the whole sentence is treated as a single token. There are many more tokenizers in the Solr library; you can find them in the online documentation.

Now I'll demo the Solr web admin interface. You go to the Apache Solr website and download the latest release. After you download it to your computer, you start the Solr server with the bin/solr script. In this default quickstart setup, Solr starts two instances running concurrently on the same machine, and the default port for Solr is 8983. After you start the server, you can run the bin/post command to create a collection and index documents. You can give the collection any name; I use the name "gettingstarted", and I pass in the documents I want to index. Here I have the example folder that ships with the Solr distribution, with documents in different formats like CSV or JSON. The first step is the indexing, and the speed of indexing is pretty fast. Done.

Now you can do a query. On the left-hand side, you select the collection; here there are two collections. You can also see that it created two concurrent instances running on the same server, which supports fault tolerance in case one instance goes down. Before you query, you need to select which collection you want to query. In this interface, the q parameter is the query you want to search for, and wt is the format in which you want the output exported. On the right-hand side you see the results — the name of this book is "A Game of Thrones". In fact, there's a CSV file containing this document. So what's in that CSV file?
These are the columns: the ID, the category, the name, the author, and so forth. Solr generates fields that map exactly to the columns of the CSV file.

IBM Watson is one application of Solr: basically, it leverages Solr to retrieve information and get the most relevant results. You can see the demo. I give it an input — any random question about aerodynamics — and the engine runs; this is the output for that input, sorted by relevancy. For this question, "What are the aerodynamic interference effects on the lift and the body?", the machine-learning approach gives these answers, while the standard search gives different answers.

With this, I'd like to wrap up my presentation about Solr. Thank you. Any questions about Solr?

[Audience] When you maintain the indexes, they consume a lot of space in your system. How do you manage that space? What strategies do you have for archiving? Because one of the key things is that with more indexes and more data, the search becomes slower over time.

Sure, good question. You need to determine which fields of the document you want returned in the search results. In the schema file there are attributes called indexed and stored. If a field is marked as stored, its value will be kept in the index. So whatever information from the document you want returned, you store; otherwise you don't store it. By doing that, you can keep the size of the index from growing exponentially.

[Audience] Is that the way we have to manage it?

Yes.

[Audience] In terms of search capabilities, does it support things like text analytics or semantic analysis?

What I know is that Solr supports faceted search, meaning from the full result set you can narrow down to a subset of answers. Maybe we can take that offline.
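For reference, faceting is switched on with ordinary request parameters. A sketch of building such a request, assuming the demo collection from earlier and a "cat" (category) field like the one in Solr's stock example data:

```python
from urllib.parse import urlencode

# facet=true turns faceting on; facet.field names the field whose value
# counts are returned alongside the normal results, letting the user
# narrow the result set by category.
params = urlencode({"q": "*:*", "facet": "true", "facet.field": "cat", "wt": "json"})
url = "http://localhost:8983/solr/gettingstarted/select?" + params
print(url)
```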
[Audience] When you search, some documents in the result set score higher than others. What are the rules for how that ranking works?

From my experience with Lucene, there are certain rules, like term frequency — the frequency of the keywords. In my earlier slide: in one document a keyword appears, say, five times; in another document that term appears three times. So the first document gets a higher relevancy because of how frequently the keyword appears in it.

[Audience] Can you configure it?

You can. You can configure it in the Lucene configuration.

Any other questions? If there's nothing else related to the session — thank you so much, Todd, for the information.
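The frequency rule described above can be sketched naively in Python. Real Lucene scoring also weighs inverse document frequency, field length, and boosts, so treat this as a cartoon of just one factor:

```python
from collections import Counter

# Two toy documents; "concert" appears five times in doc 1, three in doc 2.
docs = {
    1: "concert concert concert concert concert tonight",
    2: "concert review and concert release concert",
}

def tf_score(term, text):
    """Naive relevancy: how many times the term occurs in the document."""
    return Counter(text.split())[term]

ranked = sorted(docs, key=lambda d: tf_score("concert", docs[d]), reverse=True)
print(ranked)  # [1, 2] -- the five-occurrence document ranks first
```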