I was introduced to LINQ in 2008, when it had just come out, and I was initially suspicious: C# is a statically typed language, so why was all this happening? Later I realized that it is really useful, and today I will show you some examples of how I use LINQ in several scenarios.

Before that, a little bit about myself. I have written three books: the first one was on data structures and the next on .NET generics. Basically I love data collections and manipulating them. My latest book, on LINQ, comes out next month, and you can get a discount coupon and buy it if you like, but I will leave it up to you to decide after the session whether you want to. In this talk I will show you some examples that I have used in the book, and after that we will break for Q&A.

How many people here use C# and LINQ? Only a few, but everybody has a general notion of composition, right? Yesterday, in the Code Jugalbandi session with Venkat, I think, they showed some kinds of composition, but that was on the surface; we will show some examples that go really deep.

We have all been programming for quite a while, and we all have traditional loops in which we do many things: initialization statements, variable declarations and so on. I want to show you a video of a general strategy to convert a loop into a LINQ statement. Say you have a loop implementation like this: you have some numbers, you loop through them, and if a number is greater than a threshold you put it in some other data structure, in this case a list. The first step is to identify the range over which the LINQ query will run. In this case the range is the entire length of the numbers array.
So that is the range. Then what is the condition? The condition is whether the number is greater than the threshold. So this part is the condition, this part is the range, and this is the action we take when the condition is true.

The first step to convert a for loop into a LINQ statement is to understand that essentially every for loop can be transformed into a foreach loop. Written as a foreach, this becomes: for each element in numbers, if the element is greater than the threshold, goodNumbers.Add(element). We need this step because LINQ works on strongly typed collections; unless you have a strongly typed collection you cannot use it. As you can see, we have replaced the looping variable, and now we have a much cleaner syntax that operates at the level of each element. For starters, that is the first step.

The next step is just a replacement: the condition becomes a Where clause, the foreach source is the range, and the Add is the action. So we can reverse this and write goodNumbers = numbers.Where(element => element > threshold); I name the lambda parameter element to make the connection. The condition in the Where and the condition in the if are the same thing; the only difference is that the loop wrapped it in a for clause. By using LINQ we have flattened the loop, but do not be mistaken: LINQ is not magic, and the complexity remains the same. To find the good numbers you still have to traverse the entire collection, so the big O complexity stays the same; we are just able to say what we want done in a very concise manner.
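A minimal C# sketch of the three stages described above: the for loop, the equivalent foreach, and the Where version. The input array is an assumption, chosen so that the output matches the 11, 15, 18, 17 shown in the demo; the demo's actual numbers are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed input: the demo's actual numbers are not shown.
int[] numbers = { 4, 11, 2, 15, 9, 18, 17, 3 };
int threshold = 10;

// Stage 0: classic for loop. Range = indices 0..Length-1, condition = "> threshold", action = Add.
var goodNumbersFor = new List<int>();
for (int i = 0; i < numbers.Length; i++)
{
    if (numbers[i] > threshold)
        goodNumbersFor.Add(numbers[i]);
}

// Stage 1: the same loop as a foreach; the index variable disappears.
var goodNumbersForEach = new List<int>();
foreach (var element in numbers)
{
    if (element > threshold)
        goodNumbersForEach.Add(element);
}

// Stage 2: the condition moves into Where. Still O(n): every element is visited once.
// ToList() forces evaluation; without it, this would only be a deferred query.
var goodNumbers = numbers.Where(element => element > threshold).ToList();

Console.WriteLine(string.Join(", ", goodNumbers)); // 11, 15, 18, 17
```

All three produce the same list; the LINQ version only changes how the intent is expressed, not how much work is done.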
So this part here, this part here and this part here are identical. I will comment out the for loop for now and run it again; using the foreach version we get the same thing: 11, 15, 18, 17. Now I comment out the foreach part as well, so everything is commented out except the LINQ part, and we get the same result back.

Another important thing to notice is that at the end we had ToList. If you drop that ToList, the LINQ statement does not get executed; it creates a query from which you can get the results later. LINQ is lazy as long as there is no operator that makes evaluation absolutely necessary, in this case ToList. When you write ToList, it has to create a list, evaluate every element, put it in that list and give it back to you. If you drop the ToList, evaluation is deferred until the point where you actually need the results. I hope you understood this pattern. Any questions on this?

Next I will show how we can remove a nested loop. For this demo I am using a tool called LINQPad, which is a really cool utility. Say I have a dictionary of string to list of string: a map (ccMap) from a country to its tourist spots, and we add a couple of entries to it. Now say I want to print all the tourist spots. I have to loop through this dictionary, create a collection to hold them, call AddRange on it for each entry, and then dump it. Dump is an extension method provided by LINQPad, so you can dump any result. That shows the output, but instead of all this I could have just written a single query, so I will comment the loop out.
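A minimal sketch of the dictionary demo with the nested loop and its SelectMany replacement side by side, written as plain C# rather than in LINQPad (so Console.WriteLine stands in for Dump). The dictionary contents are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: country -> tourist spots.
var ccMap = new Dictionary<string, List<string>>
{
    ["India"] = new List<string> { "Taj Mahal", "Qutub Minar" },
    ["France"] = new List<string> { "Eiffel Tower", "Louvre" },
};

// Nested-loop version: the outer loop walks the dictionary, AddRange hides the inner loop.
var spots = new List<string>();
foreach (var entry in ccMap)
{
    spots.AddRange(entry.Value);
}

// SelectMany flattens the sequence of lists into one IEnumerable<string>.
// AddRange accepts any IEnumerable<string>, so no intermediate ToList() is needed.
var flattened = new List<string>();
flattened.AddRange(ccMap.SelectMany(m => m.Value));

Console.WriteLine(string.Join(", ", flattened));
```

Both versions enumerate the same dictionary, so they produce the same spots in the same order.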
This is basically a nested loop: for each entry we go through its values. Here we had only two levels, but we could go beyond that. So I comment it out and write ccMap.SelectMany(m => m.Value), and that is what I pass to AddRange. AddRange takes an IEnumerable<string>, and SelectMany also returns an IEnumerable of the result type, which here is string because each value is a list of strings, so I do not need to convert it to a list. The result is the same, and that is how you flatten a nested loop.

(Answering a question:) Yes, it still iterates through all the nodes you have, through all the possibilities that the manual loops went through; I will show some examples later where it becomes more useful.

Let us go back to the talk. These examples are from the book, and they are about human-computer interaction. We all know about T9; after T9, I think the latest one is swipe, which people with Android phones probably use. Here the keyboard offers a suggestion, but I could manually select something else. Since it is 'a', I go on to the next words, 'test message', and you can see that 'a' is there and 'test' is there as well. Although it seems really cool, the idea behind it is not new; it is very old, and it works on an algorithm called the longest common subsequence.

Before we get to that, here are some amusing examples of subsequences. Does anybody know what a subsequence is? A string is a subsequence of another string if its characters occur in the other string at monotonically increasing indices. That might sound difficult to understand, so here is an example: take the word 'ornamental'. The words 'rental', 'mental', 'oral' and 'metal' are all subsequences of it: the characters of 'rental', R E N T A L, appear at monotonically increasing indices in 'ornamental'.
So 'rental' is a subsequence of 'ornamental', and the longest one shown here is 'ornament'. Here is a funny one: 'wine' is a subsequence of 'world is not enough'.

Let me show you the code for this; it is an example from the book. This is a swipe simulation. Can all of you see this properly? Is the font too big, or does it fit? Everybody can see it, right? Say I have a touch keyboard and I move my finger across the characters U J N B B N and so on; this is a simulation of that. I have a dictionary file with a lot of words in it, and what I want to achieve is to provide suggestions, like in the demo you just saw. The first step is creating a list of these words; if I dump the query, you will see it has some words in it. At this point the query has just the words. Next, I have created a method that computes the longest common subsequence; please do not think about its implementation right now. Treat it as a built-in method: give it two inputs and it returns their longest common subsequence. For each word, I filter using a Where clause: if the longest common subsequence of the word and the swipe path is the word itself, the word is a potential candidate for a suggestion. If you go back to the path, you see U is here, N is here, D is here, then E, R, S, T, A, N, D, S. So the longest common subsequence for the word 'understands' is 'understands' itself, which matches the word, so it is a potential match.
After that I sort the matches in descending order of length, so the most probable word tops the list, then sort them alphabetically, and then take the first four elements and show them as suggestions. Let me run it and see what it produces. Here is the output: it thinks I meant 'understand'. I do not know what the second one onwards means, but that is how it is. Do you like the example?

Let us move on to another one: spell checking, which is one of the key areas. Peter Norvig: I do not know how many people know Peter Norvig. Can you tell the others who he is? So Peter Norvig was travelling on a plane, he only had some Shakespeare texts on his computer, and he wrote a small spell checker using that content as the dictionary. This is his code in Python, which is sheer genius, and I tried to clone it in C# using LINQ. Just for the statistics, his code is about 25 lines and mine is about 59 lines. Let us go there. Can you see this code properly? What about now? One second, sorry, I probably have to go to his site. Here it is. Can you see this now? Yes.

So see what he has done. He has created a few functions, such as known and edits1, and this is a very fancy example of list comprehensions. He takes a range and breaks the word into two pieces, which he calls splits: the first part goes from position 0 up to the ith character, and the second part goes from the ith position to the end. Deletes cover the case where we accidentally drop one of the characters; transposes cover swapping a letter with the next one, for example typing the 'p' and the 'a' of 'apple' in reverse order; and then there are replacements and inserts. Python is one of those languages that I can read but cannot write.
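A sketch of the swipe-suggestion pipeline described above, under stated assumptions: LongestCommonSubsequence is written here as the textbook dynamic-programming LCS (the book's implementation is not shown), and the word list and swipe path are invented for the example.

```csharp
using System;
using System.Linq;
using System.Text;

// Textbook DP longest common subsequence; returns one LCS as a string.
static string LongestCommonSubsequence(string a, string b)
{
    // len[i, j] = length of the LCS of a[i..] and b[j..]
    var len = new int[a.Length + 1, b.Length + 1];
    for (int i = a.Length - 1; i >= 0; i--)
        for (int j = b.Length - 1; j >= 0; j--)
            len[i, j] = a[i] == b[j]
                ? len[i + 1, j + 1] + 1
                : Math.Max(len[i + 1, j], len[i, j + 1]);

    // Walk the table forwards to rebuild the subsequence.
    var result = new StringBuilder();
    int x = 0, y = 0;
    while (x < a.Length && y < b.Length)
    {
        if (a[x] == b[y]) { result.Append(a[x]); x++; y++; }
        else if (len[x + 1, y] >= len[x, y + 1]) x++;
        else y++;
    }
    return result.ToString();
}

// Hypothetical dictionary and swipe path (the letters the finger passed over).
string[] words = { "understand", "understands", "under", "stand", "sand", "bread", "test" };
string swipe = "uhnbdfertsdtrfanbhdsa";

var suggestions = words
    .Where(w => LongestCommonSubsequence(w, swipe) == w) // the whole word lies along the path
    .OrderByDescending(w => w.Length)                      // longer matches first
    .ThenBy(w => w)                                        // then alphabetical
    .Take(4)
    .ToList();

Console.WriteLine(LongestCommonSubsequence("rental", "ornamental")); // rental
Console.WriteLine(string.Join(", ", suggestions)); // understands, understand, stand, under
```

Checking LCS(word, path) == word is the same as asking whether the word is a subsequence of the path; a direct subsequence check would be cheaper, but the LCS form mirrors the approach described in the talk.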
So I read his code and translated it. Let me show you the equivalent statements I wrote in LINQ. Here is my version. I also created a range over the length of the word, but unfortunately C# is not as expressive as Python in this case, so I had to project that collection into splits with two parts, First and Second. Then I created all the transposes.

You were asking about SelectMany, so here is an example of what it does. Say I have the word 'apple'. With this paradigm, one split has 'a' as the first part and 'pple' as the second; another split has 'ap' and 'ple', and so on. SelectMany joins all of these together to create the replacements for those characters. At the end we concatenate everything so that we get a list of possible misspellings. Then there is the training part: we load all the words into a dictionary of how many times each word occurs. I have mistyped the word 'mystery', and if I run this, it suggests 'mystery'. This is a fairly large piece of code and I do not expect you to understand all of it in this demo, but I will upload it with the slides so you can take a look. Any general questions so far? I hope you are familiar with LINQ operators such as Where, Select, SelectMany, projections and so on.

This is one of my favorite examples: the Fibonacci series. How many times have you tried to calculate a Fibonacci number and run out of memory? There is tail recursion, and yesterday Venkat was talking about memoization, which is a really nice concept; I will show you.
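A sketch of the edits1 step in LINQ, following the structure of Norvig's Python (splits, deletes, transposes, replaces, inserts). The names Edits1 and Correct and the tiny frequency table are assumptions for illustration; the talk's 59-line version and its training data are not reproduced, and only single edits are considered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Every string one edit away from `word`, mirroring the shape of Norvig's edits1.
static IEnumerable<string> Edits1(string word)
{
    const string letters = "abcdefghijklmnopqrstuvwxyz";

    // splits: ("", "apple"), ("a", "pple"), ("ap", "ple"), ... ("apple", "")
    var splits = Enumerable.Range(0, word.Length + 1)
        .Select(i => (First: word.Substring(0, i), Second: word.Substring(i)))
        .ToList();

    var deletes = splits.Where(s => s.Second.Length > 0)
        .Select(s => s.First + s.Second.Substring(1));
    var transposes = splits.Where(s => s.Second.Length > 1)
        .Select(s => s.First + s.Second[1] + s.Second[0] + s.Second.Substring(2));
    // SelectMany pairs every split with every letter.
    var replaces = splits.Where(s => s.Second.Length > 0)
        .SelectMany(s => letters, (s, c) => s.First + c + s.Second.Substring(1));
    var inserts = splits.SelectMany(s => letters, (s, c) => s.First + c + s.Second);

    return deletes.Concat(transposes).Concat(replaces).Concat(inserts);
}

// Tiny made-up frequency table standing in for the trained word counts.
var counts = new Dictionary<string, int> { ["mystery"] = 3, ["history"] = 5, ["the"] = 100 };

// Known word: keep it. Otherwise: the most frequent known word one edit away, else unchanged.
string Correct(string word)
{
    if (counts.ContainsKey(word)) return word;
    return Edits1(word)
        .Where(counts.ContainsKey)
        .OrderByDescending(candidate => counts[candidate])
        .FirstOrDefault() ?? word;
}

Console.WriteLine(Edits1("apple").Count()); // 295, i.e. 54n + 25 for n = 5, duplicates included
Console.WriteLine(Correct("mystry"));       // mystery
```

Norvig's version also tries two edits when no single edit is known; that step is omitted here to keep the sketch short.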
Memoization basically means caching: calculate once, cache the result, and get faster execution. In software engineering we have a principle called DRY, don't repeat yourself, and I think memoization fits into that: if you have already calculated something, do not calculate it again. Here I will show what can be done to generate the Fibonacci series, and similar series, lazily.

Every such series starts with seed values. What I have done here is create a static class, SequenceEx, with two static methods. The first takes the seed values and converts them into an enumerable, which might not seem useful at all, but you will see in a minute that it is. A recurrence relation has seed values and then follows a rule to calculate the next value: for the Fibonacci series, if you sum the last two values you get the next value. You can encapsulate that rule in a Func<T, T, T>; for Fibonacci that could be Func<int, int, int>, where the return type is also an int, but it does not have to be an integer: for some other series it can be another data type. The second method is also very interesting: it loops for as long as the condition is true, which here is forever, and on each pass it takes the last-but-one element and the last element, evaluates the rule, concatenates the result back onto the same sequence and returns the new last value.

Now let us see how to use it. The Fibonacci rule is a Func<long, long, long> with (x, y) => x + y, which means take two elements and give me their sum. To find the first 5 Fibonacci numbers, you start with SequenceEx.StartWith<long>(1, 1); these are my seed values.
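A sketch of the seed-plus-rule idea as a C# iterator. The names here (Recurrence, fibonacciRule) are hypothetical, and the sketch collapses the talk's two SequenceEx methods into one; the rule-call counter is only there to make deferred execution visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Yields the two seeds, then keeps applying `rule` to the last two values forever.
// Each new element costs exactly one rule evaluation; callers decide how many to pull.
static IEnumerable<T> Recurrence<T>(T first, T second, Func<T, T, T> rule)
{
    yield return first;
    yield return second;
    T previous = first, last = second;
    while (true)
    {
        T next = rule(previous, last);
        yield return next;
        previous = last;
        last = next;
    }
}

int ruleCalls = 0;
Func<long, long, long> fibonacciRule = (x, y) => { ruleCalls++; return x + y; };

var first40 = Recurrence(1L, 1L, fibonacciRule).Take(40);
int callsBeforeEnumeration = ruleCalls; // Take only builds the query; nothing has run yet
var values = first40.ToList();          // enumeration (and every addition) happens here

Console.WriteLine(callsBeforeEnumeration); // 0
Console.WriteLine(values[39]);             // 102334155
```

Because the iterator only keeps the last two values, asking for 40 elements costs 38 additions rather than the exponential work of naive recursion.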
So I am giving it two seed values and then saying 'then follow the Fibonacci rule'. What this does is take 1, 1 and put them in the sequence, and notice that this is the same sequence that was passed to it, so we are not creating another structure. Because we use yield return, we can take as many elements as we want; that is the beauty of deferred execution. You do not have to wait until the end, so instead of 5 I can now say 40 very confidently and it will work. You see how fast it calculated the first 40 Fibonacci numbers: at each step it had to calculate only one addition, because everything else was already calculated. Another very interesting thing is that this reads like plain English rules: start with 1, 1, then follow the Fibonacci rule, then take the first 40. Execution starts only when the result of Take(40) is enumerated, for example when you dump it; until then, this just creates a query that can give you the results, but not yet. This is an example of an embedded DSL, an embedded domain-specific language that you build inside a host language, in this case C#.

Any questions on this? No questions? Either I am boring people, or... well, at least nobody is leaving the room. That is off track.

Now another favorite thing of mine: external domain-specific languages. All of this code does the same thing: it finds Armstrong numbers. I assume everybody knows what an Armstrong number is: a number for which the sum of the cubes of its digits is the number itself. For example, 1 cubed is 1, and the sum is 1, so 1 is an Armstrong number. 153 is also an Armstrong number: 1³ + 5³ + 3³ = 1 + 125 + 27 = 153.
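Using the definition above (the sum of the cubes of the digits equals the number), here is a sketch of the imperative style and the LINQ style side by side over the range 1 to 10,000. Digits is a local helper standing in for the extension method mentioned later in the talk; it assumes non-negative input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Imperative: explicit loops, an accumulator and a result list.
var imperative = new List<int>();
for (int candidate = 1; candidate <= 10000; candidate++)
{
    int sum = 0;
    for (int rest = candidate; rest > 0; rest /= 10)
    {
        int digit = rest % 10;
        sum += digit * digit * digit;
    }
    if (sum == candidate) imperative.Add(candidate);
}

// Embedded-DSL flavour: reads close to "the sum of the cubes of the digits of the number is the number".
static IEnumerable<int> Digits(int n) => n.ToString().Select(c => c - '0'); // assumes n >= 0

var declarative = Enumerable.Range(1, 10000)
    .Where(n => Digits(n).Sum(d => d * d * d) == n)
    .ToList();

Console.WriteLine(string.Join(", ", declarative)); // 1, 153, 370, 371, 407
```

Both produce the same five numbers; the difference is only in how close the code reads to the English definition.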
On the left you have the Armstrong number check coded in an imperative style, which we have been doing for ages; on the right you have an embedded DSL, similar to what you just saw for Fibonacci; and at the bottom is a domain-specific language for mathematicians. They do not want to know how to write code; they say, as I told you, 'the sum of the cubes of the digits of the number is the number itself'. Can this be a language? That is an example of an external DSL, where you take out all the programming aspects and give it a free-flowing form. I will show you this last one as a demo. I created this language, I am working on posting it on GitHub, and I named it Armstrong, after Armstrong numbers. Here is a small demo of the external DSL Armstrong in action.

(Video:) In this video I will demonstrate Armstrong, a domain-specific language for finding interesting numbers such as Armstrong numbers, sum-product numbers and factorial numbers. Unlike the embedded DSLs we saw earlier, this is the language that we built in the book. I type an expression, as you can see here: 'sum of the cube of the digits of the number is the number itself'. That is the definition of an Armstrong number. Once I enter this expression, it is translated into a LINQ query statement, that statement is parsed and evaluated, and the results show the Armstrong numbers within the range 1 to 10,000. So I press Enter: from that it generated this query and ran it. The input is a predefined range of 1 to 10,000, and when fired it produced this result.

In the interest of time, I will show you the example here. I call it the Armstrong console, and unfortunately its font size cannot be increased.
So you can say 'sum of the cube', and you can also use 'odd digits', 'even digits' and so on. These are the keywords, or phrases, of the language. For example: 'sum of the odd digits of the number is 10'. Here is how it interprets this: we put the tokens on a stack, reverse the stack, and then take one element at a time and figure out what it means. The 'of the' is just syntactic sugar; you can omit it. So here you say 'sum of the odd digits', and this is an example of a query you can write; you can also do all kinds of bracketing, sums, division and so on. Another interesting kind of number is the factorial number: you can say 'sum of the factorial of the digits of the number is the number itself'. There are three such numbers in the range 1 to 10,000. Enough of the demo; let me show you some code.

How is this done? This is an example from the book. Languages, whether computer languages or human languages, are not that different: we have a set of vocabulary and a set of rules to glue the words together so that we can express ourselves. Domain-specific languages are no different. In this domain we have words like 'cube', and these are the elements of the domain-specific language; as you can see, each one is written as a method. For example, when I say 'digits of the number', this method gets called: it is an extension method on integer that takes an integer and returns its digits as an enumerable of integers, and so on and so forth. Then there is some amount of plumbing for the parsing, and essentially you get what you just saw.
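A toy sketch of the phrase-to-method mapping just described: stop words are dropped, 'odd'/'even' are glued to 'digits', the sentence is split at 'is', and each side is read right to left like a reversed stack. This is not the book's Armstrong implementation, which generates a LINQ query string and hands it to a LINQ compiler; here each phrase maps directly to a delegate, and the vocabulary (ops, stopWords, Compile, Evaluate, Run) is a small assumed subset.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static IEnumerable<int> Digits(int n) => n.ToString().Select(c => c - '0');
static int Factorial(int d) => Enumerable.Range(1, d).Aggregate(1, (acc, k) => acc * k);

// Vocabulary: each phrase maps to an operation. Values travel as object because a phrase
// can turn a number into a sequence of digits, or a sequence back into a number.
var ops = new Dictionary<string, Func<object, object>>
{
    ["digits"] = v => Digits((int)v),
    ["odd digits"] = v => Digits((int)v).Where(d => d % 2 == 1),
    ["even digits"] = v => Digits((int)v).Where(d => d % 2 == 0),
    ["cube"] = v => ((IEnumerable<int>)v).Select(d => d * d * d),
    ["factorial"] = v => ((IEnumerable<int>)v).Select(d => Factorial(d)),
    ["sum"] = v => ((IEnumerable<int>)v).Sum(),
};
var stopWords = new HashSet<string> { "of", "the", "itself" };

// Turns a sentence into a predicate. Assumes the sentence contains exactly one "is".
Func<int, bool> Compile(string sentence)
{
    var words = sentence.ToLowerInvariant()
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Where(w => !stopWords.Contains(w))
        .ToList();

    var tokens = new List<string>();
    for (int i = 0; i < words.Count; i++)
    {
        bool qualifier = (words[i] == "odd" || words[i] == "even")
                         && i + 1 < words.Count && words[i + 1] == "digits";
        if (qualifier) { tokens.Add(words[i] + " digits"); i++; }
        else tokens.Add(words[i]);
    }

    int split = tokens.IndexOf("is");
    var left = tokens.Take(split).ToList();
    var right = tokens.Skip(split + 1).ToList();
    return n => object.Equals(Evaluate(left, n), Evaluate(right, n));
}

// Reads one side right to left: "sum cube digits number" means start from the number,
// take its digits, cube them, then sum them.
object Evaluate(List<string> side, int n)
{
    object value = null;
    foreach (var token in Enumerable.Reverse(side))
    {
        if (token == "number") value = n;
        else if (int.TryParse(token, out int literal)) value = literal;
        else value = ops[token](value);
    }
    return value;
}

List<int> Run(string sentence) => Enumerable.Range(1, 10000).Where(Compile(sentence)).ToList();

Console.WriteLine(string.Join(", ", Run("sum of the cube of the digits of the number is the number itself")));     // 1, 153, 370, 371, 407
Console.WriteLine(string.Join(", ", Run("sum of the factorial of the digits of the number is the number itself"))); // 1, 2, 145
```

Mapping phrases straight to delegates avoids generating and compiling query text; generating a LINQ string, as the talk does, has the advantage that the generated query can be shown to the user.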
The idea is that this is an example of a domain that is universally understood, mathematics, and you can apply the same approach to any domain: file systems, FTP, what not; web development, testing, all kinds of things can be done. I hope you enjoyed this as much as I did while creating it.

Here it is: it is a map, a plain map. Whatever I type is looked up in this map. If I say 'times' it becomes a star, the multiplication operator. Sometimes we say '5 times 4', and what we actually mean is 5 into 4, that is, 5 × 4. Then we have 'is', 'proper divisor', 'even' and so on. So essentially I created a mapping from English-like phrases to my extension methods and then glued them together to create a query. Look at this output: I gave the command 'sum of the odd digits of the number is 10', and these are my stack contents. 'of' is ignored since it is not a keyword, and 'the' is also ignored; these are like stop words that get removed. Then 'odd digits' is a valid Armstrong phrase, so it is registered, because 'odd digits' has a mapping to an extension method. 'the number' is also picked up, because 'the number' gives me the object on which I can call that method: if I do not have the integer object, I cannot call Digits on it. That is why I always type 'digits of the number'; the 'of the' is just the secret sauce, but if I do not give 'the number', it will not work.

It may seem free-flowing, and it is free-flowing for the context of the demo, but it can be made full-fledged using a lexer. (Can you say 'odd digits sum'?) No, you have to say 'sum of the odd digits of the number', or you can be minimalistic.
You can say 'sum' first, then 'odd digits', then 'number', then 'is 10', and this will also work, because that is the minimalistic form without any stop words; but you do not speak English like that. (Why not stick to a fixed syntax?) It could be, but when you explain a rule you follow a syntax even without knowing it. We normally say 'sum of the cube of the digits'; we do not say 'take the digits, cube them, and then do the sum'. Even if we do not consciously mean it, we are already programmed to talk like that.

(About debugging:) This does not allow debugging yet. As soon as the input statement comes in, a method called GenerateArmstrongStatement does the conversion: for each token, if there is a matching mapping, we append to a builder, and then we sanitize the braces because we do not want stray braces.

(Question:) I think that part is fine; I was interested in the LINQ compiler you showed, where you have a string that represents the LINQ query. (Answer:) How did I execute the LINQ query? For that there is a LINQ compiler called the Evaluant LINQ Compiler; I will show you the page. It is an open-source tool on CodePlex, and there is another one called Text to Delegate by a Japanese programmer. Once I generate my LINQ query, I pass it on to the evaluator to do the operation. This can be used behind a free-flowing user interface, where users say, even by voice, 'give me this customer's data', and you translate it into your query and run it.

Let us go back to the talk. Another interesting thing is metaprogramming; I do not know whether you are familiar with it. Microsoft is releasing Roslyn. So far, compilers have remained a black box.
You write source code, something happens that we do not know exactly, and at the other end some code is generated and executed on some machine architecture. We do not know what happens inside. The bad thing is that by parsing your code the compiler has a lot of information about it, but that information is lost as soon as compilation finishes, and we are not able to tap into that power ourselves, because writing a parser is not a trivial task, and with the language structure changing every now and then it is very difficult to maintain a decent parser. So Microsoft has released Roslyn as a CTP. You can download it from this link; you need Visual Studio 2012 to install it. Once installed, it provides several APIs, so you can build your own refactoring tools: when you right-click and choose Rename Method or Extract Method, you can do similar things on your own.

I will not show refactoring today. Instead I will show examples like this: how many times have you seen a method that takes five arguments but uses only two of them? Such methods are potential candidates for refactoring, and before you refactor you have to know which methods to refactor. Roslyn gives you a very clean API to work with. There is a SyntaxTree class with a static method called ParseText, and you can parse raw C# code, and I think VB code as well. Here I parse some C# code; do not judge the code itself, just assume it compiles. Then you call GetRoot. There are several kinds of syntax nodes, and the method declaration is one of them.
GetRoot gives you the root of the syntax tree; then you call DescendantNodes, filter them, and ask for all the method declarations. See how clean this API is; I think it is very hard to beat. This gives me an IEnumerable of syntax nodes, which I cast to MethodDeclarationSyntax and project to a list, so I now have a list of method declaration syntax nodes. Now comes the fun: since this is now a list, we can use all our LINQ queries. For each method declaration z, I take its ParameterList, which is a ParameterListSyntax, then its Parameters, which gives me the list of parameters, and then p.Identifier. You can see how deep this API is: it identifies not only the name of the parameter but its type and everything; it goes beyond reflection. Identifier.ValueText gives the name of the parameter, and z.Identifier.ValueText gives the method name. If a method uses all its parameters, then for every parameter z.Body.GetText().ToString().Contains(name) must be true, because z.Body gives me the body of the method and converting it to a string gives me its text. I filter for the methods that are not using all their parameters and project their names. Here I had two dummy methods, one called Fun with a parameter z that is not used in its body, and it printed out the name of that method. So if you have a really long codebase, you know which methods to target for refactoring, and this is super easy to build because the API is so clean.
Similarly, another metric: it is said that having a lot of local variables in a method is not good, and maybe you can store them somewhere else. This one gives you the number of local variables. As before, we get the method declaration syntax nodes and then find the local declaration statements. For SyntaxKind, I just put a dot here to show you: such a big list. Everything you can imagine, or maybe even do not know of, is there; you do not have to specify anything, you just pass in the code.

(Is it cross-language?) It is not cross-language as such. There are two DLLs, two references you can use: one for C# and one for VB. Depending on the language, it has a built-in mechanism, some factory method, that uses the corresponding syntax tree, so if you parse VB code with the VB one, it will still parse it. There is a fantastic video by Anders Hejlsberg from the Build event; search for Anders Hejlsberg, C#, Roslyn and Build and you will find it. There he shows the potential and the path forward for this.

So here it shows the number of local variables. Why am I showing all this? Before and after refactoring you want to check that your results are appropriate. For now there are tools you can buy, like NDepend, which is costly; NDepend gives you all this information, but you can do it on your own using Roslyn. I think that is enough of Roslyn.

I have two topics left; you decide: data mining or machine learning? Machine learning. This is a flower called Iris, and there are three species: Iris versicolor, Iris virginica and Iris setosa. Some botanist created a long file with the sepal length, petal length and so on.
The task is to identify the flower. Suppose I am not a botanist: I give you the measurements, sepal length, petal length and so on, and the machine learning algorithm has to tell me which species of Iris it is. For this there is an algorithm called KNN, k-nearest neighbors. How many of you are familiar with it? For the rest, look at this picture. In two dimensions, imagine the origin: if the points at 0 and at 1 have a class, a point at 0.5, in between, is likely to have the same class as its neighbors. If you project your records into two dimensions, it looks something like this. Say these are records of cancer patients: the red ones are malignant cases and the blue ones are benign cases. We have a new patient record, the green one, and we have to identify what the case could be for this person. As you can see, inside the circle of confidence we have two red and only one blue, so in the neighborhood malignant cases are more common, and we might unfortunately say that this new person is a malignant case. You are what your nearest neighbors are; in the broader sense, you are known by the company you keep.

Let us see this example in code, and I will show you the file too. This is the Iris file; search for 'iris csv' and you will find it. The columns are sepal length, sepal width, petal length, petal width and the name, which is the class: Iris setosa, versicolor and virginica. I will load all this data, train my algorithm and then show you the result.
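A minimal k-nearest-neighbors sketch in LINQ that follows the steps described here: sort by Euclidean distance, keep the k closest, and let them vote through a lookup. The two-feature toy data is made up for illustration; the demo used the four Iris measurements from the CSV file and MoreLINQ's RandomSubset for sampling, neither of which is reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up two-feature records; the demo used the four Iris measurements instead.
var training = new List<(double[] Features, string Label)>
{
    (new[] { 1.0, 1.0 }, "red"),
    (new[] { 1.2, 0.8 }, "red"),
    (new[] { 0.9, 1.3 }, "red"),
    (new[] { 5.0, 5.0 }, "blue"),
    (new[] { 5.5, 4.8 }, "blue"),
    (new[] { 4.7, 5.2 }, "blue"),
};

// Plain square-root Euclidean distance; any other distance function could be swapped in.
static double Euclidean(double[] a, double[] b) =>
    Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());

// k-NN: sort by distance, keep the k closest, group them in a lookup, take the majority label.
string Classify(double[] query, int k) =>
    training
        .OrderBy(r => Euclidean(r.Features, query))
        .Take(k)
        .ToLookup(r => r.Label)
        .OrderByDescending(g => g.Count())
        .First()
        .Key;

Console.WriteLine(Classify(new[] { 1.1, 1.0 }, 3)); // red
Console.WriteLine(Classify(new[] { 5.1, 5.1 }, 3)); // blue
```

With four features per record, the same Euclidean function works unchanged, since it zips two arrays of equal length rather than taking a fixed number of arguments.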
What I did here is load the Iris file, select the rows, and skip the first one because it is the header; we do not want that. Then I created a projection of the sepal length and the other measurements, plus the name. RandomSubset is a method available from a library called MoreLINQ; it gives you a random subset, so you do not have a sampling issue. I take a random subset of 100, and then create the Euclidean distance function. It takes 8 arguments because each record has 4 measurements and it compares two records. It is the normal square-root Euclidean distance; you can use other distance functions. At the end we sort the records by their distance from the test record and create a lookup; I will put a Dump here so that you see what the lookup is. You see here that all of the 5 closest elements are versicolor, so the test data that I gave it, with this sepal length and petal length, is probably also a versicolor.

I think we have to end here. If you have questions, please ask, and please buy the book; it will buy me a coffee. That is all. Any questions?

(Question about data mining, and why this is called a supervised machine learning algorithm.) Supervised machine learning: think of how you teach your children A, B, C, D. You take a picture of an A and tell your son or daughter, 'this is A, this is A, this is A'; you do it for, say, four or five months, and at the end your kid learns that this is A. We know a certain color is black, but if from childhood you were taught that it is blue, you would say it is blue. Similarly, in supervised learning we tell the algorithm that these records are tagged with these labels, and it tries to learn the matching. If I give you 5 kinds of apples, you will still recognize that they are all apples.
You will say this is a green apple, this is a Shimla apple and this is a Fuji apple, but they are all apples; you know how to classify them, and you will not say they are pumpkins. That is how supervised learning works. That is all, thank you.