So, Shri and I will talk about R. As most of you might already know, R is a language that is tuned for data analytics and statistics. Today I am going to talk about the basics of R and then take a specific application, the stock market, where people create rule-based trading systems, and show how R can be used to analyze stock data and compute technical parameters like moving averages. Along the way I am going to try and touch on different aspects of R.

So what is R? This is the textbook definition that you would find if you go to the R website. First of all, it is very much written and developed with data in mind, and we will see how that is. It has all the facilities that you need to handle and store data, to analyze and do calculations on large data sets, and facilities and libraries for visualizing the results of your analytics. It is object oriented in the sense that everything in R is an object, and it is built around a few foundation objects: vectors, lists, factors, matrices, data frames, and time series.

Now if you look here, this is how you would define a vector. A vector is basically a collection of objects of the same type, like integers or characters. If you want to generate a large sequence or index, it is very straightforward. R also gives you facilities to define different sequences, random numbers, etc. But the nice thing is what happens with the operations. Take the if statement: normally an if statement works on one particular value. In this case we have defined two vectors: odd, which is 1, 3, 5, 7, 9, and even.
And then what the if statement basically does is, based on the Boolean values of this vector, it picks each element either from one set or the other. So it is very easy if you want to compare and pick items from different sets, or if you want to combine different data sets that you have.

You can also have lists, which are collections of dissimilar objects: you can have integers or you can have strings. It also allows you to have named elements. In addition, once you have data, you want to subset it, which is that you want to filter: based on conditions, you want to select certain parts of the data. Something like that is quite simple here. You can see that out of this vector we have selected only the elements that are less than 5. So it is quite powerful in how it abstracts these kinds of things compared to more general-purpose programming languages like C or Python. It gives you much quicker ways of prototyping as well as of selecting data. The same goes for arrays. Similarly, you can include and exclude data however you like.

You can do operations on large sets of data. Here, if I multiply even by 2, it goes ahead and multiplies the whole vector by 2. So that way it is quite a lot easier to program. R then gives you ways to run functions on the elements either individually or collectively. So here I can take the sum of the entire vector. Or, for example, here I have a list with two different elements: an item A which has 1 to 10 and an item B which goes up to 20, and I can apply something to the individual elements. So here I do a sum, but I am summing each element separately.
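The operations just described (element-wise selection driven by a Boolean vector, conditional subsetting, whole-vector arithmetic, and applying a function to each named element) can be sketched in a few lines. The talk's slides use R; what follows is only a plain-Python analogue, not the speaker's code. The helper name `ifelse`, the Boolean mask, and the range chosen for item `b` are assumptions made for illustration.

```python
# Plain-Python analogue of the vector operations described above (not R code).

def ifelse(test, yes, no):
    """Pick yes[i] where test[i] is true, otherwise no[i] (element-wise)."""
    return [y if t else n for t, y, n in zip(test, yes, no)]

odd = [1, 3, 5, 7, 9]
even = [2, 4, 6, 8, 10]

# A Boolean vector decides, position by position, which set to pick from.
picked = ifelse([True, False, True, False, True], odd, even)

# Conditional subsetting: keep only the elements smaller than 5.
small = [x for x in odd + even if x < 5]

# Whole-vector arithmetic: multiply every element by 2.
doubled = [2 * x for x in even]

# A list with named elements; the range for "b" is an assumed example value.
items = {"a": list(range(1, 11)), "b": list(range(11, 21))}
per_item_sum = {name: sum(values) for name, values in items.items()}

print(picked)        # [1, 4, 5, 8, 9]
print(small)         # [1, 3, 2, 4]
print(doubled)       # [4, 8, 12, 16, 20]
print(per_item_sum)  # {'a': 55, 'b': 155}
```

In R these are built-in behaviours of vectors and lists, so none of the explicit loops or comprehensions are needed there; that is the convenience the talk is pointing at.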
So that is quite powerful when it comes to handling, manipulating, and operating on data. You can define functions as well, but where R really starts to come into play is that it can take all these very simple constructs and start dealing with large sets of data. For example, what I have here is a dump from the National Stock Exchange of about 1,500 stocks and what their positions were yesterday. You can go ahead and import that CSV directly. This is how it looks: whatever was in the CSV, you can read directly. You can also read from different databases, or you can take plain text files or HTML files, or you can get JSON objects and import that data into R as well.

Then, within this, you can select what you want. This was the original, and here you have selected part of the data, and it is quite straightforward. It is similar to writing a select statement: I want to select these rows, and I want only the stocks which are equity symbols. Even calculation on large sets is quite easy. Here, for example, I am just calculating the difference between the last close and the previous close, which gives me how much the stock changed in a day. Then for the 1,500 or so stocks I can look at the quantiles and see how many stocks have moved in what range, and I can set up bins as well. It shows me the percentage change for each of the stocks. If I plot it, most of the stocks, at least on Friday, were trending negative and very few stocks were actually positive. If you want to look at it slightly differently, there are different ways; if you want to do a cumulative plot, you can see that too. This is what we go and analyze.
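The steps walked through here (read a CSV, keep only the equity rows, compute the change from previous close to last close, and count stocks per bin) can be sketched as below. This is a plain-Python illustration under stated assumptions, not the talk's R code: the column names `SYMBOL`, `SERIES`, `CLOSE`, `PREVCLOSE`, the series code `EQ`, the bin width, and all rows in the sample data are made up for the example.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for the exchange dump; symbols, columns, and prices are made up.
RAW = """SYMBOL,SERIES,CLOSE,PREVCLOSE
AAA,EQ,102.0,100.0
BBB,EQ,95.0,100.0
CCC,BE,50.0,50.0
DDD,EQ,99.0,100.0
EEE,EQ,98.5,100.0
"""

def pct_change(close, prev_close):
    """Percentage move from the previous close to the last close."""
    return (close - prev_close) * 100.0 / prev_close

def bin_label(change, width=2.0):
    """Label the fixed-width bin [lo, lo + width) that a change falls into."""
    lo = (change // width) * width
    return f"[{lo:g}, {lo + width:g})"

rows = list(csv.DictReader(io.StringIO(RAW)))

# Select only the equity series, much like a WHERE clause in a select statement.
equity = [r for r in rows if r["SERIES"] == "EQ"]

# One percentage change per selected stock.
changes = {r["SYMBOL"]: pct_change(float(r["CLOSE"]), float(r["PREVCLOSE"]))
           for r in equity}

# Count how many stocks moved in each range.
histogram = Counter(bin_label(c) for c in changes.values())

for label, count in sorted(histogram.items()):
    print(label, count)
```

With this sample data three of the four equity rows fall in negative bins, which mirrors the kind of picture described for the real dump.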
The other thing which R does really well is time series, which is where you take a number of readings over a period of time and you want to analyze that data. Maybe it is logs, maybe it is financial data, or maybe it is any trend you want to study. There are different kinds of time series that R can handle. There are classes for regularly spaced time series, where you have periodic intervals, and there are different classes for multiple time series and for time series that are not regularly spaced.

Now, one of the time series applications is how stocks or different commodity prices move, how you can find trends within them, and how you can analyze them. For that, R offers you different sets of libraries. At the base we have the time series object, which is the most basic one, where we have the time and the value. Then you have quantmod, which is a financial modelling framework that can convert the time series from daily to weekly, monthly, or any of that. Then you have TTR, the technical trading rules package, which gives you moving averages etc.

If you want to go and find trends based on that, what you need is a framework that allows you to find signals. For example, you are looking at a large spectrum of stocks, like 1,500 or 2,000 stocks that are moving, and you want to find out what the moving average of these stocks has been, and you want to set rules so that when a stock crosses its moving average it comes onto your radar and you can make a trading decision. quantstrat is basically a strategy modelling framework that helps you define these rules, which signal any change based on whatever parameters you define.
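The rule just described, flagging a stock when its price crosses above its moving average, can be sketched in plain Python. This is a minimal illustration of the idea only; it does not use or reproduce the quantstrat or TTR APIs, and the function names `sma` and `cross_above`, the 3-period window, and the price series are assumptions chosen to keep the example small.

```python
def sma(values, n):
    """Simple moving average; None until n observations are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < n:
            out.append(None)
        else:
            window = values[i + 1 - n:i + 1]
            out.append(sum(window) / n)
    return out

def cross_above(prices, averages):
    """Indices where price moves from at-or-below its average to above it."""
    signals = []
    for i in range(1, len(prices)):
        if averages[i - 1] is None:
            continue  # not enough history yet to compare
        was_below = prices[i - 1] <= averages[i - 1]
        is_above = prices[i] > averages[i]
        if was_below and is_above:
            signals.append(i)
    return signals

# Made-up closing prices; a real run would use one series per stock.
closes = [10, 9, 8, 7, 8, 10, 11, 9, 7, 9]
ma3 = sma(closes, 3)

print(cross_above(closes, ma3))  # [4, 9]
```

Running the same two functions over every symbol in a universe of stocks gives the "comes onto my radar" list; a strategy framework then attaches orders to those signals.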
And then you have blotter, which is nothing but a register where you can make different entries, and it will plot how whatever strategy you applied has performed over a period of time. And then you have PerformanceAnalytics, which means that for a large number of trading decisions that you have taken, you can see how the portfolio has performed and how it compares to other assets or anything else that you have.

So let us look at time series. We are creating a time series from a CSV file, and that is the stock price of HDFC. This is how it would look normally, and this is the kind of graph that you would see on CNBC or anywhere else: it shows the volumes and how the stock price has moved. Now I can easily go ahead and add different things to it. Say I do not like that it is too volatile on a daily basis; how does it look on a weekly basis? This is only for the next 12. Earlier it was on a daily basis, now it is on a weekly basis, and then you may want it on a monthly basis. So you can quickly either compress the data or expand it, which allows you to step back, and that is very useful, as with everything in life, for data also.

If you want to see how the trend has been over a period of time, you can look at it over a rolling period. If I am doing something like a cohort analysis, where I have events such as a user logging in and logging out, I may want to see what the average user session was for this week versus the past week. I can apply a similar kind of analysis there and ask what the yearly, weekly, monthly, or daily trend has been. And then obviously you can take that and plot it. So it just shows how simple it is: in a few lines of code you can take a large set of data, where
there is a huge number of measurement points, and you can quickly summarize it, analyze it, or display it as well.

Now the other thing is that, okay, you have seen the data, and all this eyeballing is fine, but what you really want is an automated way where you can set some rules and triggers, so that based on a trend the system looks at the data and triggers some actions. In financial markets and elsewhere, what people do is figure out some technical parameters, set these triggers, and put money on those triggers and trade. These are the calls that you will often hear on the channels: okay, such-and-such stock might go to 620 or whatever it is. The other thing is that once you have a strategy or a hypothesis, you want to validate it: you want to go back and test your hypothesis. R allows you to do that, and you do it by defining a strategy.

So for example, here what you have is a moving average. I have this chart here, and I can see how the stock is trending based on its daily price, and the red line is the moving average. Let us say I want to create a rule where I do not want to own the stock when it is below the moving average, because the long-term trend in that case is down and I do not want to have any money in it, whereas if the stock moves above the moving average then I see a trend that I want to put money on. That is the hypothesis, let us say. Now I will set that rule up. First of all I go and define the strategy, and here I say, okay, I want to run this hypothesis starting in 2007, and this is the money I want to put on it. Then I add an indicator, which is the moving
average, and the parameters here are two moving averages, the 50-day and the 200-day, and whenever there is a crossover, negative or positive, I will take a trade. Then I define the signal. The signal says that I have two parameters, moving average 50 and moving average 200, and whenever the relationship between them becomes greater-than, I want to label that as moving average 50 being greater. Once I have defined that crossover, I add a rule that says that where the crossover happens I actually want to go in and buy some stock, so I am going to buy 100 shares whenever that crossover happens. That is the rule that I define.

The next thing to do is to go and apply that strategy and see how it performs. So let us just run that. What you see here is that some orders have been placed and things have happened. So let us see how the strategy did. Here you can see how the stock moved. The red line is the 200-day moving average, and the blue line is the 50-day, which moves more rapidly than the slower average. This shows the stock I own at any given point in time, and this is my profit and loss. If you look here, there is a crossover: the blue line, being the faster-moving line, has crossed the red line, so I have gone ahead and bought some stock. Then, when the stock dropped and the trend reversed, I have gone ahead and sold that stock. Unfortunately the trend reversed again very quickly, and so again I have bought the stock. So this is how the portfolio has performed. Once the stock starts to oscillate here, you essentially see rapid selling and buying activity, and you can see what kind of returns the strategy gives for this. But then there is another strategy, which is Bollinger Bands, which is a different technical indicator, but I can run
it similarly. Here you have two more faster-moving lines, the bands. The hypothesis is that the stock is going to trend between these two bands, so whenever it hits the upper band I am going to sell the stock, and whenever it touches the lower band I am going to buy the stock. Since the stock is going to move between them, if I buy something once it crosses the lower band, then by the time it hits the upper band there is going to be a big enough gap to allow you to make some profit. That is the strategy, and you can see that this strategy seems to be paying out more in terms of profits. What R then allows you to do is run a strategy across a large number of stocks, so you could run it across the 1,500 or 2,000 stocks that are there and then look at how it performs.

No, the algorithms are already there. This is something which is very popular in financial markets, so a lot of people have implemented that. But the thing is, those are the standard algorithms, what are called the standard indicators. What most people do is run something and say, okay, I have done this, and with a moving average of 50 this is the return I get. Then people do additional runs on top of this to optimize the return: instead of 50, does 60 give me a better return, or does 70? Or they try to combine two or three indicators together. They would say, okay, this band is there and the price has moved, now can I add a moving average to it, which means I make a condition that says if the moving average is this and the band is this, then I want to buy, and then I want to sell. Those types of things people have to write themselves.

This is also popular elsewhere; this is a generic thing. This is just an example that I have taken. If you look at genomics and such things, there also it is
very popular. R is also very popular if you want to prototype, because it is very quick: if you want to prototype something, even for log analysis or anything, you can do it really quickly, and this also has an interface.

The one thing with R is that currently this is an interpreted form of R, so in terms of performance you are not going to get something very fast. There is also a bytecode compiler, which gives you another level of performance, but if you really need very high speeds then you will need something that is closer to C. So what people do is use an extension called Rcpp: once they have done the prototyping and finished it, they implement the core portions of it in C++ for performance. That is what a lot of people do: they run the backtests on high-frequency data in R and then put the performance-critical part in C++. Once it is in C++ they can talk to any database or anything, but R also has interfaces, with different drivers for Mongo and other databases.

With what? I don't know. I mean, there are two ways you could do it. I don't know how that thing interfaces, but there is nothing that stops you. You can run R as a server: there is something called Rserve, where you can push messages to it and it carries out the analysis.

Can you call that from Java?

Yeah, there is an rJava interface.

So do you write R code?
You can. You can call that; you can also have a socket-based interface on the R server, so you can interface with Java. R also has an interface to Google visualizations, as well as other visualizations. And R works with other things as well.

No, I mean, people use it for high-frequency trading, so the data size is substantially high. At some point you are not going to take 10 hours of data and process it in a single instance, so you will have something which breaks it down into smaller chunks.

The question is: is all the processing in R done in memory, or does it have a space optimization as well, so that it can load only parts of the data? Are the algorithms open to keeping only part of the data in memory, or are they all assuming that everything is in memory? Is that your question? No, I think there are algorithms which can do a kind of spooling of the data.
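The idea raised in this exchange, working through data in smaller chunks instead of holding everything in memory at once, can be sketched as follows. This is a generic plain-Python illustration, not a description of how R or any R package does it; the function names `chunks` and `chunked_mean`, the chunk size, and the synthetic feed are assumptions.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

def chunked_mean(values, size):
    """Mean computed one chunk at a time; only one chunk is held in memory."""
    total = 0.0
    count = 0
    for block in chunks(values, size):
        total += sum(block)
        count += len(block)
    return total / count if count else None

# A generator stands in for a feed that is too large to load at once.
ticks = (price for price in range(1, 1001))

print(chunked_mean(ticks, 100))  # 500.5
```

A running sum and count are enough for a mean; statistics that need the whole data set at once (a median, for example) need a different approach.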