Welcome again. We've got another panel discussion this evening, and today we have an exciting array of startups here. Yesterday we had a panel discussion that looked at how enterprises are applying analytics and what problems they're facing. Today is going to be a different one. The theme today is to look at the horizons of real-time analytics. These guys being startups, they are obviously trying to push the envelope of what is possible, and we thought it would be very exciting for you to get some perspectives directly from them, to understand where things are heading and also to demystify certain things that people have been talking about over the last couple of days.

I have the good opportunity of introducing a galaxy of people on stage. To my right is Satya Kaliki. He's co-founder and VP of Engineering and Architecture at Indix. Indix is building a product intelligence platform for brands, retailers and developers to enable product-aware apps and pervasive commerce. Previously Satya played various roles, like CDO at Harvard Medical School, and was co-founder of a few startups in the space of analytics, healthcare and networking. That's Satya for you.

And then we have Sanjeev on my extreme left. Sanjeev Jha is the head of analytics and data at Komali. He has over 18 years of experience working with large enterprises as well as multiple successful startups in India and the US. His experience spans development of big data and analytics solutions, machine learning algorithms, business solutions, highly scalable and available infrastructure for serving web and mobile content and digital advertisement, various high-volume and high-performance chemical or bioinformatics tools, and storage and backup software. Currently he heads data and analytics at Komali Media. Sanjeev is a TOGAF-certified enterprise architect and holds an MBA from Santa Clara University.
And then we have, on my extreme right, Apur Gupta. Apur Gupta is co-founder of Venera, a startup bringing search and analytics to modern data centers. Previously he built many systems at places including Bell Labs and, most recently, Google. At Google he donned many hats, including designing critical systems like AdWords serving and AdWords reporting. He's passionate about making software go fast and applying technology to real-world problems. That's Apur for you.

And to my left we have the lady on the panel. Monica Pal is CMO of Aerospike. She started her career in R&D at Apple, working on MacTCP and Apple's unified messaging products. She then caught the startup bug and, since in a startup you do whatever needs to be done, she went into marketing. Since then she has run marketing and launched a number of startups in the internet security and middleware segments, and she is now at Aerospike. She has an MS in computer science from the University of Wisconsin at Madison and a BA in computer science from Rice University.

And of course we have the other panelist, Dipinder Dhingra. Can you raise your hand please? He's from Mu Sigma, where he leads product strategy. In his role he's responsible for conceptualizing and guiding the vision for development of platforms and solutions that help Mu Sigma scale the use of analytics and decision sciences. Prior to Mu Sigma, Dipinder worked in various operating roles in consulting, product management and pre-sales for enterprise software firms such as TIBCO and i2 Technologies. He has a B.Tech from IIT Kanpur and an MS from the University of Massachusetts.

So as you can see, we've got a very exciting set of panelists today for the topic. I'd just like to open by giving you a sense of the format. We're going to have two segments today, and the audience will be given the opportunity to ask questions at the end of each segment. That's how we have structured the format.
So primarily we're going to have two segments. The first segment is called demystifying real-time analytics. There's been a lot of talk about it, and the panel thought it was important to settle the dust in the air. So that's the first topic. I would request each of the panelists to give their take on the topic. Perhaps we'll start with Satya.

So thanks for listening to us on a Saturday evening, of course. There are a bunch of challenges that every one of us faces in terms of real-time analytics, and I'll bring what it is from my context. So we are... Check, check. I hope this time it's audible. So, presenting from an Indix context: what do we do, and how do we solve this problem from our context? Customers or users come to us (check, check) expecting prices of products that are sold across multiple stores, and they expect them to be as accurate as possible. A simple problem. There are multiple things that we have to do to make that happen. This is the context of: how is the product matched with other stores? Exactly the same product, the same bottle of water, being sold by 100 different stores, and the price of that same bottle of water across those 100 stores. You know, it's not about one pack, two pack, four pack, ten pack. All of those are different things, so they have different prices. So we've got to get that right. Now, how do we get this right in real time? Making sure of up-to-the-second price accuracy is a problem, right? So in my context I'm going to use that as an example. The others will talk about it from their perspective.

Okay. So the way I think about real-time analytics is essentially conditioned by what we do for our clients at Mu Sigma. I think some form of real-time analytics has been around for a long while. The concept of real-time analytics has evolved, right?
Let's say about 10 years back, real-time was getting the latest information, right? Not having to wait 24 hours, a day or a week to get the latest information so you can make a decision. Then the next thing was, can I get the latest information when I want it? I get the latest information, but can I get it on demand? Now what we've seen real-time analytics evolving to is the ability to make decisions in real time. We think a lot about decisions: the purpose of analytics and big data, or whatever we call it, is to help you make better decisions. And so what does that mean? There are situations where two things happen and you need real-time analytics, whether you call it near real-time analytics or millisecond analytics, and so on and so forth. First, you have to act on the latest information. If the customer is on a website, I need to act on the information that he's giving me right now, based on his behavior on the website. That's number one. I'm taking the example of a customer, but it applies to other situations as well. Second, you cannot wait to act, because once the customer is out of my website, whatever action I take is no longer relevant, right? So there are two driving forces. Can I get the latest information about the customer based on what he's doing right now, and then can I analyze that in real time and act on it? Those are the two driving forces, we think, for near real-time analytics or real-time analytics or whatever you'd like to call it.

Hello, check, check. So when you think about real-time analytics, I'm sorry, I'm very simple: it's real-time and it's analytics. But real-time means many things to different people. Real-time, as you mentioned, Dipinder, could be, you know, I'm not waiting weeks or days.
I'm getting it today. Real-time may be sub-second, as in real-time bidding, where within 100 milliseconds you have to make a decision. Analytics is really, as you said, decision-making. I actually believe that every application moving forward is going to be about real-time analytics, because every application is making decisions in real time, whether it is a decision on price, on personalizing a web page, or on putting the right offer in front of the right person. So I really believe real-time analytics is every application moving forward. Having said that, if you look at the way things are segmented today, you had batch analytics, which was the past. As people want to make decisions on streaming data or real-time incoming data, et cetera, that's more of what people mean when we talk about real-time analytics. I actually see it divided into what I call HDFS analytics, so stuff using the Hadoop infrastructure, and what I call hot analytics, which is on operational data. So that's a different take on it.

Thanks, Monica. I'll take up a few things from you, Monica. I classify real-time, in the context of technology, in three parts. One is real-time decision-making: in a real-time bidding system you have to calculate the pricing for the inventory you can bid on, and we do a lot of real-time and batch processing together, marry them, and come up with real-time pricing. The second is how you classify and segment. We keep getting a lot of real-time streaming information about user behavior. You classify at runtime, marry it with what you have, and retrain your model again and again, as fast as possible. And the third part is more the operational part of it. In ad tech, for example, you have these campaigns, and things are changing on the platform in real time, and there are multiple levers for that.
And you will have one or two people managing 500 campaigns, unless you automate it and put in this operational analytics. That is a class of what I'll call real-time analytics: you have streaming data, you have batch information and historical information, you correlate them and see if something is changing, and if it is, you generate some events which kind of change your order. Like I said, if I'm putting in a pricing multiplier and changing it at runtime, that's real-time for me in the context of ad tech. So, yeah.

So, Venera is in the space of applying analytics and search to data centers. Now, that is essentially operational data, right? But it is operational data about machines, and there things can go south very, very fast. So how soon can you capture data? How soon can you analyze that something is wrong? And how soon can you act upon it? That is critical to the success of the business. So that's where we come in. But taking a more general view, there are some things which have always been real-time, right? You swipe a credit card; the transaction has to finish in a certain time. Real-time bidding has to finish in a certain amount of time. Analytics, or at least what this conference is about, is big data analytics, right? So there's lots of data. You need to recompute some models. How soon do you want to ingest the new data and reflect it in your models? This is what we call near real-time analytics. The shorter you make the window for reacting to the entire world, the better it is, right? And you hit certain limits. Ultimately, you hit the physics limit: light travels only so fast. So let's say a click is happening in Europe and you're doing your analysis in, let's say, New York. It will take a certain time for things to come back. There is a physical limit to it.
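The physical limit mentioned here can be put into rough numbers. The sketch below is only an illustration, not something the panel presented. It assumes a London to New York great-circle distance of about 5,570 km as a stand-in for "a click in Europe", and it assumes light in optical fibre travels at roughly two thirds of its vacuum speed. The function name and both constants marked as assumptions are hypothetical.

```python
# Lower bound on round-trip time imposed by the speed of light.
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum, km/s

# Assumptions for illustration only (not from the panel):
LONDON_NEW_YORK_KM = 5_570          # approximate great-circle distance
FIBRE_FACTOR = 0.67                 # light in fibre is roughly 2/3 of c


def min_round_trip_ms(distance_km: float, medium_factor: float = 1.0) -> float:
    """Return the minimum round-trip time in milliseconds.

    medium_factor scales the vacuum speed of light (1.0 = vacuum).
    Routing, queuing and processing time are ignored, so real
    latencies will always be higher than this bound.
    """
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * medium_factor)
    return 2 * one_way_seconds * 1000


vacuum_ms = min_round_trip_ms(LONDON_NEW_YORK_KM)
fibre_ms = min_round_trip_ms(LONDON_NEW_YORK_KM, FIBRE_FACTOR)
```

Under these assumptions the round trip cannot be faster than roughly 37 ms in vacuum and about 55 ms over fibre, before any routing or processing. That is already a large share of the 100 millisecond decision window mentioned earlier for real-time bidding.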
So, yeah, I think we need to distinguish between big data real-time analytics and the case where you have an offline model and just need to evaluate it when something comes in.

One of the things on a lot of people's minds: obviously real-time is exciting, but it also comes with a lot of investment. So what are the takes of the panel in terms of return on investment on real-time analytics, and what kind of experience can you folks share on that? Can we perhaps start with...

So, you know, again, to bring in the ad tech context: time is value. If a user, an audience, is doing something, say checking for clothes on some e-commerce site, how soon we target them is the money. In our experience, if you target a customized and personalized message to an audience within a minute, it has given us at least a 30% boost in the conversions we can derive. And on what kind of return we have seen, I'll not say that it's just the money part, how many dollars you make; the kind of impact you make on your campaigns or advertisements for your customers is itself huge.

Yeah. So I think you could think about the ROI from a micro and a macro perspective. The micro perspective is very similar to what you were saying. I'll give it to you in the context of an example. We were doing this initiative for e-commerce retailers, and they had to decide whether to give a customer a chat window when they're on the website, based on certain characteristics and behavior that the customer exhibited on the website during the session.
Now, the ROI aspects of that are embedded into the analytics that you try to drive with the information you have. I might just give everyone a chat window, but that may not lead to ROI, because behind the chat window is a customer service representative who's going to talk to you, and the time of the customer service representative is money, right? So what is the ROI: the probability of conversion, that you'll buy something if I help guide you through a chat window with a customer service representative answering your questions, versus the cost of actually giving you that chat window? So there are certain micro aspects of the ROI which are embedded into the analytics that you do on top of the infrastructure that you might deploy for something like real-time analytics. At the macro level, like Monica was saying, is everything going to become real-time? The business case at the macro level is: what is the speed of your business? What is the rate of change of your business? It's a no-brainer for an e-commerce company or an ad tech company, and so on and so forth. But can I imagine a manufacturing company thinking about real-time? Can I imagine a supply chain company thinking about real-time? So you have to think at a macro level: am I going to invest in this kind of technology, and in the kind of analytics to drive the decisions I'm trying to make? What is the experience I need to deliver when my business interacts with my customers? That's the macro perspective that comes into the picture.

Yeah, I have a slightly different take on it: the law of diminishing returns always applies, right? There is a point where, if you make it fast, say you're now doing it in 100 milliseconds, you have a gain for your business. But then, if you can do it in 50 milliseconds, going from 100 to 50 does not give you the same return.
It might give you some return, but the law of diminishing returns applies. And conversely, there's a cost to it, right? After some point the cost just becomes too high, because you have to build new infrastructure to do this. So it really varies from business to business, and each can decide.

So I'll give a couple of real-world examples of where people have benefited after using our service. This is a large retailer, an IR 50 retailer, focusing on grocery, health and personal care, that category. They clearly knew, before they started using our service, what their profit margins were on a category like diapers. You know, the largest sales of diapers happen in the U.S., and online. After they introduced the ability to dynamically price their products based on competitive landscape analysis, and to respond in a very short span of time, they did A/B testing to say, okay, what if we do this? What if we don't? For certain customers they used real-time responsive dynamic pricing; for certain customers they did not. And based on that, they actually saw a 3% increase in their profit. 3% sounds very small, but in this space of internet commerce 3% is a big jump for their size; it's a multi-billion dollar company. So it's a very clear ROI, but it has to be applied to the use case. I cannot just assume that because I'm doing real-time, my return on investment will go up. I have to really take into account the context where it applies and the cost I'm going to take on for it. And if there is a service that makes it available at a predictable price point, then it's much easier for them. For us, we provide them a very specific price point, and that's fixed for the month. It doesn't change just because they use it.
Or they don't use it. So that's a good example of how people use something real-time and convert it into dollars.

Hello? Yeah. The ROI on real-time analytics. You know, I think when we talk about big data, everybody's been talking about variety. They've been talking about volume. This is about velocity. Velocity is directly related to value, because that's when money changes hands. When you talk about real-time analytics, basically it's about now. In real-time bidding, if you snooze, you lose. If you don't show up for the auction, you don't bid; you just don't make money. But more important than that, when you show up, you have to make sure that you're putting the best offer, the winning bid, in there. And even when you talk about pricing or serving people on a website, not only do you have to deliver that web page really fast, it has to be relevant. So the ROI of real-time analytics is that you show up, and you show up with the right stuff, and if people care, then you have a chance of making money and winning a customer.

Thank you. Those were interesting perspectives: a 30% boost in conversions, the micro and macro view, then of course the law of diminishing returns, and of course it finally depends on the context. That leaves a lot of room for people to interpret it the way they like. All right, I think we'll throw it open now for some questions on the discussion thus far. The gentleman there in the white shirt, can somebody give him a mic? We have a spare mic here, in case.

Yeah, so we had a discussion saying that we need to move fast. That was one point. The second point is that when you're acting fast, you have to act relevantly. That's what she meant. And there was something related to A/B testing. So there will be situations where you can actually do the A/B testing, like the example where you...

Speak into the mic for the benefit of others.
So there will be situations where you can do A/B testing for the analytics, like whether you ought to serve the chat window to the customer. So how do you take the decision there? Do you need the analytics, or do you just go with, say, a one-off A/B test for the customers and take a decision based on those results? Satya.

So, in the context of what I mentioned as A/B: this customer was actually trying to evaluate the ROI. They clearly saw that their ability to close, their ability to convert that customer to buy, was higher when they were using real-time dynamic pricing. When they were not using dynamic pricing, when they did not check against their competitors in those few milliseconds before presenting the price, which they did for a certain set of users, the number of conversions was lower. That was the deciding factor for them. So it is a question of using A/B just as a way to decide: is this useful? Has it been helping me get my ROI, or is it not helping me? It's just a way for them to decide, a justification for them.

Can we... Sorry. Can we interpret the question in another way? Not using real-time would mean no personalization, which would mean dissatisfaction, and it could decrease loyalty, and in the end your revenues will come down. So is real-time currently not at that mature stage, or is it really improving returns today?

I may need to reconstruct your question for my benefit, so can we play along? If you ask whether real-time helped or not, we need to say what the context is in which it's going to help or not, right? So the question is what you are using this information for. If it's to decide which product to present, that's a different question.
Let's say, should I present this variant versus that variant to this user? That's a different story. There are personal choices involved. Pricing is a very simple decision where you can compare very clearly, and it is make or break. So can you elaborate your question? How do you interpret it differently?

I mean, what I mean is that a few things are good to have, and a few things are needed to go to market. Is real-time, in the market today, a need-to-have or a good-to-have?

Good. Okay. So that helps. That definitely helps. At least from my context, right? There were a bunch of others, and I'll name their domain as an example. They are in the lighting solutions domain: a retailer who sells lighting online. They know for sure that lighting prices don't change so frequently. So they clearly decided that real-time is not going to help them: "24-hour-old data is good for me." We take prices as of 12 midnight PST, and the next morning at 9 a.m. they have all the data, completely crunched, telling them which products should be acted on and which should not. That's good for them. The domain doesn't change so frequently. Whereas with diapers, mobile phones, electronic items and stuff like that, there is a competitive landscape which brings in a dimension of dynamic pricing. Amazon, and I don't know how many people from Amazon are here, let's say they change prices on certain products five times a day, ten times a day. If that happens, anybody who wants to compete really has to have the ability to do it. For the category manager who's responsible for the actual profit margins that they're expected to deliver, I think it's very, very contextual.
I'll add some more of my perspective on that question. If we talk about ad tech, our experience says: somebody was on a shopping cart, on an item, and you target them within a minute. Initially we were targeting them after three hours, and we reduced it to one minute, and the CTR, the click-through rate, improved from 0.2% to 0.6%. Then we added to it: within one minute we can know what he is doing, and we personalize the advertisement for him, and the CTR goes to 0.7%, even 0.8%. That's a huge difference for us in terms of our ROI.

What I also heard was, you know, should I really bother to do this, right? So it's your decision whether you want to be the first mover and take advantage of that opportunity, leveraging the technology that you have available to you, or you wait for your competition and then play catch-up.

Thank you. Any other questions? Okay. I'll come to you next.

I had a question. Most of the people here talked about shaving off milliseconds: from the few milliseconds you already had, how do you shave off more to make the response time faster? And most of you touched upon e-commerce as well. But there are some factors in e-commerce where real-time is probably one of the criteria but not the most significant one. To give you an example, suppose I were to buy a phone which costs 15,000 rupees. Just because somebody gives me a lower price, I may not purchase from him, because maybe I trust Flipkart to deliver the phone to me; they would give me an unopened box. Have you seen situations where shaving off those seconds really did not help, and you had to work on things which were really not real-time? Things which had to be worked on in the back end, things which were offline, which you can't just go and fix with a click.
Yeah, I think your question is very valid, right? Real-time is probably not going to change the experience. You have to decide what the differentiation of your company is. There are certain things you'll do in real time for operations, like preventing errors on your website, because if there's an error on your site and you can troubleshoot it as fast as possible, you can prevent the opportunity cost of that. But you have to decide what your company is known for. A company like Apple is known for design. A company like Flipkart is probably known for something else. A company like Amazon is probably known for something else. So if you figure out the vectors of the experience, the dimensions that you want to differentiate on, then that helps you focus on what should be real-time and what doesn't need to be real-time. So I think your question is valid. I don't know if I answered your question, but that's how you need to think about it. Yeah.

My question is about the use cases. In addition to the ad tech world or the operations analytics use case, are there any other use cases where the monetization of real-time analytics is happening?
Yeah, I can start. So we've done a lot of work on this whole area of the internet of things, which is probably something that you guys are familiar with. It is not only collecting information from human behavior but also from what machines are saying, and from machine-to-machine interactions. So can I analyze data from videos on a retail floor to understand store traffic patterns in real time and figure out how I can lay out my stores better? The analysis might be in real time, but the action might take more time, because in a retail store it's a physical operation; you cannot just move things automatically. Can I help a maintenance engineer who's walking into the maintenance room of a power circuit breaker? Based on an application that gets real-time information about the voltage and current of that device, can I figure out whether he or she should really approach it, because you might cause an accident, right? So there are a lot of other applications which are not directly related to e-commerce but are more related to different kinds of data, whether that's sensor data or machine data, and so on and so forth.

Yeah, my perspective is that ad tech companies are clearly pioneers in the space. The enterprise is still trying to gather all the data, right? The first step is: bring out everything, build your Hadoop clusters, pour all the data in, run your initial analytics. I think most enterprises are still at that stage. Then, after they've got some of these insights, they're looking to see, okay, how can I programmatically act on it in real time? So I think we're still in early days on that. Let me give an example, right? Uber is a classic example. Like all logistics, the more real-time information about location you have, the better you can do.

So one example that I come across is security, network security, in terms of
intrusion detection. So somebody is actually hacking into your network or breaking into your network. It could be a genuine user too, but there are things that you would like to do. You have certain signals coming from the network, and you have historic data to detect whether this is potentially a harmful user with harmful intent, and then at what point do you actually stop them? You really have to recover from the situation. This is something you have to do, and you probably only have a window of a few seconds to act on it; otherwise you compromise lots of things. I mean, you end up getting lawsuits because you allowed somebody to compromise data on the network.

One more example, right? Operational intelligence again. You pushed a new version of a software, you're doing a rolling upgrade, and things are going south. If you can detect it as soon as the first one hits, you can probably prevent your whole thing from going down.

Hi. Hello, hello. Yeah, I just had a question. Based on business decisions, we get a certain part of our business model to be real-time, and then we do analytics on that, and some percentage of the data is still in batch mode, still static. So does the real-time engine talk to the static data once in a while, so that you learn from whatever other information you're gathering as well, or does it function as a standalone? The reason I'm asking is to get an eye-opener on what's actually happening with brands which implement real-time analytics.

Sure. So that's an interesting question, and the answer depends on the context, you know, who talks to whom. But when you have real-time analytics and the batch, historical data, and you're making a decision, the process making the decision has to correlate between both. So I'll give you an example: I have video downloads happening, and I'm monitoring real-time streaming data, and what has
happened is that yesterday at 3pm I had like 1000 downloads and today there are zero downloads. Then it has to be that something is not going right. So you have to correlate between both. A similar example I can give from ad tech: every day, every hour, what kind of bidding happens, what kind of requests are coming in, and how the probability of conversion changes. If there's any problem in your system or something happens, it has to correlate and go back to historical data. So it cannot work in isolation; both have to work together. But there has to be a proxy and coordinator which takes the decision.

So I'll give another example for the same, just to reinforce the message. Yes, it is important that the real-time side considers the data computed by the batch. Let's say you are a user on a website who just submitted a review and a rating for a product. You said the rating is 4, and the number of people who had reviewed was originally 45, and it became 46. For you as a user, if the counter doesn't increase to 46, the feedback loop is not closed. So the system has to take the data that the batch system had processed, which says 45 people, and the real-time system, which says one person, and combine them to show this guy 46. Maybe other people are still seeing 45; that's fine. But this person cannot see 45; they have to see 46. This is very important. It may not have the real-time aspect of the millisecond thing, but it is the feedback loop, and the user perception is important.

I'd just like to add one point. There are two kinds of interactions between the real-time and the batch aspects, right? One is the analytics relationship. The analytics relationship says that in real time I want to sense, process and react, and then in batch I will retrospect. When I retrospect, I actually improve my real-time analysis, because there might be some information that can help me improve my model,
algorithm, and so on and so forth. That's the analytical perspective. Now the other perspective is the business perspective. In real time a customer came to a website and I gave him an offer. Maybe the offer was not the right offer, or there was some error: the inventory replenishment did not happen the way it should have and he got the wrong product. Let's imagine that scenario. That happened in real time, and there was some analytics that drove it. Now that real-time action led to a problem in the non-real-time space, so you came from the virtual space to the physical space. In the physical space I now have an unsatisfied customer instead of the satisfied customer you would mostly think about in a real-time scenario. We mostly talk about that, but now I have an unsatisfied customer. The real-time information that you had about the customer needs to be thought about in the context of what you know about the customer and the behavior the customer exhibited when you made the real-time decision. That is a different aspect, because it's not just about analytics. It's about recognizing that the consequences of your real-time decision, made using real-time analytics, are now being felt in a world that is offline. How you manage that offline relationship with the customer is more of a business aspect that you have to think about when you're making certain decisions.

Hi.
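As a rough illustration of the review-counter example from the discussion above, the serving step can be thought of as adding a real-time delta to the last batch result. The following is only a minimal sketch under assumptions: the class name `ReviewCounter`, its method names and the product id are hypothetical and do not come from any panelist's system.

```python
from collections import defaultdict


class ReviewCounter:
    """Hypothetical serving layer: batch view plus real-time increments."""

    def __init__(self, batch_view):
        # Counts computed by the last batch run, e.g. {"product-1": 45}.
        self.batch_view = dict(batch_view)
        # Reviews seen since that batch run, counted in real time.
        self.realtime_delta = defaultdict(int)

    def record_review(self, product_id):
        # Called as soon as a user submits a review.
        self.realtime_delta[product_id] += 1

    def count(self, product_id):
        # Combine both sides so the reviewer sees their own review counted.
        batch = self.batch_view.get(product_id, 0)
        recent = self.realtime_delta.get(product_id, 0)
        return batch + recent

    def load_batch(self, new_batch_view):
        # Assumption: a new batch run has absorbed every event seen so far,
        # so the real-time increments can be dropped.
        self.batch_view = dict(new_batch_view)
        self.realtime_delta.clear()


counter = ReviewCounter({"product-1": 45})
counter.record_review("product-1")
print(counter.count("product-1"))  # 46
```

A real system would also have to handle events that arrive while a batch run is in progress; this sketch ignores that on purpose.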
I have a question on what constitutes a barrier to real-time analytics. There was an observation that there are of course fundamental limits that you cannot reach, but then there are others, perhaps fundamental, I don't know; that could also be a question. The closest you can get to real-time processing is if you're processing not at the data rate but at the information rate, right? I mean, that's the closest you can get: you cannot sub-sample below the information rate, and the time required to process that is real time. That could be the definition. To reach there, there is an algorithmic aspect and then there is the engineering aspect. Currently, which constitutes the bigger barrier? I mean, which one is more unsolved?

So I think the next part of the questions is exactly about infrastructure and algorithms; you've preempted the segment, so let me drive us into that anyway. I think the conversation started leading into the current state of real-time analytics, as well as the barriers and where it is heading from here. So that's where I think we also wanted to touch upon some of the developments in infrastructure and algorithms and how they will affect everything else. Probably a request to kick it off.

Okay. So as I said, the question is how real time. The law of diminishing returns also applies, right? Do you really need to process it at that rate, and can you process it at that rate? Let's say you decide, hey, I need to do this in 10 milliseconds. Now here's your classic algorithm for solving the problem: look at all the data, build a model, apply it to the new data point and take a decision. Clearly that is not feasible, but that problem was solved long ago: you build the model offline and you take the decision. The next problem is, can I update my models frequently enough? Then the question is how soon is soon enough, and it depends on the volume also. If you have a low volume of arrivals, you would want to
incorporate the previous points, because recent data has relatively higher fidelity for predicting the new data, right? And so there are some engineering challenges and there are algorithmic challenges. For example, let's say you train an SVM. You cannot update an SVM with a new data point right then and there. The best you can do is use an online SVM algorithm, but its guarantee is only that it's going to be no more than 2x worse than a classic SVM algorithm using maximum likelihood estimation. So that's an algorithm problem. There are engineering problems too, as in: okay, this point alone I can't process, or at least to process it I need to bring in this much data. Can I bring in this much data in 10 milliseconds in order to process it? That places limitations on the model sizes. Or let's say it's a complex Gaussian model; you can't evaluate it in time, so you probably just go with a linear kernel. And this is where things are very correlated: advances in infrastructure enable advances in algorithms. I think Monica will touch upon that, but the fact that you have SSDs means you have more storage which can respond in that time, which means you can build larger models, and larger models mean more accurate models, generally speaking. And sometimes algorithms demand other things from infrastructure, and infrastructure evolves to do that. They say, as soon as you get this batch, can you push it to me so that I can process it? I have figured out a way to operate on it. But in general I see algorithms as a slightly bigger barrier, just to answer your previous question.

So I think you were asking where the barrier is, whether it's at the engineering or technology infrastructure level or at the algorithm level, currently.

Yeah. So obviously Aerospike sits at the infrastructure level. Our whole thesis, the reason why the company was founded, was because basically the
founders said, hey, there are new processors out there, new processor technology, multi-core, multi-CPU servers; there's new storage technology out there, flash, SSDs, PCI cards. And so they literally rebuilt the database in C from scratch, every line, and they tuned it line by line to make it go as fast as it does today. They weren't trying to build something 10x better than Oracle; they were focused on the hardware and on really leveraging Moore's Law, so parallelizing across cores, SSDs, etc. So from our perspective, we actually believe that we have a rocket that people haven't discovered, that some of these things are actually possible, that you can crunch way more data way faster. So I'm really excited, because with Aerospike going open source just a few weeks ago, I can't wait to see what you guys will start building with it now that you know it's possible.

So, from a slightly different take: there's a maturity needed in the team that's taking on the challenge of building something like this. The technology is not readily available as a platform, as a single unified stack. You've heard over the last two days, maybe even in the workshops, that it's cutting-edge research and a lot of people are still figuring out and cherry-picking among the technology choices that are there. Should I choose Storm and Kafka? Should I choose Spark and Spark Streaming? Should I go with Cloud Dataflow, or should I do something completely on my own? Should I just go with Summingbird? Or, like we did, we modeled ours on the Lambda Architecture and built our own stuff. So there are different people in exactly the same space as we are, everybody solving the problem. To make it land and fit the end users, there's significant maturity and engineering talent required. That is going to be a big barrier by itself: just the talent and know-how to successfully deliver something like this, to build something and to make it
really see all the ROI we talked about, to make it happen. That's the first barrier. Now certainly there's infrastructure: we heard it is not ready, or people are only now investing, and we need to fundamentally rewrite the databases to meet this kind of need. Availability definitely is another barrier. I think all of these are barriers. Algorithms are a barrier, and I don't think batch algorithms really work here. You've got to go for approximation, and the moment you ask about approximation everybody comes up with only one name, HyperLogLog, and after that they're lost; they don't even know what the next algorithm is. And that's not going to solve all problems. So either there are algorithms available, which means we need to try them, or if they're not available we're going to invent them, and people are going to come up with them. So this is just happening. I think at next year's Fifth Elephant we will probably see more people coming up to talk specifically about algorithms that are actually solving the problem in this context. All of these are barriers in my view, and we had to solve them over the last two years, because if it had been available we would have just taken it. It wasn't there, so we had to invent it. So my view is that all of them are real challenges.

Right. So I might answer the question on the barriers, but I'll just give you some experience of what we've been doing. If you think about who's spearheading real time, there are about three broad categories; actually, two broad categories on the real-time side, and then on the big data side there's one broad category. On the real-time side, we've always known that the high-frequency trading guys in finance have been doing it, and those guys have traditionally relied a lot on enterprise service bus and complex event processing kinds of technologies. Then there's a very interesting area, the whole robotics and artificial intelligence area, which has been relying a lot on this whole
concept of agent-based infrastructure, agent-based systems. You can find a lot of open source stuff there also: JADE, the Java Agent Development Framework, and so on. That's really focused on the concept of mobility and edge computing, which we are seeing a lot of applications for, because we're doing a lot of work in the Internet of Things area, which requires edge computing. How can I move the computational capacity and the ability to orchestrate the computation, which includes the infrastructure as well as the algorithms, as close as possible to where the event or the data is generated? We are doing a lot of work in that area. Then the e-commerce companies have spearheaded the whole Hadoop stack and HDFS, and now things like Spark, as Satya mentioned, and Storm are coming in. So that's a whole other area of infrastructure that's helping build these real-time intelligence systems.

We believe that you should think about three areas of focus at the macro level. What I said earlier was more about getting inspiration from different places, but the three areas of focus that we see are these. Think about high-performance computation, which is not only scaling analytics to bigger and bigger data sets but also scaling analytics to billions of computations, and the whole area of GPGPU computing. That's one aspect. The other aspect, which I think Satya talked about, is that batch algorithms do not work for data in motion, so you need approximations. We are doing a lot of work in metaheuristics. If I wanted to do optimization, a traditional mathematical programming technique would take minutes or hours, but if I use newer heuristic algorithms, inspired by biological systems, those actually run faster because they're more
search based and more heuristic based. And then there's a whole area of infrastructure where you should think about usability and visualization. One of the key things is that we've been talking about automated decisions for real time, but what if I needed a human to make a decision? It seems counterintuitive, but if I can give a human the most relevant information and that decision can be made by a human being, then it is incumbent on me to think about usability and visualization, and about new metaphors for usability and visualization that can help a human being make sense of that information, whether that information is powered by intelligence from algorithms, by concepts of visualization, or by advanced algorithms that make your visualizations intelligent, things like graph analytics and topological data analysis. How can I speed up the insight that a human can get from real-time information? That is a huge area as well. I don't know if that's a barrier, but we find that the ability of humans to consume insights from real-time information matters, and so you have to think about the visualization and usability aspects. So those are three broad areas you should think about. Right?

Sure. So I'll start from Satya's point on approximation. When you bring in real time, you are putting some statistical system in place, and when you put a statistical system in place you will have outliers, and how you take care of those outliers is a pretty big barrier. To give an example from the ad tech world: in our bidding system we have a 15-minute window after which we refresh our data with the actual data. In those 15 minutes the data is all approximation, and the variation from these outliers is so high, every day, every hour, that it has been a big engineering problem for us to tackle. Our systems, in certain time frames,
think that we are out of budget when actually we are not, and we are not serving advertisements, and we think we lose a lot of opportunity because of that. So that, in my opinion, is a big barrier.

You must use the mic, please.

That would just mean that perhaps the approximation is not good; there could be a better approximation algorithm. I mean, like you said, there is just one. Perhaps that could be the case.

That could be the case, but our observation is that how well we can remove those outliers is what will solve our problem, and we have not been very successful so far.

I mean, I'm just saying that's not the case. If you look at SIGMOD papers from, like, 2000 till 2010, there are always at least three or four papers on streaming algorithms, but the problem is they are not accurate enough.

Okay. Given that it's Saturday evening and we are heading close to 7, we'll probably take a few more questions from the audience and then wrap up. Any other questions? The gentleman here.

So the question I had was really around, let's say, the relevance of real-time analytics in India, when you have very tight budgets, when you have a lot of focus on productivity and, I would say, on getting your technology infrastructure right. How do you sell real-time analytics to a client?

Definitely not necessarily technological. I think I touched upon it briefly in an earlier point. Say there is a service available at a fixed price point and you have a choice. If I have enough contextual information about the user, or something else I can use, I can say, here I will use the real-time capability, and I will probably pay per usage of the API or per call that I make. If I'm going to pay a fixed amount as a fee, it will monetize well, because I have complete choice in my hands. I don't have to pay an upfront fee of any kind. But are there services, or technology players, that can distill all of this complexity and provide a solution that's simple enough and at a very
simple price point that is easily measurable? You get a decision from batch-based data and it costs you X, and real time costs you, say, 2X. Now the ROI is clearly my choice: I can decide for which kind of context I use real time, whether it is needed or not. I know what price I'll pay; incrementally I'll pay double, but I know for sure the information is in my hands as a user. I think that technology maturity has to happen. All the people who are solving this problem from an end-user, business-use perspective are thinking along that line. It's not solvable from a model where I have to pay an upfront fee of $100,000 to even start getting ready for it. It doesn't work.

Just before, the point was really not around establishing a clear case, because what we said is that this is new. The point is that when you have new technology there are not enough case study references. ROI itself was such a difficult question to answer, so you don't have those specific case studies for customers. There is also the question about independent variables: was it these factors which caused the increase in spend, the 3% increase and so on, or was it other factors? So one is those aspects, which are a challenge. The other is really, how ready is the market?

I think I'll answer that. That's a very general question, in the sense that it's typical of anything that comes new to the marketplace. Buyers have different mindsets. There are some who wait for somebody else to try it out, and there are some with the mindset of wanting to be ahead and experiment with things. So to answer your question, is it India specific? No. I work with many customers globally, and I think it's universal. Different customers in different geographies have different approaches to new technology and how it is offered to them. It's often an enterprise strategy, sometimes defined, sometimes not very well defined, sometimes generally understood within the company culture: hey, let's
wait for something to come. But in some cases they'll say, hey, guess what, let's be ahead of the curve, let's do a POC and try it out, let's get a partner to work on it. So in that sense I wouldn't think it's an India-specific problem. The second thing is that every new product or technology goes through this till it becomes a Hadoop. I mean, she did mention their own product. Every product goes through that; even Hadoop probably went through that. Not everybody knew it till it became what it is. So likewise, I think everything will find its place. And you talked about how you sell something new. It will only be through benefits. There'll always be something iffy about what the root cause was that actually led to that benefit. That's a fair point that he brought up. That's the price that you pay as an early adopter of technology: you experiment; sometimes you win, sometimes you don't; sometimes you can clearly attribute, sometimes you can't.

Sorry, I think the question is still not about attribution. Sorry, the question went off track. I think the question is really, let's say Komli or Mu Sigma: who do they sell to? Do they sell to Indian customers or do they sell to, let's say, US banks? So the classic question is really, what do the customers see in terms of the value of real-time analytics such that they are able to pay for it? I mean, that was the real question, not the independent-variable kind of thing.

He's asking a question specifically to your companies.

Yeah, so the question is, you're talking about real time, and...

Yeah, the question is specific to the two companies. I mean, I'm just setting that as context: for someone who's offering real-time analytics, how was that able to drive value?

Yeah, so very good question. First of all, our approach is not to go from a solution and find a problem. Mu Sigma doesn't go from a solution of real-time analytics to say, here are the 10
problems I can apply it to. Our approach is to go from the problem to what the right solution is. We're not a technology company or a product-led company; we're an analytics and decision sciences company. When you approach it from that perspective, and you engage with clients and figure out what decisions they're trying to make, some clear, some fuzzy, some muddy, some latent, and you start from there, then the business case evolves by itself. Then you ask, do I have the capability to enable the decision with the right math, the right business context and the right technology? That's our approach. And we are finding that this whole real-time analytics, and even earlier the whole big data, Hadoop and so on, are enablers. The big data hype was not there five years back when Mu Sigma started (sorry, Mu Sigma actually started in 2004), and it's slowly going to die its death as well. So it will be with real time, because it will become part of the hygiene of the organization as a capability enabler. So we don't go to a client and say, I have a real-time analytics solution for you. That's not what we do. That's how Mu Sigma works: we don't work from solution to problem, we go from problems to solutions.

Yeah, I'll give an example from the telecom world. Today it's about whether, as users, we are willing to pay money, whether we are ready to pay a significant upfront fee for the benefit that comes. Telecom is a good example in India. It's already spending a lot of time and investing a lot of money in getting real-time information from subscribers like us and actually starting to push stuff. Airtel is doing this and I'm sure other telecom providers are doing it. I know Airtel's example because some of the guys whom I have as colleagues or friends
actually are working with the vendors who are supplying software to Airtel's data centers. So it's nothing related to this example, but it is very, very specific, and I think as Indian consumers we are ready for it. When we start paying for it, I think the service providers will start using that technology.

If I may, I have a slightly different take on that. Your question was more around whether businesses are ready for it. This much is clear: all else being equal, if you are real time and your competitor is not, you will win. Now the question is, are you equal in other things? If you are not, then you have a very simple decision tree: either this can be your advantage and it will eclipse the other things, or you are so behind in other things that you need to do them first. And I'm not here to sell snake oil. Real-time analytics is not a panacea; it doesn't help if you have a crappy product. So you really have to fix those things first.

Good points. Yeah, thanks. Final question.

I just wanted to make one comment, again from an infrastructure perspective. When you look at flash economics, the good news is that infrastructure costs are going down rapidly. Even when you think about what Amazon is doing and how it's making infrastructure available at the price points it does, that translates into business models becoming viable that were not viable before. So I think it's pretty amazing what we are delivering as an industry compared to even five years ago.

Final question from the gentleman there, Dinesh.

So I have a question about how you see real-time analytics playing out. Do you see it going in a more verticalized manner, where adoption happens and technologies or solutions come out per vertical, or do you see someone coming out with a more horizontal play across major verticals or all verticals?

So I think both will happen, because there's so much opportunity to grow. As we talked about earlier in the question, technology will come
through, and that will solve it from a horizontal platform angle, giving a unified stack and making it easy for people to build systems that can coexist with batch systems. I think people are not questioning batch systems anymore. They're saying, yes, it's accepted, we want to do it, let's do it. People are at different points in the adoption cycle, but they're doing it. Real time is at the beginning of the curve. I think technology will come in and enable it, and people will come on board. So first, technology definitely has to enable it for the critical mass to adopt it. There will be a large number of people still not in that space, and they'll come when it's a safe, very well supported technology. They need a unified tech stack, they need trained staff available; all of that is a necessity for some enterprises to make those decisions. It will happen. Now during this process there is an opportunity, just like the way the software-as-a-service model disrupted things: there are people who will come from vertical angles, who don't need customers to invest in technology or learn about technology. I don't need to know anything. It's a hole in the wall, it's a service: I make a call, and it'll respond to me in real time. Let's say they'll guarantee me a 30-millisecond or 50-millisecond response time. They'll put something in my data center, with refreshes happening, or something like that. They will figure out a way in which it can be solved. Because real time sometimes demands millisecond and sub-millisecond response times, specific vertical solutions will come up where technology is nowhere a barrier. So I think both ways we'll see innovations coming up in the next 12 to 18 months.

Okay. So thank you all for staying late for this panel on a Saturday evening. We truly appreciate your time and your interest in the panel and the discussion around it. Can I request you to
take a moment to give a big round of applause for all the panelists? Some of them have traveled to the city just to be here with us, and thank you so much for that. Thanks again, very much.