Good afternoon everyone. I am Ramesh; Vishnu and I are from the Flipkart supply chain team. We manage the platform side of things on the supply chain management team, and we are here to talk about real-time analytics, visualization and complex event processing.

Given the large amount of data we process every day in supply chain management, it becomes very important to think about real-time analytics and make decisions quickly. It begins with a warehouse manager looking into his warehouse stock and realizing, I am actually going to run out of inventory of a particular product, so I have to buy it quickly to serve my future orders. It might also mean there is a sudden rise in demand for a particular book, say one by Chetan Bhagat: the number of orders for that book keeps increasing and you do not have enough inventory of it, so you start pre-ordering from your suppliers so that you can serve future orders. All of this boils down to making instantaneous decisions, and that is what we are going to touch upon here.

So we started building a system that looks something like this. Some of you may relate it to the Tom and Jerry episodes where Tom builds a really complex contraption just to drop a safe on Jerry. We started with something like that, and these are some of the components that play a part. Starting on the left you see the application servers and the database servers; the guy with the log over there is LogStash; then Graylog, Elasticsearch, Munin and Graphite; that's the command line, MySQL and ActiveMQ.

We will touch upon the architecture a bit, starting with the basics: application servers talking to some database, in our case let's assume it's MySQL. This scales out to n such instances, and when you have applications deployed across hundreds of VMs, it becomes really difficult to manage your logs, to debug, or to make any sense of what is happening in your system. There are five different components with ten machines in each cluster, they all want to talk to each other, and you need to coordinate all of that. So how do you go about debugging if something is going wrong?

You need something like this. We started putting in things one at a time, and began shipping logs from our application servers through LogStash into Graylog. Have any of you heard of Graylog? Okay, I see a few hands, that's good. You could call it the open source brother of Splunk; it's one of the most respected and best performing tools in the open source logging market, and Elasticsearch is the back end for it. Earlier, Graylog did not perform well, primarily due to its use of MongoDB as the back end, but once it started using Elasticsearch, things changed. What we do with it is take our application logs, strip out details, create separate fields out of them through LogStash, and save them in Graylog. Graylog also gives you a wonderful UI where you can search, for example on a transaction, and get all the details of that transaction. So this is the basic setup.

Then we introduced StatsD into it. How many of you know StatsD? A few hands, okay. StatsD is a network daemon based on Node.js. What it allows you to do is aggregate metrics and send them across to anything.
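To make the StatsD part concrete, here is a minimal sketch, in Python purely for illustration, of an application firing a counter at StatsD over UDP. The host, port and metric name are assumptions for the example, not Flipkart's actual setup.

```python
import socket

# StatsD listens for plaintext metrics over UDP, by default on port 8125.
STATSD_HOST, STATSD_PORT = "127.0.0.1", 8125
_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def statsd_incr(metric, value=1):
    """Send a StatsD counter using its line protocol: <name>:<value>|c"""
    _sock.sendto(f"{metric}:{value}|c".encode("utf-8"), (STATSD_HOST, STATSD_PORT))

# e.g. every time the application confirms an order:
statsd_incr("warehouse.orders.confirmed")
```

Because the transport is UDP and fire-and-forget, the application never blocks or fails if the metrics daemon is down, which is what makes it safe to sprinkle these calls through hot code paths.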
StatsD by default has a back end for Graphite that lets you graph whatever data you send to it. What we additionally did was write a back end for StatsD that writes to Elasticsearch. Why did we do this? You have information like the number of orders, or the number of items per product in my warehouse, flowing through your logs and out of your databases. That is real-time information from your logs: you understand it, interpret it and graph it. But say six months down the line you come back and ask, what was my number of orders on this Wednesday versus that Wednesday, I want to compare. You won't be able to do that with just StatsD and Graphite. We wanted somewhere to aggregate that information and keep it. That's why we started writing back to Elasticsearch, and we actually open sourced that particular back end. StatsD writing to Graphite is the default behavior, so you can see all the graphs of, as I mentioned, orders, number of items and so on. This setup was working pretty well.

Then we moved on to this. You have a process that queries Elasticsearch and is able to understand and aggregate information and act upon it. Elasticsearch, if you haven't heard of it, is a full-text search engine, or I should say a NoSQL alternative to a database. It's built on top of Lucene, similar to Solr, but it's fairly recent and growing very well. What we started doing is extracting information based on particular rules: put in queries, obtain the results, and write them back to StatsD. Again, the information goes to Graphite and you are able to interpret and compare data, primarily for anomaly detection.

Anomaly detection doesn't happen through automated methods here; you have to look at the graphs to spot anomalies, but we needed an automated way. That's when we thought of putting in a complex event processing engine. This is still a work in progress, but we have multiple use cases where it will help us be more productive.

So what is the advantage we gain with this? We have Elasticsearch as an aggregated store, which gives us queries with faster results, primarily because all the fields we need, the fields which are interesting to us like the order ID and the transaction ID, are already indexed and saved. You just extract information through structured queries, and what comes back is JSON: you can manipulate it, render it to HTML, do whatever you want with it.

StatsD also aggregates over a sliding time window. You can say, aggregate my details over a minute. We also started logging our exceptions, the number of faults, the number of errors and so on. I'm not interested in the number of exceptions my application throws every second, but I am interested in the number of errors it throws over a minute, and I want to see the trend. StatsD lets you do that: you can define the interval over which you want to aggregate, and send that over to Graphite. You can also define your own back end. Say you want to write a back end for a complex event processing engine: you get the details, send them to the CEP engine, and it takes action accordingly. What it could then do is notice, say, that your number of exceptions is increasing exponentially. You write a rule, and you can integrate it with Nagios or Munin and alert people. Nagios lets you do this alerting over SMS, email and phone calls as well. So we started doing that too.
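As a rough illustration of the query-Elasticsearch-and-write-back-to-StatsD loop described above: the index name, field names and metric name below are made up for the example, and this is a sketch rather than the actual Flipkart code.

```python
import socket
import requests

# Hypothetical index and daemon addresses, for illustration only.
ES_COUNT_URL = "http://localhost:9200/orders/_count"
STATSD_ADDR = ("127.0.0.1", 8125)

def orders_in_last_minute():
    # Elasticsearch _count API with a range query over the last minute.
    query = {"query": {"range": {"@timestamp": {"gte": "now-1m"}}}}
    resp = requests.post(ES_COUNT_URL, json=query, timeout=5)
    return resp.json()["count"]

def push_gauge(metric, value):
    # StatsD gauge line protocol: <name>:<value>|g
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(f"{metric}:{value}|g".encode("utf-8"), STATSD_ADDR)

# Run this on a schedule (cron, a loop, etc.) and Graphite gets the trend for free.
push_gauge("orders.created.last_minute", orders_in_last_minute())
```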
So what are the use cases we're talking about here? Say an order is placed, and for every order that is placed and confirmed a purchase order is raised, so you can procure that item and ship it out to the customer. A purchase order is raised, but due to some anomaly in the system a sales order is not raised. That is a problem, both for your data and for auditing. These kinds of anomalies should be detected right at the beginning and taken care of. That's one.

Second, you want to be proactive. A warehouse manager is interested in knowing the current inventory and which products are about to go out of stock. Say a shelf in a warehouse is assigned to a particular product, and that product is a mobile phone: all my Samsung phones sit on this shelf. If I notice that one of my shelves is going empty, or the quantity on it is zero, that's an alarm. I need to either procure more of that item or reassign the shelf to something else.

Where is the current pile-up happening? There are multiple systems in supply chain management. There is the website, which is the interface for the customer to place orders. Then there is an order management system which manages the orders and passes them on to the procurement system. The procurement system validates the order, procures the item, gets it back and puts it in the warehouse, so there is effectively a warehouse system. There is also the shipping system which takes your order, ships it out to the customer, collects the cash and so on. So there is a complete order flow from the moment a person says, check out my order. Starting from there, you want to know how that order has moved through your system. You also want to understand whether a pile-up is happening. By pile-up I mean, say customers are placing orders and 100 orders are placed, 100 POs are raised, but only 10 sales orders are received, or only 10 shipments of this particular item go out. There is a bottleneck between procurement and my warehouse, and I need to understand it and fix it. It could be due to staffing issues, it could be some other delay, a supplier not sending information, anything of that sort. But the main point is that we, and also the people managing the warehouses and the supplies, need to understand what is going wrong.

Then there are SLA breaches. Every order has an SLA. Say you order a book on Flipkart and you are told you will probably get it in one or two days. We always try to beat the promised SLA: if we promised you two days, we try to deliver within a day. And if we see that the order is not going to reach the customer within two days, we have to take corrective action. Say the Bangalore warehouse has the book but is not able to ship it within two days, while the Delhi warehouse does not have the book but would be able to ship it within two days; we actually procure a new copy at the Delhi warehouse and ship it from there. These kinds of anomalies and SLA breaches need to be understood and acted upon.
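As a hedged sketch of what such a pile-up rule could look like if you implemented it directly against Elasticsearch counts rather than inside a CEP engine: the index, event names and threshold below are invented for illustration.

```python
import requests

ES = "http://localhost:9200"  # illustrative address

def count(event_type):
    """Count events of one type over the last hour (index and fields are assumptions)."""
    q = {"query": {"bool": {"must": [
        {"term": {"event": event_type}},
        {"range": {"@timestamp": {"gte": "now-1h"}}},
    ]}}}
    return requests.post(f"{ES}/supply_chain/_count", json=q, timeout=5).json()["count"]

purchase_orders = count("po_raised")
shipments = count("shipment_created")

# If far more POs were raised than shipments went out, something is piling up
# between procurement and the warehouse; hand the alert off to Nagios / email / SMS.
if purchase_orders > 0 and shipments / purchase_orders < 0.2:
    print(f"ALERT: pile-up suspected ({shipments} shipments for {purchase_orders} POs)")
```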
We also started using our platform for understanding our system's behavior. We see the request times of each of our controllers, or each of the sub-URIs in our system, with the average, minimum and so on. These are useful when you are debugging or doing performance tests. You also have, though it's not very visible here, how much time each of our queries took and how many milliseconds we could shave off any of them. So we have that information, but how do you make sense of it? As the number of records, or the size of your database, keeps increasing, you want to understand whether your system's performance is actually going down. So we started pushing all of that information through StatsD to Graphite, so that even our developers, while doing development, understand what's going wrong or whether there is something they could do better.

This is the Graylog interface; we have a few snapshots for you. This is a sample of a particular log. I'm searching on a particular transaction ID, and if you notice, the transaction ID is the same everywhere, so I am able to trace the flow of an order across my system. I know that the order was created first, the cash-on-delivery verification was done, the order was approved, a PO was raised, inventory was received. The next step is for the warehouse to ship it out; we're just waiting on that. Once that is done we know this transaction has completed its life cycle and all is well. If something in the middle is broken, say the order was never approved but a PO was raised, we can see that too.

So a UI is good, Graylog is superb, but nothing beats the command line. It's the only thing that lets you scale. This is what you see when you tail logs on a particular system. But what do you do when you have logs sitting on 100 different systems? You can't SSH into each of them and tail the logs, and you can't have 100 monitors around you. So what we ended up doing was write a command line tool. It queries Elasticsearch, and with it I can do most of what Graylog provides on the UI, and even more. What we are actually proud of is this: we can tail our logs across 100 different systems, just like that. You say "flow logs tail" and you see the logs from all 100 systems. You can also filter: I want the logs only from this particular system, or this particular cluster, or the machines having this particular tag, anything of that sort. This is an example of verbose output: for each entry you see the level, the host, the facility. All of these are fields in Elasticsearch which we are querying and getting details from. So this is more descriptive than what I can get in the Graylog UI. Sure. So I'll let Vishnu talk about the remaining stuff.
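In the same spirit as that command line tool, here is a simplified sketch of a tail-across-all-hosts loop that just polls Elasticsearch for new log entries. The index, field names and filter are assumptions for the example, not the actual tool.

```python
import time
import requests

ES_SEARCH = "http://localhost:9200/logs/_search"  # illustrative index

def tail(host_filter=None, interval=2):
    """Print log lines newer than the last one seen, across every host in the index."""
    last_ts = "now-10s"
    while True:
        must = [{"range": {"@timestamp": {"gt": last_ts}}}]
        if host_filter:
            must.append({"term": {"host": host_filter}})
        body = {"query": {"bool": {"must": must}},
                "sort": [{"@timestamp": "asc"}],
                "size": 500}
        hits = requests.post(ES_SEARCH, json=body, timeout=5).json()["hits"]["hits"]
        for h in hits:
            src = h["_source"]
            print(src.get("@timestamp"), src.get("host"), src.get("message"))
            last_ts = src["@timestamp"]
        time.sleep(interval)

# tail()                     # every host
# tail(host_filter="wh-42")  # only one machine or tag
```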
So Ramesh has explained the flow from the application logs, and how we use LogStash and Graylog, backed by Elasticsearch and Graphite, to get better visualization, understand our logs and detect anomalies. Logs are one source of truth, one source of events from which you can analyze your data, but sometimes you can't fully trust your logs, because people might forget to log something. Meanwhile the database has been sitting lonely over here. So let's use the database, and that's what we planned to do. Our database is also a key source of information, right? We store our data there; updates happen, inserts happen. So it's an actual real-time source of truth: if something gets updated, you get to know it in the database.

The guys at Oracle came out with a library called Change Data Capture. It helps you read the binlog, which is MySQL's log that captures all the replication events, like inserts, updates and deletes. We use that library to read the binlog and capture the updates, deletes and inserts, possibly even a truncate, so we can send an SMS to the sysadmin saying that a table has been truncated. So we use the Change Data Capture library to read the binlog, capture the updates and inserts on a particular table or on multiple tables, and send them to Elasticsearch. In the context of an orders table, an update happens to an order: update the order, set the status to cancelled. Someone might not log that information, someone might miss that logging. So we can use the database to capture such events. That's how we use the Change Data Capture library.

Apart from that, we added one more layer. We worked on pushing the same events from Change Data Capture, the dotted line, to an MQ. We get all our replication events in the MQ, aggregate them, and send them across to a particular slave. The main motivation for that was to speed up replication. As of now, MySQL uses single-threaded replication, so we thought of capturing the events ourselves, publishing them to the MQ, and then sending them to a slave for batch replication. This is still work in progress, but that was the idea. There's also another fork of MySQL called Drizzle. Drizzle has a plugin which helps you send information out to RabbitMQ or ZeroMQ, which you can use for replication yourself: you can use that plugin to replicate to another MySQL instead of using the native MySQL replication, batching your replication events and sending them up to a slave. So that was Change Data Capture events to an MQ.
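The talk doesn't go into the internals of the Change Data Capture library, so as a rough stand-in (not the library used in the talk), here is what the same binlog-tailing idea looks like with the open source python-mysql-replication package. The connection settings and the Elasticsearch index URL are illustrative assumptions.

```python
import requests
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent,
)

# Hypothetical credentials and index; replace with your own.
MYSQL = {"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "secret"}
ES_INDEX_URL = "http://localhost:9200/elephant/_doc"

# Connect to MySQL as if we were a replica and stream row-level binlog events.
stream = BinLogStreamReader(
    connection_settings=MYSQL,
    server_id=101,                  # must be unique among replicas
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,                  # keep tailing the binlog
)

for event in stream:
    for row in event.rows:
        doc = {
            "schema": event.schema,
            "table": event.table,
            "action": type(event).__name__,   # WriteRowsEvent / UpdateRowsEvent / DeleteRowsEvent
            "row": str(row),                  # 'values' or 'before_values'/'after_values'
        }
        # Index a summary of every change so it can be counted and searched later.
        requests.post(ES_INDEX_URL, json=doc, timeout=5)
```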
Now there's another big problem. Sometimes a big bad query comes in and it kills the database, and you will not be able to find that particular query unless you log it. So what do we do? We enable the general query log. But take a system where you're logging 5,000 or 10,000 queries per second: the sysadmins are wary of that, because disk space keeps shrinking, and you might not have enough disk space when you're logging at such a high rate. So what you can actually do is, you see the line which goes out, and you see log.cc: we went into the source code, intercepted every query, and sent it out to an MQ. The strategy was divide and conquer. You have, let's say, a 100 GB general log file; let's try to divide that file into smaller chunks and distribute it across multiple machines. So we send whatever queries MySQL receives to an MQ and put an army of subscribers behind it. What happens is that all the queries get distributed among those subscribers and you still get them logged, and since the log is distributed, the space consumption on any one machine is less. And you don't have to worry about the actual database being touched: its disk is used for the database. So that was one motivation for pushing events to an MQ. As I said, Drizzle has a plugin to send to an MQ; MySQL doesn't, so we went in and intercepted each query ourselves. And that's the file name, log.cc, where you can go and check how your events come out.

As Ramesh was also pointing out about Elasticsearch, we created an index called elephant over there for the conference and sent events to it. I'll just show you a snapshot. On this particular index we have a table name called orders, and we have captured the inserts which were sent to that index on Elasticsearch. So you can go ahead and query Elasticsearch and get the total number of inserts on the orders table, or the total number of updates.

So this is something I wanted to present. According to me, every piece of software has a soul. You often see heroes and heroines singing their heart out, so we thought, okay, MySQL can also sing. We have a small demonstration for you: we'll send a query to MySQL and MySQL plays some music for you. So I'm going to run a select query. What's actually happening is that we have sent a query to MySQL, we have an interceptor in the log.cc file which sends a message out to the MQ, and we have a subscriber actually playing music. This is just for fun, but imagine: MySQL does have a mood. If it's under a lot of pressure, it will start crying. Depending on your query, if it's a big bad query, it can play a very sad note and let people know that you are treating MySQL very badly. So that was what we had done with this.

Just a second before we start the questions, I have a small announcement. There is a grey Nissan Sunny blocking another car. The lady needs to pull her car out immediately to pick up her kids, so if that car belongs to you, please move it right now. Questions?

LogStash can directly log into Elasticsearch; why do you need to go through Graylog? Yes, LogStash can log directly into Elasticsearch, but the problem with LogStash is that once you actually scale, the time it takes for the indexes to return a proper query result is large. What we use LogStash for now is as a transfer mechanism. The beauty of LogStash is that you can provide multiple inputs to it and do many different things with it, like a grep, or a grok, or stdin and stdout. That's the primary advantage of LogStash for us.

And in Elasticsearch, how much data do you keep? Sorry? In Elasticsearch, how many days of data do you keep? The logs which keep flowing in daily, we delete every 30 days. We don't try to keep all the data in Elasticsearch after a particular order's lifecycle has gone through.

Is it a distributed instance of Elasticsearch, or a single one? Sorry? Is Elasticsearch distributed over multiple boxes? Right now we are running it over two boxes, with five shards, the default which Graylog provides.

Does your application log directly into StatsD, or does it pull from Elasticsearch? There are two kinds of things we do. StatsD can accept any information over UDP, but it needs to be in a particular format. You also need the log to be available on the machine for audit purposes. So what we effectively do is keep the logs on the machine and then use stdin and stdout to send them through LogStash. This is more of an add-on; the logs the application writes are primarily required for audit and they'll be kept forever, so you need that.

What is the real-time part here, actually? Oh yeah?
I can't see. Yeah. So what is real-time here? Like we said... I'm not able to hear you. What is real-time here? According to this session, it's "build your own real-time analytics and visualization", so what is real-time over here? So what we do is real-time understanding of the order's lifecycle: visualization of the number of orders you have, and important decisions based on that. That's what we meant.

What do you use for your complex event processing, and what is the EPS you're hitting with this architecture? We've started working on the complex event processing very recently; right now we're using Esper for it.

Okay, right. And why do you need to do stdin and stdout when you can just use syslog-ng to push your logs to LogStash? Right, you can do that. The advantage with LogStash is that you can strip out particular fields based on regexes. So you're basically using grok for that? Sorry? You're basically using grok to remove the filters and the fields that you don't want. No, I didn't understand the question. What I understand is that you use LogStash to remove the unwanted fields in your access log, and you use stdin and stdout to keep a copy of your logs on your application server. Yes, that's true. The advantage with LogStash I was talking about is that you can add more fields for Graylog. Graylog understands the GELF format, the Graylog Extended Log Format. So what you can do is add additional fields to it, apart from the fields which are already parsed: fields like transaction ID, order ID, product ID, things which exist in your logs but which Graylog doesn't understand natively. You can strip them out, interpret them, and send them along as separate fields.

And the data we're talking about here is all numerical data, right? The data you push from Elasticsearch to StatsD; Graphite obviously takes only numerical data. So you're pushing all your numerical data to Graphite for trending purposes, and then pulling that data back and pushing it to Elasticsearch for aggregation? The search component we're talking about effectively runs queries based on rules you define, makes calls to Elasticsearch, gets the details, and sends them to StatsD. StatsD then transmits the information to Graphite.

Last question. Hello. Yeah, you said StatsD is used for sliding-window queries and aggregates, but the same thing can be done via the CEP engine, Esper. So why would you have both Esper and also output to a format that Graphite understands? Yes, Esper can do that, but not all of our data goes through Esper. With StatsD, all our logs and details go through it, and we can define the flush interval in each case. Esper we only use for particular cases, like the use cases I mentioned, the warehouse details, the order details; not all of our data goes through Esper. But technically Esper could do the same thing StatsD is doing, because your sliding-window queries as well as your aggregates could be handled by it. That's probably true; I'm not completely aware of those details.
You're saying Esper could be used to transmit data to Graphite as well. Yeah, Esper runs standing queries over a continuous stream, and you can output the result in a metric-and-timestamp format which Graphite understands anyway, so you can use Java to just transmit it over UDP to Graphite. Right, right. Perfect, yes. We'll be hanging around if you have any questions. Yeah, thanks.