I'll be presenting search at Sumo scale. How many folks here know about log analytics? Can I have a quick show of hands? All right, pretty good.

So, a bit about Sumo Logic, and I'll try to stay very close to the agenda. Sumo Logic is a log analytics company. We take unstructured log data, and customers derive meaningful insights out of it. By meaningful insights, what that really means is: you send us unstructured log data -- it could be coming out of your MySQL server or out of your application -- and you can run queries on top of it. We have our own grammar; you can construct a SQL-like query on top of the data and get insights out of it. That's what log analytics is about.

What I'm going to present today: first, search in our context. Then a thousand-foot overview of our search architecture, and a couple of things we did to scale it. And then I'll also talk about some of the future initiatives we're taking to scale all of this. I'll try to stick very close to the problems we fumbled with, figured out, and learned the hard way. None of this is rocket science -- if you go through some of these distributed-systems architectures, you'll find it there -- but how it worked in our context and what we did to solve some of these problems is what's interesting, so there are some learnings to take away. And I'm Anoop; I run search engineering for Sumo Logic.

All right. Marketing made sure I put this first slide in, so I'll skip it; it contains some data points around the kind of scale we have. But just to build some sort of analogy: the peak record for tweets per second was set during the World Cup, at about 9,000 tweets per second. Does anyone know which match that was? The World Cup -- not the soccer World Cup? Yes, the soccer World Cup. Brazil versus Germany, exactly. That was when the fifth goal was scored, and Twitter maxed out at about 9,000 tweets a second. It's a tongue-in-cheek sort of comparison, and I want to step back and not make an actual comparison of what Twitter does and what we do, because Twitter probably does more with each tweet. But just to give you a sense: a customer -- like most of you -- would be generating on the order of a terabyte of log data every day, and if you were to treat every log line as a tweet, you'd be looking at an average of around 25,000 log lines per second. And that's just one customer; we run 500-plus customers on our infrastructure. So that's the kind of scale. Again, it's not an apples-to-apples comparison -- Twitter probably does more with the tweets; they insert ads. We would love to insert ads into our log lines at some point, but who knows.

So the first thing I said was that I'd talk about search in the Sumo Logic context. Everyone is familiar with web search: you type in a keyword, something comes back, and what comes back is relevant to the keyword you typed. Here is a typical example of a log line. It has a timestamp, a logging level, and some keywords that come from the application's context -- a typical application log line.
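A hypothetical line of the shape being described might look like the following. This is purely illustrative; the timestamp, level, host, and ID are made up, not taken from the talk:

```
2015-07-08 22:15:42,371 INFO  LoginService - host_id=prodsearch2 user with ID 7f3a9c logged in
```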
It says: user with ID -- some random hex number -- logged in. Now, what I want to do as a customer -- and this is a customer log being sent to Sumo Logic -- is parse this information. I'm basically looking for all the log lines where, let's say, host ID equals prodsearch2. Then I say: parse "user with ID * logged in" as user ID. So I'm giving some schema to my unstructured logs, and I generate the user ID. Then I do a timeslice of one hour -- basically bucket all the logs into one-hour time ranges -- and find the distinct number of user IDs logging into the system. [A sketch of this full query appears at the end of this overview.] That's a typical search use case. You can do things like percentiles, min, max, average; you can join log lines; you can do things like sessionize, which means finding similar log lines and grouping them together. Those are some of the things you can do with our query language -- it's a custom query language.

The next step is the thousand-foot overview of our search architecture. Part of that overview is the ingestion pipeline. We are a cloud-native log analytics company, so all of it is hosted in AWS. Customers deploy local collectors -- an agent running on your boxes, on your servers -- and the data goes to our ingestion pipeline, where we create inverted indexes out of it. For people who don't know what an inverted index is: you have log lines, say "user with ID x logged in" and "user with ID y logged in". We extract terms out of those log lines, and then there is a term-to-document mapping. That's why it's called inverted: you find the terms across all the log lines and map each term to the documents it appears in. That's what you generate, and you put it in storage.

Then, when a user comes in with a query via the user interface or via the API, it hits a query processor. That's a cluster of servers whose only responsibility is to create an abstract syntax tree of that query language. For the query I showed, it builds a pipeline out of it, generates a plan, and plumbs the plan through. At the bottom of that query is a search: to plumb this through, I need the log lines matching my search. Host ID equals prodsearch2 was my keyword, so I take that keyword and run a distributed search. That happens on the index-serving tier: the indexes are generated in the ingestion pipeline, and index serving happens here.

What we also have is what we call near real time. The problem with this whole machinery is that it takes a while before the data hits it, gets flushed to index serving, and becomes searchable. So we have a near-real-time system that serves the last five to ten minutes of data. It feeds directly off the ingestion pipeline and doesn't go through the index-generation route. For the last five minutes or so you ask the near-real-time system for your logs, and you hit index serving for the rest.
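Pulling the example from the start of this walkthrough together, the query being described might look roughly like this in Sumo-Logic-style query syntax. This is a reconstruction from the talk, not a verbatim query; the exact field and operator spellings (host_id, timeslice, count_distinct, _timeslice) are assumptions:

```
host_id=prodsearch2
| parse "user with ID * logged in" as user_id
| timeslice 1h
| count_distinct(user_id) by _timeslice
```

Each pipe stage corresponds to one operator in the pipeline that the query processor plans and executes.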
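And for the inverted-index step of the ingestion pipeline, here is a minimal sketch in Scala of the idea described above -- mapping each term to the set of documents (log lines) that contain it. It is illustrative only, not Sumo Logic's actual index format:

```scala
object InvertedIndexSketch {
  // Build a term -> set-of-document-IDs mapping from raw log lines.
  def build(docs: Map[Int, String]): Map[String, Set[Int]] =
    docs.toSeq
      .flatMap { case (docId, line) =>
        line.toLowerCase.split("\\W+").filter(_.nonEmpty).map(term => term -> docId)
      }
      .groupBy { case (term, _) => term }
      .map { case (term, pairs) => term -> pairs.map(_._2).toSet }

  def main(args: Array[String]): Unit = {
    val docs = Map(
      1 -> "user with ID 7f3a logged in",
      2 -> "user with ID 09bc logged in",
      3 -> "connection refused on host prodsearch2"
    )
    val index = build(docs)
    println(index("logged"))      // Set(1, 2): only the login lines contain "logged"
    println(index("prodsearch2")) // Set(3)
  }
}
```

A keyword search is then a lookup in this map, fanned out across the index-serving nodes in the distributed case.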
Next I'll talk about challenges. The first challenge is that a typical production cluster for us has 1,000-plus index-serving nodes. You don't want all your customers sitting together on it, for obvious reasons: there will be a lot of noise, and it is not monitorable -- you don't know whether one customer putting excessive load on the system is causing the entire index-serving tier to go down, and so on. So the first thing you do is some sort of isolation, and the first level of isolation is customer isolation. You group similar customers together and pod them: with 1,000 nodes and, let's say, 100 customers, you break the fleet into pods of, say, 10 servers each and put a group of customers on each pod. That's the first thing you do. So what you have is multiple customers sitting on a single pod, with some number of machines in that pod.

That still doesn't solve the complete isolation problem. A typical query has two parts: the query language itself -- the search plus whatever pipes the user has built -- and a time range: query the logs from time t1 to t2. The cost of a query depends on the time range you're querying, on the amount of data the customer has (the more data you have to scan, the more load on the system), and on the kind of operators you're using -- there are expensive operators and cheap operators. So the other problem we had to solve was: even with some degree of isolation, I can still have a case where customer C1 overwhelms customer C2 -- it puts on so much load that my entire pod is down. How do you solve that in a multi-tenant system where you want some sort of isolation? What we did was implement a fair-share mechanism, which is nothing but a time-sharing system. It's as simple as that: you ensure that the time each customer gets can be measured, so you can enforce the sharing. In this case, even though customer C1 is putting in the load, only customer C1 gets impacted, and customer C2 is not impacted. That is query isolation.

Then, workload prioritization. One of the basic principles when you build a system that does a lot of heavy lifting is that you want to give more cycles to your foreground workload and fewer cycles to your background workload. In our case there is a foreground workload and a background workload; we know which is which, and we simply give more CPU cycles, more compute, to the foreground workload. The other thing is that customers usually look at recent data more often than not. So the other way to prioritize is along the time dimension: give more compute to queries over newer data and less compute to queries over older data. That gives more bang for the buck for the customer, because in a SaaS world the customer isn't really worried about how to tune queries or how to make the system run as fast as possible. So you do some of this analysis to figure out what your foreground workload is and what prioritization model you want to go with, and then come up with a model.
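A minimal sketch of these two ideas together -- time-sharing across customers plus weighting foreground work above background work -- might look like this in Scala. The names, the structure, and the 4x background charge are assumptions for illustration, not Sumo Logic's actual scheduler:

```scala
import scala.collection.mutable

// A search task for one customer; `foreground` marks interactive queries,
// as opposed to background/scheduled work.
final case class SearchTask(customer: String, foreground: Boolean, run: () => Unit)

class FairShareScheduler {
  // Weighted execution time each customer has consumed in the current window.
  private val consumedMillis = mutable.Map.empty[String, Long].withDefaultValue(0L)
  private val queues = mutable.Map.empty[String, mutable.Queue[SearchTask]]

  def submit(task: SearchTask): Unit =
    queues.getOrElseUpdate(task.customer, mutable.Queue.empty).enqueue(task)

  // Run one task: always pick the customer that has consumed the least time,
  // so a heavy tenant only slows down its own queries (time-sharing).
  def runNext(): Unit = {
    val candidates = queues.filter { case (_, q) => q.nonEmpty }
    if (candidates.nonEmpty) {
      val (customer, queue) = candidates.minBy { case (c, _) => consumedMillis(c) }
      val task = queue.dequeue()
      val start = System.nanoTime()
      task.run()
      val elapsedMillis = (System.nanoTime() - start) / 1000000
      // Background work is charged at a higher rate, so foreground queries
      // effectively receive more of the available cycles.
      val weight = if (task.foreground) 1 else 4
      consumedMillis(customer) += elapsedMillis * weight
    }
  }
}
```

The recency dimension mentioned above could be folded in the same way, for example by charging queries over older time ranges at a higher weight so that queries over newer data effectively get more compute.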
The one final problem I want to talk about is stragglers. We run thousands of servers on AWS, and one thing AWS doesn't guarantee is the reliability of an individual instance. Your machine can go down, there can be steal time, there could be a network partition for all you know. There are two problems to solve here. The way search works is that query processing runs on one tier, and the keyword search is distributed across all the index-serving nodes. While a query is in flight, a node can go bad for various reasons. You want to handle this so that the workload isn't hit -- and there are two sorts of workload: the workload that will be generated in the future, and the current workload in flight. You don't want performance to suffer because of that one bad node.

So we implemented a fairly standard thing, which we call straggler handling. The first thing you have to do is detect the straggler. Based on the workload, and looking at the response times of all the servers, you can fit a Gaussian-based model and infer the health of a server from the response times you're seeing on it. Once you can do that, you can figure out that a server is in a bad state. And once you know a server is in a bad state, you not only send the request to that server, you also race the same request against a standby cluster. Whichever response comes back first, you use it, and you cancel the request on the other server. The reason I said we use a bell curve is that you also want to allow a cool-off period, because the server might come back -- it might only be off because of the extra load it's taking at that point. Those are some of the techniques we use to detect stragglers and then remediate them. [A rough sketch of this appears at the end of the talk, before the Q&A.]

Here is some of the future stuff we're doing. We already have in beta a near-real-time system where we want to drop the latency of our searches to under 10 seconds; the current latency is about two minutes from the time the logs hit our servers to the time a log is searchable. The other thing we're working on: between index serving and the query processor it's an HTTP-based request/response model today. We want to use reactive streams between query processing and index serving and build a data pipeline on top; that would improve the utilization of the index-serving machines. And a constant challenge for us is how to defragment indexes and create better indexes so that searches are faster.

A tidbit about our stack: it's completely Scala. We use Lucene, with our own optimizations, and we have our own MapReduce stack. The search infrastructure runs on 4,000-plus AWS instances. That's it -- that's all I have.
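Here is the rough sketch referenced above of the straggler handling described in the talk: flag a node whose response times fall well outside the fleet's distribution, then race a flagged node's request against a standby and take whichever answer returns first. All names and thresholds here are illustrative assumptions, not Sumo Logic's actual code:

```scala
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// Response-time samples observed for one node (or aggregated for the fleet).
final case class NodeStats(samplesMillis: Vector[Long]) {
  def mean: Double =
    if (samplesMillis.isEmpty) 0.0 else samplesMillis.sum.toDouble / samplesMillis.size

  def stdDev: Double =
    if (samplesMillis.size < 2) 0.0
    else {
      val m = mean
      math.sqrt(samplesMillis.map(s => (s - m) * (s - m)).sum / (samplesMillis.size - 1))
    }
}

object StragglerSketch {
  // Detection: flag a node whose mean latency sits more than k standard
  // deviations above the fleet-wide mean (the "bell curve" idea from the talk).
  def isStraggler(node: NodeStats, fleet: NodeStats, k: Double = 3.0): Boolean =
    node.mean > fleet.mean + k * fleet.stdDev

  // Mitigation: race the suspect node against a standby and use whichever
  // response arrives first.
  def hedgedSearch[T](primary: () => Future[T], standby: () => Future[T]): Future[T] =
    Future.firstCompletedOf(Seq(primary(), standby()))
}
```

In practice the losing request would also be cancelled, and a cool-off period would let a flagged node rejoin once its response times recover; both are elided in this sketch.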
Questions and answers. Thank you. [In response to an audience question:] We parse multi-line logs, yes.

You spoke about giving the foreground more of the CPU cycles and the background less, and about time-sharing. Did you think about auto-scaling when the load comes in -- scaling out more nodes? The challenge for us -- the challenge for any search infrastructure -- is that it is a data-centric application. The data is extremely heavy, and there is a cost to moving the data around, so you always want to keep compute next to the data. We do have auto-scaling machinery, but it takes a while for that auto-scaling to come online, precisely because it is a data-heavy application.

So your data is residing on S3, I assume? Yes, the data resides on S3, but there is a local cache, and it is extremely expensive to miss the cache on the index-serving nodes. Exactly. And you also spoke about stragglers. Have you thought about speculative execution? It's already implemented -- Hadoop and the like provide speculative execution. One thing to note is that we have our own MapReduce stack; we don't use Hadoop. So a lot of these interesting things that Hadoop comes with, we have to build on our own. Any particular reason for choosing your own MapReduce stack? Yes: ours is a real-time use case, not a batch-processing use case, and Hadoop is not as responsive for a real-time use case.

Just out of curiosity, did you try Spark? Did we what? Try Spark. Spark and Storm are mostly for streaming use cases. We did try Spark at some point, but it didn't pan out. Ours is a very minimalistic MapReduce stack, implemented the way we want, and it leverages a lot of the nuances we want to handle, so it fits our model much better.

What data storage do you use? On the index-serving side it is all disk-based storage, and the source of truth is S3.

I just wanted to know why you went with a multi-tenant system. You could also have pushed the isolation down to the infrastructure -- something like a container for each and every customer -- but you chose multi-tenancy. Is there a specific reason? Part of it is that we designed this back in 2010, and the other thing is there hasn't been any fundamental -- well, I see what you're trying to say. We are evaluating containers at this point, where we could do more of this auto-scaling and other interesting things with containers. So we are definitely evaluating it, but our architecture, at least on the search side, is slightly older; a lot of these interesting things like Docker didn't exist at that point. OK, thanks. Sure.

That will be the last question. I presume you're using Lucene -- your customized Lucene? Yes. Any particular reason for not using Solr, SolrCloud, or Elasticsearch? Because we control, very tightly, how search runs. As I said, we have 500-plus customers, and running on Solr doesn't scale for us, because we do a lot of optimizations on top. What about Elasticsearch? Because we run our own MapReduce stack on top of index serving, and they don't provide us any of those capabilities; the customizability isn't there. ANTLR? What's that? ANTLR. No, we have our own parser; it is not an ANTLR-based grammar.

Thank you, Anoop. Thank you.