From Las Vegas, it's theCUBE, covering AWS re:Invent 2018. Brought to you by Amazon Web Services, Intel, and their ecosystem partners.

Oh, welcome back here at the Sands as we conclude our coverage here of day one of AWS re:Invent. We've been live on theCUBE. We'll be back with you again on Wednesday and Thursday. I'm glad you're here with us on Tuesday for our coverage, along with Justin Warren. I'm John Walls, and we're joined now by two executives from SignalFx: Karthik Rao, who's the CEO, and Arijit Mukherji, who is the CTO at SignalFx. Gentlemen, thank you for being with us.

Yeah, it's a pleasure to be on.

All right, so just tell us a little bit about what you do and why you're here, and then we'll dive in from there, if you would.

Sure. SignalFx is a cloud monitoring service designed for operators of applications and infrastructure that might be running in the cloud. Our origins came out of Facebook: Arijit and much of our technical team were responsible for building the monitoring systems at Facebook back in the mid-2000s, when they had their famous move-fast-and-break-things culture, which today everyone calls DevOps. So what we've really focused on is building a far more analytics-centric monitoring approach that focuses a lot on identifying the patterns that are really meaningful, and we believe that's a far more important problem to solve in today's distributed environments.

And you made some news not too long ago. You've unleashed a new product into the marketplace. Arijit, if you would.

Yes, yes, we are very excited to launch what we call the SignalFx Microservices APM product. It's really aimed at giving customers visibility into the transaction flow that's happening in their microservices environment. As you know, as we move to microservices, the individual pieces are becoming smaller and they're growing in number. And so the complexity of those interactions is becoming harder and harder to manage.
And this product is aimed basically at helping our customers make sense of those interactions and monitor them effectively.

Okay, a theme that's come up a couple of times today on theCUBE is that the complexity of the modern way of doing things in cloud-native services, microservices, is beyond human comprehension. You need to have the assistance of tools like APM. And I think we were talking just before we went live that this is a distributed-tracing type of approach to microservices. Is that correct?

That is correct. So the goal is to have a lightweight approach where you can very easily generate spans and traces from all your microservices across the whole environment. And the value that we provide is to take them and baseline them to give you a sense of how performance is happening overall in the environment. But more importantly, to your point earlier, it is how we can use data science to help guide the customer toward the problem, when it is happening and where it is happening, so that you can reduce the MTTR, which is the key part of all of this. So that's been much of the focus of the product.

Yes. So for customers who are looking to re-platform on microservices or some of these newer ways of doing things, what is it about SignalFx that helps them to understand how to change an application from one way of doing things, a monolithic type of application, into something more microservices-driven? How does SignalFx actually help them with that journey?

Well, our customers who are early in the cloud journey are doing a number of things. One, they're able to get complete visibility into the old, right? So you typically want to look at the two side by side. So you're able to leverage our Smart Agent, collect information about your monolithic stack, and get full visibility into what the performance looks like in that particular environment.
But then what we do better than anyone else is give you comprehensive visibility into the new stack and give you the analytics that will allow you to really compare one versus the other. So one of the things that's very different about SignalFx is we have a very rich analytics capability in the backend. So, collecting metric data across your environment, whether it's your old stack or your new stack, we're able to provide very sophisticated analytics to identify meaningful patterns, outliers, and anomalies, and to look across all of your metadata to identify whether those patterns are specific to a subset of machines or a particular version of code. And that's typically very helpful to customers as they're moving from the old to the new.

Yeah, can you give me an example then? I mean, in terms of the specificity that you've provided, you talk about sophisticated measurements or stats, just something that would tell us, oh, I see, that was kind of an aha moment maybe for one of your customers.

Yeah, so the thing that is unique about us is that because we have a strong metrics product backing this, and a strong analytics capability backing this, when we do distributed tracing, we are tracing and providing you insight not only into your application and what it is doing; we are also able to correlate that with your infrastructure. So let's say your application is running in a container: if there is a problem, we can let you correlate that application and its performance to the container, to the host, to the infrastructure, top down as well as left to right, so to speak. And that has been key, because what we find is that having that capability really helps short-circuit the resolution time, because a lot of times the problem may be vertical, and other times it may be broad, horizontal, right? So our goal is to catch both of them.
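
As a rough illustration of the top-down correlation described above, the sketch below looks up infrastructure readings for the container and host that a slow span ran on, restricted to the span's time window. It is not SignalFx's implementation; the data shapes, the function name `infra_points_for_span`, and the tag names `container_id` and `host` are all assumptions made for this example.

```python
# Hypothetical data shapes (assumptions for illustration only):
# - a span is a dict with start/end timestamps and a dict of tags
# - infra_metrics maps (dimension, value) to a list of (timestamp, reading) points


def infra_points_for_span(span, infra_metrics, dimensions=("container_id", "host")):
    """Collect infrastructure readings recorded during the span's time window."""
    matches = {}
    for dim in dimensions:
        value = span["tags"].get(dim)
        if value is None:
            continue  # the span carries no tag for this dimension
        points = [
            (ts, reading)
            for ts, reading in infra_metrics.get((dim, value), [])
            if span["start"] <= ts <= span["end"]
        ]
        if points:
            matches[(dim, value)] = points
    return matches


span = {
    "service": "checkout",
    "start": 100.0,
    "end": 104.0,
    "tags": {"container_id": "c-42", "host": "host-7"},
}
infra_metrics = {
    ("container_id", "c-42"): [(99.0, 0.35), (101.0, 0.97), (103.0, 0.99)],
    ("host", "host-7"): [(102.0, 0.40), (110.0, 0.38)],
    ("host", "host-8"): [(102.0, 0.95)],
}

print(infra_points_for_span(span, infra_metrics))
```

In this made-up data the container's readings spike during the span while the host's do not, which is the kind of "vertical" signal the speaker describes.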
Okay, so you're able to identify the root cause of an issue much, much quicker, so teams can go and find that a server's failing, I need to go and replace a piece of hardware, or there's a storage issue, and you can just dial straight in really quickly, is that...

Yeah, so just to, you know, in modern environments, you're far more likely to see performance issues in a small subset of your transactions than you are to see just a massive outage, right? A lot of modern distributed systems are designed to be resilient to individual node failures. For example, a feature that we just launched along with our Microservices APM is something called Outlier Analyzer. So let's say all your metrics indicate the service is performing fine, but you have 0.5% of your users complaining that the performance is terrible. That's where tracing really helps, because now you can look at every transaction and understand exactly where things might be slow. But it's typically a trial-and-error process. You have to go through every single trace, and you have to figure out: is it a particular version of code, a particular server? Our Outlier Analyzer feature will automatically look through all of the outliers, identify the over-represented dimensions, and guide you to those specific problematic areas, right? So you run our Outlier Analyzer, and it'll tell you, you know, this particular machine is over-represented in your long-tail traces, or this particular version of code is over-represented. So it short-circuits the entire troubleshooting process by orders of magnitude.

Yeah, that kind of intermittent error is always really, really hard to find. Something which just explodes and catches on fire, that's easy to find.

And it's extremely difficult for a human to find it by trial and error across a distributed system that could involve thousands of components, right? So you really, really have to leverage analytics, and that's really what SignalFx is incredibly strong at doing.
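
The idea of "over-represented dimensions" can be sketched in a few lines. The example below is only an illustration of the general technique, not the actual Outlier Analyzer: it compares how often each tag value appears among slow traces with how often it appears overall, and reports values whose ratio (called `lift` here) exceeds a threshold. The trace format, the function name, and the thresholds are assumptions.

```python
from collections import Counter


def overrepresented_dimensions(traces, latency_threshold_ms, min_lift=2.0):
    """Return (dimension, value, lift) for tag values that are over-represented
    among slow traces, sorted by lift in descending order."""
    outliers = [t for t in traces if t["duration_ms"] > latency_threshold_ms]
    if not outliers:
        return []

    all_counts = Counter(item for t in traces for item in t["tags"].items())
    outlier_counts = Counter(item for t in outliers for item in t["tags"].items())

    results = []
    for (dim, value), n_out in outlier_counts.items():
        share_in_outliers = n_out / len(outliers)
        share_overall = all_counts[(dim, value)] / len(traces)
        lift = share_in_outliers / share_overall
        if lift >= min_lift:
            results.append((dim, value, round(lift, 2)))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Synthetic data: 1000 traces over 4 hosts; 5 of them (0.5%) are slow,
# and all of the slow ones happen to land on host-3.
traces = [
    {
        "duration_ms": 900 if i % 200 == 3 else 50,
        "tags": {
            "host": f"host-{i % 4}",
            "version": "v2" if i % 10 == 0 else "v1",
        },
    }
    for i in range(1000)
]

print(overrepresented_dimensions(traces, latency_threshold_ms=500))
# → [('host', 'host-3', 4.0)]
```

Here `host-3` serves a quarter of all traffic but every slow trace, so its lift is 4.0, while version `v1` is common everywhere and is not flagged.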
Yeah, so we're basically replacing luck with tools. Trial and error and luck with more prescriptive troubleshooting.

Yeah, so for customers who've gone through this journey and they've actually re-platformed an application, they've converted it into microservices and they're doing cloud-native things. And you've helped some of these customers. What's an example of a customer who's living in that new world? What's the view like from where they're sitting, where they have all these lovely tools and they're not relying on luck anymore? What's their sort of daily life like?

Well, I think the biggest difference is they are now able to automate a lot of remediation. If you can be more intelligent in the signals that you're capturing and apply more intelligent analytics, then, especially in today's environments, you can automate a lot of remediation; today's frameworks are highly automatable. And so, one example of this: we have one of our larger Fortune 500 accounts. They do a number of product launches where they get massive amounts of load during the launch. And this is not atypical in today's environments. Prior to having the real-time data collection and analysis with SignalFx, they would have two rooms full of people supporting every single launch, very reactively, and when something went wrong, they would have to go and figure out what was happening. With SignalFx they are now able to build very sophisticated analyses on the data as they're spinning up containers and instances to support a shoe launch. And they've now actually automated a lot of their remediation, whether it's auto-scaling or rolling back canary releases and such. And they've gone from having two rooms full of people to having just one on-call engineer every time they do a launch. And it's also enabled them to be a lot more aggressive in doing these launches, because they just have a lot more confidence in their ability to execute them.

Got it. That's one example.
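
A minimal sketch of what such automated remediation might look like, assuming a monitoring system that periodically hands over a snapshot of a few metrics. Everything here is hypothetical: the `LaunchSnapshot` fields, the thresholds, and the action names are invented for illustration and do not describe the customer's system or any SignalFx API.

```python
from dataclasses import dataclass


@dataclass
class LaunchSnapshot:
    """Hypothetical metrics observed over one evaluation window."""
    canary_error_rate: float    # fraction of failed requests on the canary
    baseline_error_rate: float  # fraction of failed requests on the stable release
    cpu_utilization: float      # fleet-wide average, 0.0 to 1.0


def choose_remediation(snapshot, max_error_ratio=2.0, cpu_high=0.80, error_floor=0.001):
    """Pick one remediation action from the observed metrics."""
    # Guard against dividing by zero when the baseline has no errors at all.
    baseline = max(snapshot.baseline_error_rate, error_floor)
    if snapshot.canary_error_rate / baseline > max_error_ratio:
        return "rollback_canary"
    if snapshot.cpu_utilization > cpu_high:
        return "scale_out"
    return "no_action"


print(choose_remediation(LaunchSnapshot(0.05, 0.01, 0.50)))  # rollback_canary
print(choose_remediation(LaunchSnapshot(0.01, 0.01, 0.90)))  # scale_out
print(choose_remediation(LaunchSnapshot(0.00, 0.00, 0.30)))  # no_action
```

In practice the returned action would be passed to whatever deployment or auto-scaling tooling is in use; that wiring is deliberately left out of the sketch.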
Justin was talking about some of the trends we've heard a lot about today. The one, I guess, or one of the constants, has been the pace, the rapidity of innovation, the rapidity of change. And so in your world, what do you think is next? It's a big thing, but what mountain are you trying to climb now that you haven't already conquered?

So in my view, there are some very, very encouraging trends coming our way. Actually, there was a talk that I presented earlier today about the concept of service meshes and how I feel that they're going to be the next big thing, because I think they attack a lot of the core operational challenges that we face in our microservices environments: how well you can instrument your environment; how well the different types of instrumentation, your metrics, your APM, your logs, relate to one another and how tightly coupled they are; and how quickly you can make configuration changes within the environment in a more foolproof manner that's more automated and more consistent. And so I feel like technology like that is going to transform how we do software a few years from now. I've seen that advancing very, very quickly. And something that's very related to that, and something I alluded to in the talk earlier too, is this concept of feedback-driven automation, where I am no longer just configuring my infrastructure to behave the way I want it to. In fact, I'm also observing it as it is running, using high-quality monitoring tools like SignalFx, and then using that to create new feedback, because if things diverge from my intent, I should be able to get them back to where I want them to be. And all of this must happen without human interaction, because we work on the order of minutes while automation can do this in seconds. This is absolutely fascinating. I think this is one of those big trends that are coming down the pipe.

Karthik, anything to add to that?
No, I think Arijit nailed it.

Excellent, all right. Gentlemen, thanks for being with us.

Thank you very much.

Good luck with the rest of the show. I'm sure it's been very good for you so far, and for the next two days, have a great time.

Okay, thank you very much.

Excellent, thanks for being with us. We are concluding our coverage of day one here at AWS re:Invent. For Justin Warren, I'm John Walls. We thank you for watching theCUBE.