Tobias Kuhn, Nakul Selvaraj: Real-Time Monitoring of Distributed Systems





Published on May 29, 2015

Instrumentation has seen explosive adoption in the cloud in recent years. With the rise of micro-services, we are now in an era where we measure even the most trivial events in our systems. At Trademob, a mobile DSP with upwards of 125k requests per second across more than 700 instances, we generate and collect millions of time-series data points. Gaining key insights from this data has proven to be a huge challenge.

Outlier and Anomaly Detection are two techniques that help us comprehend the behavior of our systems and allow us to make actionable decisions with little or no human intervention. Outlier Detection is the identification of misbehavior across multiple subsystems and/or aggregation layers on a machine level, whereas Anomaly Detection lets us identify issues by detecting deviations from normal behavior on a temporal level. The analysis of these deviations is simplified through the use of a time- and memory-efficient data structure called a t-digest. With t-digests we are able to store error distributions with high accuracy, especially for extreme quantile values.

At Trademob, we developed a Python-based real-time monitoring system to conquer those challenges, with the goal of reducing false positive alerts and increasing overall business performance. By correlating a multitude of metrics we can determine system interdependencies, preemptively detect issues, and gain key insights into causality. This session will provide insights into both the system's architecture and the algorithms used to detect unwanted behaviors.
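To make the distinction between the two techniques concrete, here is a minimal Python sketch. It is not the system described in the talk: the function names, thresholds, and sample values are hypothetical, the cross-machine check uses a median-absolute-deviation score chosen purely for illustration, and the temporal check computes an exact sample quantile from a sorted list where a t-digest would provide an approximate quantile in bounded memory.

```python
from statistics import median


def mad_outliers(values_by_host, threshold=3.5):
    """Outlier detection (hypothetical helper): compare machines against
    their peers at one point in time using a modified z-score based on
    the median absolute deviation (MAD)."""
    values = list(values_by_host.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All peers agree with the median; anything different stands out.
        return [h for h, v in values_by_host.items() if v != med]
    return [
        h for h, v in values_by_host.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


def is_anomaly(history, current, q=0.99):
    """Anomaly detection (hypothetical helper): compare one metric against
    its own past. Assumes history is non-empty. An exact sample quantile
    stands in here for the estimate a t-digest would return."""
    ordered = sorted(history)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return current > ordered[idx]


# Illustrative latencies (ms) per host: host "e" deviates from its peers.
latency_by_host = {"a": 100, "b": 102, "c": 98, "d": 101, "e": 250}
outlier_hosts = mad_outliers(latency_by_host)

# Illustrative history of one metric: 150 is far above past behavior.
past_values = list(range(100))
spike_detected = is_anomaly(past_values, 150)
```

The first function looks across machines at a single moment, while the second looks at a single series over time. In a streaming setting the sorted list would grow without bound, which is the kind of cost a compact quantile sketch such as a t-digest is meant to avoid.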

Tobias Kuhn, Nakul Selvaraj
