This talk will provide motivation for the extensive instrumentation of complex computer systems and make the argument that instrumentation is essential to such systems. It will also provide practical starting points in Erlang projects and maintain a perspective on the human organization around the computer system. Brian will focus on getting started with instrumentation in a systematic way, then follow up with the challenge of interpreting and acting on metrics emitted from a production system without overwhelming operators’ ability to effectively control or prioritize faults in the system. He’ll use historical examples and case studies from his work to keep the talk anchored in the practical.
Brian hopes to convince the audience of two things:
* that monitoring and instrumentation are essential components of any long-lived system, and
* that it's not so hard to get started, after all.
He’ll keep a clear-eyed view of what works and what is difficult in practice so that the audience can make a reasoned decision after the talk.
This talk would appeal to engineers with long-running production deployments, operations folks, and Erlangers in general.