Hey, folks. It's not quite good morning anymore, so I guess good afternoon. I'm still stuck on San Francisco time. I've had a lot of coffee, though, and some sleep, so I'm a little jittery and I have to pee a lot, but I'm mostly awake.

So who am I? I'm a back-end engineer at Spotify. I am based in San Francisco, as I said. I am also vice chair of the PSF, along with Naomi Ceder. Side note, if you are a PSF member, quick announcement: there is a PSF members meeting Thursday at 6 p.m. I forget the room, but it's on the online schedule. And if you're interested in becoming a PSF member, you should come talk to me. Last bit: I'm also the founder of PyLadies of San Francisco and one of the main lead organizers of the global organization. And thank you.

Since I have the stage, who here has actually heard of PyLadies? Maybe it's better to ask: who has not heard of PyLadies? Okay, those hands will be down at the end of this paragraph. PyLadies is a mentorship group for women in the Python community. It's open to women and friends, so it's open to everyone. Essentially, we're a loose group of meetups in various locations, one on every continent except Antarctica; my new mission is to go to Antarctica and start a PyLadies there. What we do is host Python workshops, speaker events, hack nights, coffee meetups, everything around Python, learning Python, and development in general. And we welcome all experience levels. I think there's one here in Spain, based in Barcelona, that I highly encourage you to check out. I also have another talk on Thursday afternoon at 2:30, I forget which room, I think the Education Summit room. That talk will be about PyLadies in more depth: what we're doing, how the work has actually been going, the effects we've been seeing, and the work we still have to do. So, who here has not heard of PyLadies? Come on, I see two hands. You need more coffee, like me? All right, I gave my spiel.

So, this talk. I will first give a quick introduction to Spotify and an overview of how we use data. I'll go into how we use metrics, how we came to implement them the Agile way, and essentially what was learned along the way when my team implemented them, the bigger picture. And you can sit back; I'll give a link at the end of this presentation to the blog post and the slides, so you can just watch me rather than your computer or tablet or whatever.

What I basically want you to take away is this: metrics and tracking are super fun, but should you track all of it? Everything that moves? We as developers track everything: website visitors, referrals, how folks use our services, whether our servers are even up. We have a lot of tools at our disposal, like New Relic, Graphite, Google Analytics, Sentry, PagerDuty, and a bunch of other things. We even track ourselves: steps, sleeping patterns, exercise, calories consumed, breathing, hair growth rate, I don't know. Everything we can track, we track, right? Maybe in hopes of getting some insights, or just to feel better about ourselves. But should you measure everything? It's very easy to get lost in that forest, right? It's easy to lose the meaning, and easy to lose the understanding of why you're measuring something in the first place.

To start, some background information about Spotify, so we're all on the same page, and also how we use data. So: Spotify, streaming music service. We've updated our logo.
This is not the correct green, but I haven't gotten a new shirt yet. We beta launched in 2007, and I think we came to Spain and some other parts of Europe in 2008. In 2011, we came to the US. We're in about 58 countries. We have over 20 million paid subscribers and 75 million monthly active users. We have over 30 million unique songs, not including compilation albums and such, and we add about 20,000 songs a day. We also pay about 70 to 80% of our income to rights holders, totaling about $3 billion to date. While I work in a very small office in San Francisco with about five other developers, our main engineering offices are in Stockholm and New York, with a lot of data and machine learning work in Boston.

So as you can imagine, data is quite important to Spotify. The numbers you see here are only about a month old, and I have to check them every time I do a presentation like this because they are always growing, and growing fast. We track user data like signups and logins, activity within the application itself, even tweets: the good and the bad and the ugly. We also track server-generated data, including requests to various services, response times, and response codes, among a million other things. Each squad owns what they want to collect, how and when, and how they will consume that data. We have analysts that run thousands of Hadoop jobs a day to glean insight from user activity, answering questions like: how many paid subscribers do we have at this moment in time? Or: was this partnership financially beneficial to us? Engineers behind the platform watch usage rates of things like our web APIs, login failures, feature usage, et cetera. We also have data scientists and machine learning folks analyzing listening behavior, music metadata, and the trends that power the recommendations behind our features. Teams have actually started to analyze the actual audio signal, the sound of a song, to pick up genres, beats per minute, and instruments played. It's actually quite difficult to pick up a few things from the audio signal itself, like mood: how do you classify and define the mood of a song? But that's the sort of stuff we're doing at Spotify. And this only scratches the surface of what we collect and what we pay attention to.

We use various technologies related to data, including Hadoop as well as Cassandra, Postgres, and Elasticsearch. All of our user data sits in Hadoop, which we run jobs against using our own Python library, or Crunch, Scalding, or Hive. We also use Spark, Tez, and Flink. I've heard a lot of people actually use IPython with scikit-learn and pandas, and I've also discovered recently that we have our own IPython notebook server set up, so that's pretty cool. On the back-end side, some of our service activity actually ends up in Elasticsearch, where we have Kibana set up. But the majority of service activity goes into a homegrown system, of course. We have open sourced it; it's called Fast Forward, or FFWD, and it's written in Ruby. Sorry.
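To give a rough idea of what feeding an agent like FFWD looks like: it runs locally next to your service and accepts metrics as JSON datagrams over UDP. Here's a minimal sketch; the port number and payload fields are assumptions based on FFWD's documented defaults, so check its docs before relying on them.

```python
import json
import socket

# Minimal sketch of reporting a metric to a locally running FFWD agent.
# Assumptions: the agent listens for JSON datagrams on UDP port 19000 and
# accepts a key/value payload with optional attributes -- verify against
# the FFWD documentation before using.

FFWD_HOST = "127.0.0.1"
FFWD_PORT = 19000  # assumed default JSON-over-UDP port


def emit_metric(key, value, **attributes):
    """Fire-and-forget a single metric datagram at the local agent."""
    payload = {
        "type": "metric",
        "key": key,
        "value": value,
        "attributes": attributes,
    }
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(json.dumps(payload).encode("utf-8"), (FFWD_HOST, FFWD_PORT))
    finally:
        sock.close()


# e.g. count a successful Facebook login:
emit_metric("login.attempts", 1, provider="facebook", outcome="success")
```

The nice property of this pattern is that the service never blocks on the metrics pipeline: it's a single UDP send, and the agent handles batching and forwarding.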
Yet with all this setup, with all this technology, I'm really embarrassed to say that my team did a lot of development in the dark. We were not tracking anything. We did not know how successful our integrations were. We had no clue how the back-end services we maintained were holding up. I do want to note that a lot of squads at Spotify do track a lot of data and pay attention to all of this; we were just sort of the black sheep. I think that's partly because we were nine hours behind Stockholm and three hours behind New York.

So this story is about self-discovery: how we became a better, more effective team, and how we did it by capitalizing on understanding our own data. Not everyone can be a data scientist, mathematician, statistician, analyst, whatever. But everyone can grasp why it's important when 70% of our users can't log in for whatever reason. So this is the story of how our team finally developed and adopted the use of logging and metrics.

You might know that Spotify is very public about using Agile. We actually have a few videos on YouTube that are very entertaining and awesome, nicely done, that I highly encourage you to check out. One key aspect of Agile is iteration, right? And we certainly iterate over our product. You might be as annoyed as I am when you open the Spotify client on your desktop and it has a blue banner asking you to update, pretty much every single day, right? That's our Agile approach. But we also iterate over ourselves, as individuals and as a squad, trying to find what works for us as a company, as a squad, and everything in between.

Late last year, my squad began participating in an internal program, very corporate-speak, called Effective Squad Circle. Its purpose was to home in on the squad itself and how we could be more effective. I actually found it to be very beneficial, for myself and for the team. It was essentially monthly challenges to figure out the team's current condition, which was essentially not tracking anything and not knowing what's going on, and to compare it to the desired condition in terms of delivering the product, feature, service, whatever we were meant to deliver. The following explanation might sound very project manager-y and business-oriented, but I found it very useful when implementing metrics. I also find myself incorporating this thought process when talking to other teams, non-tech and tech, about things like diversity, like measuring our diversity initiatives. So it's very widely applicable.

The main goal was to find our target condition as a squad: where do we want to be? It's certainly difficult to establish a goal without context, without an understanding of where we are now. So to figure out our baseline, we sat down and answered a few questions as a group.

The first question: what do we deliver? A seemingly easy question, right? Yet I and the squad initially struggled to answer it right away; it definitely didn't roll off our tongues. So we looked at our past and listed out the integration projects we had delivered and the services we currently maintain. These include Uber, Last.fm, SoundHound, and Twitter #music, among others. The most critical is certainly our Facebook login and sign-up registration. As I hinted before, about 70% of our user base logs in via Facebook. The rest is email login, which my team does not own.

The next question: for whom do we produce said product or service, and who actually defines our work? At Spotify, we believe that leadership is meant to convey a vision, and the squad is meant to implement that vision in a manner it chooses. There isn't micromanagement; there's a lot of trust, actually. But our lead team defines the direction our squad takes, so they're certainly one of our customers. Also, with the many integrations we've done, there are a lot of external partners.
Thankfully, the squad is a bit shielded from direct communication, but that makes our business development team, and indirectly the partners themselves, our customers as well. But who depends on us? Who actually uses our work, product, service, whatever? As I said, 70% of users log in via Facebook, so it's safe to say it's a pretty integral system to the Spotify platform. We certainly can't fuck it up when Facebook makes breaking changes to their login protocol or API, which they often do unannounced; I've had to live-patch our servers because of Facebook. But there are also other teams within the company that plug into our system for social aspects, like sharing to Facebook from within the platform.

Moving on, the next question is about expectations: what do our customers actually expect from us? When trying to answer this question, it occurred to us that we had never actually asked them what their expectations were. And so we did. We wanted to know exactly what was important to them in what we deliver. Was it on-time delivery? Was it predictability versus productivity? Did they expect solutions to problems they didn't even know existed? What were their expectations on quality and usability? Other measures? Were there expectations about how we work as a squad? Did they want updates on progress and problems? Now, we couldn't ask all our customers, right? We have 75 million of them, and expectations differ among the team's various customers. Internal teams expected our Facebook service to be reliable and scalable. Business development wanted us to be clear on what we can feasibly implement, and it's definitely hard for a developer to adequately say how long something will take. And it's safe to assume that users want to log in or sign up via Facebook if they so choose, and for it to just work.

Moving on, the last question was simply: did we meet those expectations? How do we know we've met them? This is where we sort of stopped in our tracks. No, we didn't know if our services could handle the extra load. Or if (slash when) users couldn't log in. Or how many users had activated Spotify with Uber, and of those, whether the experience actually worked. So, being people with an affinity for tech and automation, we naturally implemented a technical solution.

So, feedback loops: a very generic term, not just a basic metric. A feedback loop is there to understand, I guess, the feedback given. For our squad, one of the main feedback loops we chose was metrics. We all wanted those snazzy-looking dashboards, eye-candy graphs and visuals using the latest technology that will probably be obsolete tomorrow. But in all seriousness, we wanted an immediate visual of various metrics. But what did we want to see? What questions did we want to answer? In line with the idiom "throw spaghetti at the wall to see what sticks," the squad brainstormed for a while, trying to come up with questions we would like to answer. Some ideas included: sign-up or auth-flow abandonment; Facebook-connected users, as a percentage of total users and the trend over time; percent of users that signed up through Facebook per hour, day, or week (we didn't even know what the frequency should be); Facebook-related errors, of which there are a lot; daily active users by partner feature; registration and subscription rates; referrals by partner; web API usage by partner. We even wanted a squad-focused Twitter search for Uber and Spotify, to see what people were complaining about that neither Uber nor our team could see in our logs. And we wanted to know outstanding JIRA issues and request counts by internal requesting service or team.
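To make one of these brainstormed metrics concrete: counting outstanding JIRA issues is just a JQL search against JIRA's REST search API. Here's a minimal sketch; the instance URL, project key, and credentials are hypothetical placeholders, not anything from the talk.

```python
import requests

# Minimal sketch: count open bugs for the squad via JIRA's REST search API.
# The instance URL, project key "SOCIAL", and credentials are hypothetical.
JIRA_SEARCH_URL = "https://jira.example.com/rest/api/2/search"
JQL = "project = SOCIAL AND issuetype = Bug AND resolution = Unresolved"


def outstanding_issue_count():
    resp = requests.get(
        JIRA_SEARCH_URL,
        params={"jql": JQL, "maxResults": 0},  # maxResults=0: we only need the total
        auth=("squad-bot", "secret"),          # hypothetical service account
    )
    resp.raise_for_status()
    return resp.json()["total"]


print("Outstanding bugs:", outstanding_issue_count())
```

A number like this is cheap to poll once an hour and drop onto a squad dashboard or into a retrospective.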
So we grouped these metrics into buckets: usage, system health, business performance. These buckets are each becoming their own dashboards, cycled through the big office monitors that everyone sort of has.

We also created a few new processes based on these questions. One process reviews our progress as a squad: every retrospective, we look at a couple of metrics that deal with squad performance, like how many bugs were closed in the past sprint period. We also judge whether we want to continue seeing a given metric and whether we can actively improve upon it; maybe we only closed two bugs this week, but it's because it took us two days to acknowledge one bug, right? And we ask what new measurable items, if any, we should look at for our next retrospective. Another process is to have goal targets for every integration project we do. For example: we will know we're successful when, with this integration, we have X new users within the first two weeks. It's true that this sort of goal line can only be judged against historical user acquisition numbers, so we definitely have some work to do. But this will feed into our retrospectives, especially once a project is complete: how did we do? We also have a few post-integration questions for the business development folks to ask of our external partners on behalf of the squad, to understand things like our responsiveness, how our developer tools are, and whether they met their goals. We may think an integration was super successful, but on their end, not so much, which has definitely happened to us before.

We've only been caring about metrics since the beginning of the year, so this is certainly only the beginning for us. But we'll keep iterating and taking a hard look at what we track and why. You can track everything that moves, but will you get inundated? Certainly so, if you count every single leaf of every branch of all the trees in the forest. So how can you tell what's important and what's just noise? This goes back to understanding your customers' expectations, which essentially boils down to business value. How can you maintain and improve upon the business value of your services or product? How does counting every Facebook-connected user help us better ourselves?

When thinking about implementing the various metrics of our feedback loops, I came across various questions to help me see the forest for the trees. When creating a new metric: how do metrics map to business goals? Do we lose money, or lose so much money, if the Facebook login service isn't up? How do you prioritize the different goals you want to drive? What's most important? Does that mean you neglect the others, or just allocate time by priority? Is this new integration project more important to pay attention to than the other ones? That's fine if it is, but you're going to have to prioritize. How can we create dashboards that are actually actionable? What is the goal, and more importantly, how can we drive that goal? Are we just going to say, "Oh look, the Facebook sign-up service is down. Let's go have lunch"?
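"Actionable" at minimum means the number has a goal line and an alert attached, so a dip triggers a person rather than a shrug. Here's a tiny sketch of that idea; the 95% goal and both helper functions are made up for illustration.

```python
# Sketch of a goal line turned into an alert: if the Facebook login success
# rate drops below a target, notify someone instead of just drawing a graph.
# get_login_counts() and notify() are hypothetical stand-ins for whatever
# metrics store and paging/chat system you actually have.

SUCCESS_RATE_GOAL = 0.95  # made-up goal line: 95% of login attempts succeed


def check_login_health(get_login_counts, notify):
    successes, failures = get_login_counts(window_minutes=15)
    attempts = successes + failures
    if attempts == 0:
        notify("No Facebook login attempts in 15 min -- is the service up?")
        return
    rate = successes / attempts
    if rate < SUCCESS_RATE_GOAL:
        notify(f"Facebook login success rate {rate:.1%} is below the "
               f"{SUCCESS_RATE_GOAL:.0%} goal ({attempts} attempts)")


# Demo with stub inputs: 9 successes, 3 failures in the window.
check_login_health(
    get_login_counts=lambda window_minutes: (9, 3),
    notify=print,
)
```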
When representing metrics: how do we correctly measure what we care about? You have to break out the old statistics book, or whatever, to understand how best to represent all the metrics you take. We have so many tools to help us create gauges, counters, meters, histograms, and timers, but which representation is best for a given question or metric?

When actually consuming them: how often do you check on metrics? Dashboards that are never looked at, which is a common problem I've found on my team, just become background noise. So how do you make dashboards more visible, more in your face? Should someone be responsible for them once a week, like a goalie? Do you make them more visible by slapping them up on the TV monitor? I've found that doesn't entirely work; even if it's right in front of me, I just kind of ignore it. Or perhaps you have email snapshots sent out to the team, but maybe they're automatically filtered away, or you're like me and auto-archive all unread emails.

Being a bit introspective: for the things where we don't reach 100% of our goals, we need to assess the difference. Why does it exist? Is it even solvable? When you look at dashboards, what actions are you actually going to take? Do you even create a dashboard if a goal or an alert isn't set up, or if no action will be taken? Probably not. What about unknowns? We know that X amount of iOS users have connected their accounts to Uber, but how many don't use it because the driver has an Android phone, or because the driver just isn't aware of the service? How do you approach those unknowns? Are you comfortable with them, or is it even worth it to explore them?

Bringing it back to this slide: ultimately, the goal in answering these questions is to give us both a shortened decision-making cycle and more informed decisions about strategy and partnerships. It's super easy to get lost in the forest, and it doesn't help that we get all this instant feedback and that all these visualizations just look awesome. But in essence, we're placing current values in historical context in order to see patterns developing. How long, on average, does it take the team to implement a new integration? Do our customers, or we ourselves, expect a shorter turnaround time? Or do we just wish to be able to appropriately estimate the time and work that such a project takes? Or maybe: which internal team do we have to educate about rate limits on our service? The hope is that with these feedback loops, these thoughtfully implemented metrics, we can use goal lines and alerts to create a more efficient team. We'll deliver higher-quality software, since we'll get immediate feedback on any bugs we introduce, any system that fails, and the like.

But before I answer this question and wrap up, I need to be a good friend. With all these questions and context in your mind, you should go to Hynek's talk tomorrow. It's at 11:45, it's titled Practical Logging and Metrics, and it's basically the technical complement to this talk.

All right, to answer the question: should you track everything? Very anticlimactic answer: probably. But only if you define a goal, can define an action to take when you haven't met that goal, and actually pay attention to it. I know, anticlimactic, but within reason, right? So, thank you. I hope you took something away. I think I have one minute for questions? Or how about we just go out and get some wine, and you can find me if you have questions. Yeah? All right, thank you.