to taking the pain out of support engineering. My name is Ceci Korea, and I'm a software engineer at Context.io. When I talk about support engineering, I mostly mean handling support for a publicly available API, so my support users are typically other developers. That's the frame this talk is built around, but hopefully a lot of what we'll cover today also applies to other kinds of support teams. Before we talk more about support, I want to start with a story. Back when I was in college, I got a job at an amusement park in Houston, Texas, called Six Flags AstroWorld. If you've ever worked for an amusement park, you know they have an intensely customer-service-oriented work culture. It's been a while, so I don't remember much about the job itself, but I do remember one thing they ingrained in everybody: if you don't know the answer to something, don't just walk away from a customer. Say, "I don't know, but I'll find out for you." I heard that as a 19-year-old college student, and it has stuck with me through pretty much every job I've had since. I feel like that mindset, "I may not know, but I'm going to find out for you," is at the heart of what makes a good support engineer or a good support engineering team. So what is a support engineer? Throughout the conference, people have asked me what I'm doing here. I say I'm giving a talk on support engineering, and they ask what I mean by that, because a lot of people have different definitions. For me, a support engineer is a developer or other technical person who provides technical support to other developers, to end users, or to people within your own company.
So you might be providing support internally to other developers at your company; it really depends on the size of your company and how you currently handle support. Again, for me, support engineering means supporting other developers who are integrating with our public API, but it might look different at your company depending on your setup. Why am I passionate about support engineering? Honestly, support has a bit of a bad reputation in tech, but I'm really proud to work in support, and I love it. I think that's because twice in my career I've had a first job where my role was to support other people. How can you have two first jobs? That doesn't sound right. My first job out of college was at Electronic Arts, in their customer support division, mostly writing content for the website. Years later, I decided to go into programming, and my first programming job with "engineer" in the title was as a support engineer at Context.io. It's still my current job, though I've been promoted since. So twice in my life, support has been my way into an industry, and I'm really fond of it. So let's talk about some support engineering best practices. Today we're going to learn how to think critically about problems and how to prioritize a relationship with your support team. We'll talk about continuity, specifically continuity of your tickets, and about ownership of your tickets. We'll talk about boundaries, because that's something we rarely discuss when we talk about support, and I think it's really important for a happy team. Lastly, we'll talk about some tools I can't live without in my support life. So, part one: thinking critically.
This is where I put on my Electronic Arts hat. I was assigned to be the subject matter expert for The Sims 3. Yes, this was a long time ago; we're on The Sims 4 now. That meant I got to play the game a few weeks early. I played it for about two weeks straight at work, looking for anything that would confuse end users so we could preemptively write content to help them get past those issues. What the studio did right here is that they gave support early access, which doesn't always happen. In the year or so I worked in EA support, we didn't always get a build of the game we were supposed to write content for. Some studios would give us a bug list, which was super helpful, but that wasn't always the case either, so a lot of the time we just had to make do with what we were given. This studio gave us a build of the game to test before launch, and plenty of time to do it, which is what we needed to generate FAQs preemptively. I actually looked today and found one of the FAQs I wrote for the launch of The Sims 3. I wrote it about eight years ago, and it's still online, which I'm really proud of. That's what happens when you give your team the time and the tools to write good content: good content sticks around. This taught me that if you even remotely think something could be an FAQ, it should be an FAQ. The takeaway from this Sims 3 case study, giving the support team early access and the time to really go through the game and think critically, is that you need to do this for every launch.
You need to give your support team time to look at what they're supporting, put on their thinking hats, and ask, "Is this going to be a question?" That's not something you can really teach; it's something you learn only by doing. So giving your team the time they need to produce that content is critical. For those reasons, this was a successful launch. Now let's talk about a less successful one, which is where I learned that prioritizing your relationship with support is really important. Mass Effect 2: great game. However, people who pre-ordered it were given a code to redeem downloadable content (DLC) once the game was released, and for some reason the redemption flow was difficult and confusing. The launch generated a huge volume of calls and emails to our support center from gamers who couldn't figure out how to redeem their DLC code, and it was avoidable. What the studio got wrong here is that they didn't give support early access to the game, so the team never had the chance to go through the product with that thinking cap on and think critically about what they were supporting. I feel like if we'd had that chance, someone would have said, "Hey, this flow is a little confusing. Maybe we should write some content around it." Eventually we did write content around it, and that helped. I even made a video on how to redeem the DLC. I thought it was silly at the time, but it gave the people on the front lines of support something to point customers to so they could help themselves.
To the studio's credit, once they saw the volume that question generated at our support center, they said, "We messed up. Moving forward, we're going to prioritize our relationship with support so we can be more proactive on future launches." And after that, they did. Obviously this was a while ago and I no longer work there, so I can't speak to how things have gone since, but I don't think they've had any issues like that again. So let's move on to part two: continuity. This is where my fun anecdotes about the gaming industry stop and I get into working in the tech industry, specifically doing support for a public API. When it comes to continuity, I see two support paradigms, and I think one works a little better than the other, so let's look at both. This assumes that if you're doing support, you have some sort of support queue or ticketing system. The patterns I typically see are dedicated support and rotating support. Dedicated support is when, depending on the size of your team, you have one person or a group of people whose sole responsibility is answering support questions. In a rotating support pattern, one person on your team rotates in, weekly or biweekly, and puts on the support hat. I typically see this on smaller teams that might not have the volume to justify someone doing support full time, so they set up a rotation where people take turns answering support questions. I often hear terms like "support star" or "support firefighter" for these rotations. Let's start with dedicated support.
You can probably tell this is the pattern I like, because I think it works a lot better. Dedicated support works because it gives you continuity for your tickets. It's someone's responsibility to answer those tickets, and ultimately that person's responsibility to let the customer know about a resolution. They take a ticket from initial contact through resolution, and then they close it. That means one person knows what the final resolution to a particular issue or bug was, and that's really important because it builds your support history. You can draw on that history and find efficiencies based on that knowledge. Also, when someone's sole role is doing support day in and day out, especially in a support engineering role, they'll find efficiencies once they start doing the same task repeatedly, and that might mean writing a script or a tool to automate it. A lot of the time, you never get to those tools unless someone's job is to do those repetitive tasks over and over until they realize, "Hey, there's an efficiency here." Dedicated support also helps you build relationships with your customers. That's especially important for my team because the product I support is an API, so our relationship with a customer is typically really long: once they integrate with our API, we're usually talking about years of support. Establishing that relationship with the developer who's integrating our API is critical, because we want them to feel comfortable asking questions. I often hear things like, "Well, I don't know if this is a bug, and I don't know if I should tell you about it." No, absolutely, tell us about it.
We want to hear about it so we can fix it, and you don't get there without that kind of relationship. Where dedicated support stops working is when you don't have a way to promote people out of your support engineering team. Support engineering roles are often filled by junior developers; that was my case. I think it's a great way to level up junior devs, but you need a plan to move them out of support once they outgrow the role. That's an issue I see a lot with support engineering positions: you get stuck in support and don't see a way out unless it's at another company. When that happens, you lose all the relationships and knowledge that person built while doing the job for you. So having an exit strategy and a clear path forward from support engineering is really important for helping your team members avoid burnout. Now let's look at the other side of the coin, rotating support, and why I don't think it works. With a support rotation, you're on support for the week: you're on the front lines, answering questions from your clients. Once the week is over, you move on with your life and forget about the support you did that week. Without a very clear process, what often happens is a lack of follow-through. Bug fixes, solutions, and tickets get lost in the shuffle because once someone finishes their rotation, they forget about their tickets. This often leads to handoff confusion.
Teams with a support rotation often don't have a clear process for handing off tickets, which means there's little continuity for any given ticket. If it's a long-term fix and the ticket keeps getting handed to whoever is on support the next week, you lose that history. You also lose efficiency: depending on how long it's been since your last rotation, you might forget how to do something, so every time someone rotates back in, there's a ramp-up period while they relearn how to do support. I do think a rotation can work if you have very clear guidelines, specifically for issues that will take longer to solve than a single rotation. For example, if you get a ticket, verify it's a bug, and it's clearly non-trivial and will take longer than a week to fix, create an issue in JIRA or whatever system you use to track features and sprints, and add it to the active sprint. What I typically see with rotations is that people may get as far as creating a ticket and putting it in the backlog, but if it doesn't go into the active sprint, it might not get resolved for weeks or even months. Meanwhile, if that ticket is still open with your end user, whoever responded first is being graded on its response time.
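A minimal Python sketch of that rotation rule, assuming a hypothetical in-house ticket model (the `Ticket` fields and the `apply_rotation_rule` name are illustrative, not JIRA's or any tracker's API): a verified bug estimated to outlast one rotation is flagged for the active sprint, and the owner field is deliberately left alone so whoever picked the ticket up keeps it.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical ticket record; not any real tracker's schema."""
    ticket_id: str
    owner: str             # engineer who picked it up during their rotation
    verified_bug: bool
    estimate_days: float   # rough estimate of time to fix
    in_active_sprint: bool = False


def apply_rotation_rule(ticket: Ticket, rotation_days: float = 7.0) -> Ticket:
    """Flag verified bugs that will outlast one rotation for the active sprint.

    The owner is intentionally left unchanged: the rotation ends,
    the ownership does not.
    """
    if ticket.verified_bug and ticket.estimate_days > rotation_days:
        ticket.in_active_sprint = True
    return ticket
```

Actually moving the flagged ticket into the sprint would go through your tracker's own API, which isn't shown here.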
So by not adding that ticket to your active sprint, you can drive up the time to resolution, and that really hurts the support engineer who's helping the end user, because they end up with a ticket open for weeks or even months. You really want to avoid that. That's why, if you see an issue that's going to take a long time to solve, you should add it to your active sprint right away; otherwise you might forget about it. Also, and this is a big one: if it happened on your rotation, you still own it. What I see most often with rotations is, "I did my week of support, I didn't get this fixed, I'm not on support this week, so here you go," and a lot of history gets lost in handoffs like that between people on call. I think the best way to keep an issue from getting lost is to make sure you still own it after your on-call shift is over. That increases your ownership of getting it done; otherwise it might fall through the cracks. Which brings us to ownership and escalations. There are a couple of ways I see people handle ticket ownership and escalations, which I call "removed" and "owned." These are terms I made up; they're not established anywhere, so if you find them confusing, let me know and I'm happy to clarify later. A removed escalation is when you work as part of a team and throw an issue over the fence, someone else fixes it, and then they tell you it's fixed so you can go let the end user know.
So there's a layer of removal between the person doing the fix and the end user. I think this pattern is challenging because teams often lose a sense of an issue's priority when it's thrown over the fence. If the person on the other end doesn't have access to the end user, the issue loses some of its impact. When a team member handles an escalation without ever talking to the client, that loss of accountability to the end user adds time to the resolution. Left to their own devices with an issue to fix, people tend to prioritize support much lower than feature work. This goes back to the idea that, as an industry, we don't tend to think positively about support; we're not excited about it, it's just something we have to do. So with a removed escalation, when the person doing the fix doesn't know who it's impacting, they're not going to prioritize the work. They're also not graded on an SLA the way the person responsible for getting back to the end user is. The engineer fixing the bug is graded on whether the bug got fixed, not on how long it took for the resolution to reach the end user. The person on the support side, however, is graded on how long the whole interaction took. So if the fix takes a long time because it isn't prioritized, it hurts the person on the support side.
So it comes back to this: if people don't see the impact, they don't care as much. It's not malicious; it's just hard as a developer to prioritize support when you don't see the impact of your work. You're more likely to see impact from new features you're building than from a specific bug or edge case someone is hitting. Now let's talk about owned escalations, which are the kind I really like to do. In an owned escalation, when you send an issue to another team or person to fix, it becomes their responsibility to get back to the client with a resolution. They own the solution, and they own the response to the customer. I think this works much better because it gives you more continuity. In my support ticketing system, I can say, "Hey, John, I'm escalating this ticket to you. If you have any questions that will help you solve the issue, go ahead and ask the end user directly." That removes the middleman, and because that person gets to talk to the client and ask questions, the client is front and center. That helps with the problem of not seeing the impact of what you're trying to solve. It also increases accountability: when you know exactly who you're talking to and how the issue is affecting them and their business, you're much more likely to think, "Yes, I absolutely should fix this and prioritize it."
When it's also your job to get back to the end user and tell them you fixed the issue, you can start to see bottlenecks in the process: if it takes that person or team longer to get back to the client after a ticket is escalated, you can see where the slowdowns are and start addressing them. On our team, even if we don't have a fix yet, we at least get back to the end user with a status update so they're informed: "This is going to take a little longer than we thought, but we wanted to let you know we're still working on it." If something is going to take a long time, we check in every couple of days to give people an idea of where things stand. That also increases accountability, because having to keep telling your end user "We're still working on it" can motivate you to get it done. The next thing I want to talk about is boundaries. As I said earlier, this is something we rarely, if ever, talk about when we talk about support. I think it's really important, because without clear boundaries in how you communicate with your clients, your team can end up seriously burned out. So let's talk about boundaries with clients. This applies mostly to smaller teams that can't offer 24/7 support. If you work for a large corporation that needs 24/7 support, this might not apply to you; those companies typically have support centers around the world to cover every time zone. But on a mid-sized team like mine, we really can't do that.
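The every-couple-of-days check-in on long-running escalations can be backed by a simple reminder. A rough Python sketch, assuming a hypothetical `Escalation` record that stores when the customer last heard from us, with a two-day default gap as a stand-in for "every couple of days" (both are assumptions): it lists open escalations whose customers are overdue for a status update.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Escalation:
    """Hypothetical escalated-ticket record; field names are illustrative."""
    ticket_id: str
    owner: str                      # person who owns the fix and the reply
    last_customer_update: datetime  # timezone-aware time of our last message
    resolved: bool = False


def needs_status_update(
    escalations: List[Escalation],
    now: datetime,
    max_gap: timedelta = timedelta(days=2),  # "every couple of days" (assumed)
) -> List[Escalation]:
    """Open escalations whose customer hasn't heard from us within max_gap."""
    return [
        e for e in escalations
        if not e.resolved and now - e.last_customer_update >= max_gap
    ]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
open_items = [
    Escalation("T-1", "john", now - timedelta(days=3)),
    Escalation("T-2", "john", now - timedelta(hours=6)),
]
print([e.ticket_id for e in needs_status_update(open_items, now)])  # ['T-1']
```

Because the owner is stored on each escalation, the reminder can go straight to the person who owns the reply, which fits the owned-escalation pattern.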
So to get back to people within a reasonable amount of time, we established support hours, and we stick to them. If someone contacts us after hours, they get an automated reply that says, "We're based in this time zone and we operate during these hours. If you email us outside business hours, we did get your message, but just FYI, we'll reply within this time frame." That lets your team know they don't have to constantly check for new tickets or reply to people outside business hours. Another thing I learned the hard way is that you need to allow a reasonable amount of time between responses. I had a developer integrating our API who sent me some code and said, "Hey, I'm getting this error." We typically don't do code review, but it was a particularly slow day, so I decided to look at it. It wasn't even an error with our API; it was a Ruby error. So I was helping this person write Ruby, even though that's outside the scope of what we can do in support, at least from our perspective.
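The after-hours auto-reply comes down to a time-window check. A Python sketch assuming hypothetical support hours of 09:00 to 17:00, Monday to Friday, at a fixed UTC-6 offset (the talk doesn't state the actual zone or hours); a real setup would more likely use your help desk's built-in business-hours rules and an IANA time zone that handles daylight saving time.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional

# Hypothetical support policy; the talk does not give the real zone or hours.
# A fixed offset ignores daylight saving time; a zoneinfo.ZoneInfo zone would handle it.
SUPPORT_TZ = timezone(timedelta(hours=-6))
TZ_LABEL = "UTC-6"
OPEN, CLOSE = time(9, 0), time(17, 0)
WORKDAYS = range(0, 5)  # Monday=0 through Friday=4


def within_support_hours(received: datetime) -> bool:
    """True if a timezone-aware timestamp falls inside support hours."""
    local = received.astimezone(SUPPORT_TZ)
    return local.weekday() in WORKDAYS and OPEN <= local.time() < CLOSE


def auto_reply(received: datetime) -> Optional[str]:
    """Return the after-hours acknowledgement, or None during support hours."""
    if within_support_hours(received):
        return None
    return (
        "Thanks, we received your message. Our support team is available "
        f"{OPEN:%H:%M}-{CLOSE:%H:%M} {TZ_LABEL}, Monday to Friday, "
        "and will reply during those hours."
    )
```

For example, a message received on Saturday, January 13, 2024 at 18:00 UTC gets the acknowledgement, while one received on Wednesday, January 10, 2024 at 16:00 UTC (10:00 local) gets `None` and goes to the queue as usual.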
But I still helped him out and sent him on his way. A few minutes later, he reopened the support ticket: "Well, I'm not getting that error anymore, but now I'm getting this other error." Again, it was a Ruby programming error in his code, not an error with the API, and I decided to help him again. Then he reopened the ticket again with yet another error, and I thought, "I think he's using me as his own personal Stack Overflow." If I kept going, I might have written his entire application for him through support tickets. That's a huge liability when your business is helping other developers integrate with a product, because if the developer doesn't actually understand how the integration works, they won't be able to troubleshoot issues on their own. This applies very specifically to code and code samples. When developers ask me, "Can you please write this bit of code for me?", I often have to tell them, "I'm sorry, but we can't write your code, because if you don't understand how it works, we become liable to you for that piece of your code base." We simply cannot be liable for every developer's integration.
Again, it's about setting a boundary: we work on this API, we will help you integrate with it, and we will help with any errors you encounter with it during your integration, but we very firmly cannot build the integration on your end or write your code for you. That was a very healthy boundary for us to establish, because otherwise we could open ourselves up to problems. Allowing buffer time between responses also solved my specific problem of people treating me like their personal Stack Overflow: when I left more time between responses, the developer went and tried to figure things out on their own, and that actually unblocked them. So now, even on a slow day when I do have the time, I time-box when I answer questions so I'm not constantly being someone else's crutch. "Crutch" might be a little harsh, but when people get used to really good support, it does become a crutch, so it's a fine line to walk, and a healthy response time between interactions really helps you avoid becoming one for the people you support. Here's a recent interaction where I practiced leaving a bit more time between replies, and the developer came back and said, "You know what, I actually read your docs. I read the freaking manual, and it's okay now." That's the kind of thing I love to see: "I read your docs, they helped me, they unblocked me." Yay. Now let's talk about good behavior, bad behavior, and reinforcing good behavior. This is an actual email I got fairly recently from a developer. I'm not going to read all of it, and I've tried to redact it a bit, but essentially this guy is really, really angry, which you can tell from the tone. He's saying, "Sorry, but this whole thing is messed up."
He says he was trying to test our "stupid webhooks" and they don't work. This is a really big deal to me, because our support tickets don't just go to the support team; they go to the entire engineering team. When our engineers see something like this, it gets them riled up and can be really demoralizing, so I wanted to put an end to that behavior quickly. Here was my reply. First, I pulled rank and said I'm the lead support engineer on this team, because sometimes that helps snap people out of whatever foul mood they're in; sometimes people just want to talk to someone in charge. I told him I would get to his question, which actually turned out to be user error, but that we have zero tolerance for that kind of language and behavior, that his messages go out to the entire team, and that we will not hesitate to ban any developers who don't maintain a modicum of decorum when talking to us and our team. After that, he never talked to us like that again. His tickets became far more reasonable, and he's still working with us on this API. But I needed to shut that behavior down, because I knew that if I didn't address it quickly and sternly, it would continue. I can't stress enough that your team's sanity and happiness come first. When I wrote that message, I didn't have to ask my product manager whether it was okay to send; I knew they would have my back. So if you manage support engineers in any way, shape, or form, you need to empower them and let them know that their happiness really matters. If someone needs to be told that their behavior won't be tolerated, your support engineers need to know you have their back, because really, if you don't
have their back and you don't take their happiness into account, you're going to have a bad time. An unhappy engineering team affects morale, and support engineering is rough: people talk to you like that fairly regularly, so it's easy to burn out. You really want to make sure you have your team's back. In this case, I'm happy to say my team had my back, and I felt empowered to tell that developer to stop. Lastly, I want to talk about tools for the job, starting with documentation. We talk a lot about documenting our code, and about tests as documentation, and that's all great, but we rarely talk about documenting troubleshooting steps, thought processes, or how to do common tasks, and in support engineering that's critical. So I came up with the idea of a support playbook for my team: a living document where we record how to do common tasks. It's really important because when you onboard someone to support, you don't have to pair with them around the clock while they get started. You can say, "Jump on the tickets in the queue. If you have questions, consult the support playbook first, and if you can't find the answer, let us know and we'll pair with you." That let us onboard people to support much more quickly, because they felt empowered to find the answer to an issue on their own before escalating it. For these reasons, I think a support playbook is vital for a support team. It reduces training time, and it also helps you find candidates for automation: if you find your team constantly looking at a specific page in your support playbook wiki on how to do a certain task, you know that that's something
that could be a good candidate for automation. I also think it's really important to empower your team with lots of data, so I live and breathe by my dashboards. We use Datadog and Grafana, and I especially like Grafana because it has an open-source version you can self-host. As a support engineer, this helps a lot: I get plenty of panicked developers saying, "Your API is down, and it's not updated on your status page." With that data, I can tell the developer, "No, it's not down, but I do see this issue, so let's go figure it out." Having that data really empowers your team. Logs are my friend, and I could not live without Scalyr, an amazing log aggregation tool we use; we actually have three different accounts that we switch between. I'm not paid by these people at all; I just want to give them a shout-out because I could not do my job without it. Besides letting me see all of our logs, it lets us set up alerting, and those alerts can go to our Slack and can also open incidents in our paging system, so it's vital to us. We also use Runscope for automated API testing. Again, I'm not paid by them; it's just a really neat product. To monitor our product, we have Runscope tests running every five to ten minutes, with the results going into a Slack channel, so we can track response times and whether requests are succeeding or failing. As we worked on new features and improvements to our API, this let us see our response times actually improving, so I highly recommend it. Runscope also makes an open-source tool called RequestBin that's really helpful for troubleshooting endpoints, specifically webhooks. They used to host a version of the tool that allowed you to
create an ephemeral endpoint. Unfortunately, some people were abusing it, but RequestBin is open source and you can self-host an instance on Heroku, which I do a lot for testing things and for helping other developers test our endpoints. So, TL;DR: we talked a lot about support engineering, its best practices, and what it means to me. A support engineer is, by definition, good at troubleshooting. They're also good at communicating, because they talk to end users and often between teams. And they're good at seeing patterns: once you do support engineering every day, patterns emerge that help you gain efficiencies, so people who do this day in and day out can be really great at spotting patterns in your product. I think a good support engineer is a good engineer, and I really believe support engineering can be a great way to level up junior developers, as long as you give them a path forward to other engineering opportunities within your organization. Having a culture that prioritizes support engineering simply means having a good engineering culture, period. If you have any questions about support engineering, I'm happy to talk after this, maybe out in the hall. My name is Ceci Korea, you can find me on Twitter at Ceci Korea, and I'm always happy to talk about support with you. Thank you.