Okay, so yeah, my name is Hilary Parker, and you can see I have Etsy there in big orange letters. As Karthik just said, I'm a data scientist at Stitch Fix; I've been there about a year. Before that I was at Etsy, and I'm pointing that out because a lot of the theme of this talk borrows cultural things I learned at Etsy. I also have a podcast called Not So Standard Deviations, which I recommend checking out.

Cool. So, like any good talk, I'm going to start with a tweet of mine from two years ago. I know the title of this talk is a little confusing: opinionated analysis development. What does that mean? The idea for this talk really started about a year and a half ago, in November of 2015. At that point I'd just been through all the summer R and statistical conferences, and I felt like there was a real gap in the way we talk about people writing code. I don't know how many people here know the useR conference. The title makes it sound like it's about using R, but I'd say it's really more about developing R packages. As you can see from this tweet, I was saying we're really missing a word for people who want to write code for doing analysis, like reproducible code, careful code, efficient code, but who aren't necessarily developing packages that they want to open source and share with a bunch of people. And so I said: what about "analysis developer"? Any other ideas? Obviously, given the title of this talk, I decided to stick with that. But I did want to go through what some other people suggested and shoot them down while they're not here to respond. Easiest argument ever.

So, the first one I get a lot: how is this different from a data scientist?
To answer this, I grabbed one of the first Google results for "what is a data scientist," and it's this woman who knows all of math and statistics, has all the domain knowledge and soft skills (so, business skills), obviously knows how to program, is essentially the database admin for her company, and also knows how to communicate and visualize data. All of these are things you supposedly need to know. I could have substituted in Drew Conway's Venn diagram, hopefully with the unicorn on top of it; I want to borrow that in the future. But obviously "data scientist" is, in my opinion, a pretty useless term. You can say your job title is data scientist and convey basically no information about what you actually do on a day-to-day basis. So it's not a good way to define the very specific subset of work I was talking about.

Someone else chimed in and said: what about "reproducible researcher"? I like this. Reproducibility is something I've cared about for a long time, and it's a good word generally. But there's more than just reproducibility in what I'm talking about. Yes, you want to create code for analyses that's reproducible, but there's also a sense of accuracy: you want to know that your code is actually doing what you think it's doing. And there's a collaborative aspect that we never talk about. If you're writing analysis code that you want other people to access and possibly contribute to, reproducibility as an idea kind of falls apart; it doesn't address collaborative tooling at all. So I thought it was a little too limited to just call this reproducible coding.

Hadley Wickham then chimed in and said: oh, what about a "data analysis engineer"?
And this is a divide we have not been able to resolve yet. I think of it as: I'm Gretchen trying to make "fetch" happen, and Hadley's Regina George saying it's not going to happen. But really, when you come down to it, it's tomato, tomahto. You have software developers and you have software engineers, and I think we treat those as the same person, the same type of job. So it's the same thing: a software developer develops software, and I wanted something like an analysis developer who develops analyses. I personally like "developer" a little better than "engineer." I like the connotations that "development" brings. So, this is a picture of my cat; I had to sneak her in. This is her as a kitten, and this is her much older. I like "development" in that sense: she's developing into this majestic creature. Similarly, you develop an analysis; you get it into a more final state. And I also develop as a photographer. I'm not quite at Max's level yet, but I'm trying to get there.

When we talk about developing an analysis, there are two specific components that I think it's really important to tease apart. The first is developing the narrative of the analysis. That's almost the scientific argument: what point are you trying to make? What models are you going to use to make that point? What arguments are you going to make? What kinds of visualizations are you going to use to display those arguments? And generally, how are you going to convince your audience that the conclusion you reached is the right conclusion? I'm going to sidestep all of that and say: okay, that's obviously really important. That's why we train statisticians
and why we train data scientists. You could talk about it pretty much forever, and it's a constantly evolving thing that needs to adapt to the audience. But then there's what I call the technical artifact, which gets a lot more specific: what tools are you going to use to make the deliverable that tells your narrative? What technical coding choices are you going to make? Are you going to use the tidyverse, or Python? There are a bunch of choices you make, and talking about them is frequently as important as talking about the narrative aspect. So that's why I think this term needs to exist.

Then I got this tweet from Karl Broman, who is the sweetest person ever, and I perpetually feel guilty that I'm throwing him under the bus in this talk. He was saying: oh, what about a "good analyst"? And I didn't get this just from him. A lot of people said: well, all these things you describe, writing accurate, reproducible code, are just part of the job description of a data analyst. So, explicit disclaimer: Karl is a really nice guy. He'll email you pages of help if you ask him for it. He's non-judgmental. But that being said, I took issue with this idea of a "good analyst," because obviously the opposite of that is a bad analyst, right?
Someone who's doing their job poorly. And that just didn't sit well with me. Convincing people to do something by shaming them is almost never a good idea. But I will say that my thoughts have evolved a little since that original tweet. I was talking specifically about the job title, the person, the "analysis developer." I actually think it's more important to talk about analysis development, the process of making this technical artifact. Like I said before, calling people bad analysts is just not a helpful way of communicating that they're not doing this effectively.

And why is that? Because creating analyses is very hard. It's very error-ridden; we all know that. And, channeling my inner Elaine from Seinfeld, we kind of yada yada it away. We talk about the narrative and the point we're making, and we hide away all of the work we're doing on the code to get there. There are a lot of reasons why that happens. I have a statistical background, so I know the statistician's point of view.
There's this idea that it's limiting creativity: you just do whatever you need to do to get the job done. And then, frankly, probably the biggest reason is that a lot of people are embarrassed by their analytical code. They don't want to share it because they're embarrassed by all their for loops or whatever else they don't want people to see.

But as I said before, there are a bunch of common problems when you create an analysis that we all know about, and if you avoid these common problems, it frees up your cognitive space for more creativity. If you're writing code that you know is reproducible, and you've tested it so you know it's error-free, you're going to have more time to think about the narrative part, which in many ways is the more important part from a scientific perspective.

As I said, we all know these problems, and I actually went through and made a list of a bunch of them. The point of the slide is not for you to read every single one, but there are things like: you rerun the analysis and get different results. I think we all know that's a reproducibility issue. There are others, like a second analyst can't understand your code, which could also be thought of as a reproducibility error. Or you make a mistake, which is an error in the accuracy of the code. If you think you're calculating a linear model and you're actually not getting linear model results, you're going to have problems. And remember what I said before about there being more than reproducibility: I think you can group all the problems we run into into these three areas of reproducibility, accuracy, and collaborativeness.
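As a hypothetical illustration of that "tested, error-free" idea (this sketch is mine, not from the talk, and the function name and numbers are invented), the accuracy failure mode above can be guarded by factoring an analysis step into a small function and checking it against a case with a known answer:

```python
# Hypothetical sketch: pulling one analysis step out of an inline
# script into a small function so it can be tested on its own.

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple linear model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# A tiny test with a known answer catches the accuracy error described
# above: if you think you're fitting a linear model, this confirms it.
assert abs(fit_slope([0, 1, 2, 3], [1, 3, 5, 7]) - 2.0) < 1e-9
```

The point is not this particular function; it's that any step small enough to test is a step you no longer have to re-verify by eye every time you rerun the analysis.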
So, as I said, it's really important to define the process of analysis development, because by not defining it, we're doing a disservice to people who hit these errors that we've all come across and can all list. Instead, we're leaving people a lot of rope to hang themselves with: we're leaving them to think that they're personally bad at analysis, that they're personally making all of these human errors in their code, and that's why they're making mistakes.

So, human error. As I said before, there's Etsy inspiration in this, and that's because Etsy had this really great culture of blameless postmortems, centered around redefining human error and how we understand it. The reason Etsy did this is that Etsy is a website with so much traffic that the operations team keeping the site running was one of the most important teams at the company, and this whole paradigm of human error really comes from that operations world. A lot of the work at Etsy centered around a paradigm-shifting book, The Field Guide to Understanding Human Error. You could read the whole book; I'll give you the summary. Essentially, the paradigm shift presented in the book is the idea of switching from blaming a person to blaming the process for failing the person. One place this is implemented a lot is the aviation industry. If there's a plane crash, they go through very diligently, look at the flight itself, and figure out: whatever the pilot did that caused the problem, can they create new safeguards so it doesn't happen again? And there's a lot of trust in this, which I think is really important, because there's the assumption that the person who
committed the error was acting in a way where they thought the error wouldn't happen. I don't think anyone who makes a mistake during their job set out to make that mistake, right? They set out thinking they were doing the right thing, and then the process failed them, so they got a result they weren't expecting. Does that make sense? I see some puzzled looks. The whole idea is that the current system failed a person with good intentions.

As I said, Etsy had this culture, and I think it was a really famous culture. I don't know how many people saw the news from Etsy yesterday, but John Allspaw, who was the CTO, left Etsy yesterday; it was kind of a dramatic day for them. He was an amazing person, and I'm so grateful to have worked with him; he influenced my thinking so much. So I just want to give a shout-out to him in this difficult time for them. Anyway, it created this really wonderful culture at Etsy, and I think anyone with an engineering background will be familiar with this type of practice. If engineers made a mistake, they felt safe talking about it. They felt safe sharing their mistakes because they knew they wouldn't be personally blamed and their jobs weren't in jeopardy, and then, by sharing, everyone could work together to create safeguards against that problem ever happening again.

At Etsy, the blameless postmortem process was very formal: we would all get together in a room, talk through exactly what happened, create a timeline, and figure out what happened and what we could change about the system to keep it from happening again. So, visually, just to really nail this home: you have an error that happens, you have a blameless postmortem where you discuss the system changes to
prevent the error, and then you adopt a new process at the end.

Now, the opposite of this culture, which I think is the current culture of the analysis world: an engineer who thinks they're going to be reprimanded or blamed for an error is disincentivized to share the details necessary to get an understanding of the mechanism, pathology, and operations of the error. If you have a blameful culture, you're going to have people who are embarrassed to share their code and embarrassed to talk about their process. Sounds like the current state of analysis development.

As I said before, we've all run into these problems, right? And at this conference, we've all been going to sessions where people present possible solutions to them. One of the talks was about a data validation service, where you'd have something like CI testing for your data validity. That's a really good example of an action item that would come out of running into one of these problems. So we as a community have kind of gone through this blameless postmortem process: we've run into problems, adopted new processes, and we come to conferences to talk about those processes and whether they're the right solutions. From that slide of problems, we can actually create a list of what we as a community think the best solutions are. I want to call these opinions. I have things like code review, creating modular tested code, using executable scripts, and watchers for changed data. I think that last one is a lot of what Mike Bostock was just talking about: only recomputing data when you need to, or only updating parts of the visualization rather than the whole thing. We can pretty much all agree on these.
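As a rough sketch of that data-validation opinion (my own illustration, not the service from the talk; the column names and rules are invented), CI-style testing for data validity can be as simple as a check that runs before the analysis does, so a bad extract fails loudly instead of silently producing wrong results:

```python
# Hypothetical sketch of CI-style data validation: assert properties
# of incoming data before any analysis runs. The schema here
# (order_id, amount, status) is invented for illustration.

def validate_orders(rows):
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        if row["order_id"] in seen_ids:
            problems.append(f"row {i}: duplicate order_id {row['order_id']}")
        seen_ids.add(row["order_id"])
        if row["amount"] < 0:
            problems.append(f"row {i}: negative amount {row['amount']}")
        if row["status"] not in {"placed", "shipped", "returned"}:
            problems.append(f"row {i}: unknown status {row['status']!r}")
    return problems

good = [{"order_id": 1, "amount": 9.99, "status": "placed"}]
bad = [{"order_id": 1, "amount": -5, "status": "lost"}]
assert validate_orders(good) == []
assert len(validate_orders(bad)) == 2
```

In a CI setup, a non-empty problem list would fail the build, which is exactly the blameless-postmortem move: the check, not the analyst, is responsible for catching the error next time.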
I don't think anyone at this conference would disagree with most of what's on that list. So that's the "opinionated" in opinionated analysis development. I love this part of the talk, because I have many years of experience being opinionated. As an example, I remember going to a party a few years ago, and my friend said, oh, I never fight with anyone except you. So I know how to be opinionated in the wrong ways and how to alienate people, and I'll give you an idea, from personal experience, of how to do this.

How to not win friends and influence people: the most important thing is going in and saying, well actually, you need to test your code, or you need to have someone do a code review, and why wouldn't you do that? That's definitely a way to not convince people to do code review. You can add in the wonderful medium of Twitter, so that you're purposefully miscommunicating and unable to express yourself; that seems like the point of the medium at this point. And then, my favorite: you can be the Kool-Aid guy and have specific tools that you think people should use. Well actually, if you used R and RStudio and this package, you wouldn't have any of these problems, and I'm only going to express that in a hundred and forty characters. Again, I think this is kind of the current culture of talking about and spreading tooling, and I think it's pretty toxic. Like I said before, as one of the people who had to learn the hard way how to stop doing this, I think it's important to honor that it comes from people who aren't self-conscious, who are enthusiastic and want to help other people.
This is the same quote as before: if you feel safe, if you feel safe making mistakes, you'll be excited about sharing them. But we're just not sharing in the right way, in my opinion. With opinionated software and a defined process, I think we can shift from saying "I know better than you," like the Kool-Aid guy crashing in, to saying something like: lots of people run into this problem all the time, and here's software that takes best practices and makes a solution for you. You shouldn't feel bad about making this mistake; it happens all the time. In fact, we've engineered a solution to help with it.

On opinionated software generally: my understanding is that the first opinionated software was Ruby on Rails; someone can correct me if I'm wrong about that. The idea of opinionated software is the belief that a certain way of writing code, a certain set of practices, is inherently better, and that the software should help you craft results around that process. The creator of Ruby on Rails had a quote saying it's a strong disagreement with the conventional wisdom that everything should be configurable and the framework should be impartial and objective; in his opinion, that's the same as saying that everything is equally hard. And I think that's where we are right now with analytical code: do it however you want, you're going to run into these problems, and we're not going to tell you the right solution to them.

And just to identify one of the most motivating things for me, one of the things that got me thinking about this: I was sitting in a meeting once where someone had created this non-opinionated software. You could almost think of it as software implementing the wrong opinions. It wasn't reproducible, and it had these weird dependencies on dynamically changing things
without any visibility into the connections between different data sets. I just remember sitting in that meeting and thinking: this is bullying. There are rules. There are rules to how we do this, and we shouldn't be embarrassed to talk about them, and we shouldn't make it someone's personal responsibility to figure out how to do it right. So you can say: this is analysis; there are rules. Or, if you're really on board with me, you can say: this is analysis development, or analysis engineering, and there are rules for how we do it. And it's not because you're personally failing; it's because we all run into these problems all the time, and we're in a blameless culture where we can talk about it.

So, just to sum up: I think it's really important that we define the process of technically creating analyses; that we define opinions based on common processes and common errors; that we shift blame away from individuals, using this blameless-postmortem kind of culture as a community; that we push for software that makes it easy to implement these opinions (I think R and Python and some other languages are good at that); and that we focus on the creativity in the narrative-development aspect, instead of focusing on people doing it wrong. I think we would all be well-served to stay away from this culture of dictating what people should do. If we start to do this and define the process, we'll be in a much better place as a community.

So that's all I have, but I can definitely take questions, or feel free to tweet opinions at me; I'm definitely used to that. Cool.