I think we're gonna get started. The music kind of turned down, so I think that's my cue. So today I'm gonna be talking about augmenting human decision-making with data science. My name is Kelsey Peterson. If you want to get in touch with me, my Twitter handle is Kelsey underscore Peterson, but I actually have a confession to make: I rarely use Twitter. I think I signed up for it like eight years ago, but I'm trying to get back into it, so feel free to tweet at me or share this presentation. I'd love to get in contact with you all.

I'm a software engineer at Stitch Fix, a personalized styling service for both men and women. Just out of curiosity, how many of you here today have heard of Stitch Fix? Whoa, okay, that's awesome. And how many of you have used Stitch Fix yourself? Okay, sweet. So for those of you who haven't used Stitch Fix, the way that it works if you're looking to sign up is you go to stitchfix.com and fill out your style profile, where you answer questions about your size, your fit, your style, and your price preferences. Then you, the client, are matched with a personalized stylist. That stylist works within our internal software, where they see information about the client and are served potential inventory to pick out for that client. I said that I'm a software engineer at Stitch Fix, and I work specifically on the styling engineering team, the team that builds and maintains the software that stylists use. When stylists are making decisions about the inventory, we have algorithms to help guide those decisions. The stylist is then in charge of hand-selecting five items for each client. That box of items is shipped directly to your door, you try on the items in the comfort of your own home, you keep what you want, and you return the rest.

Over the course of this talk about augmenting decisions, I'm breaking it down into three sections.
So first, we're going to be talking about the ways in which humans make decisions. Second, we're going to be talking about the limits of human decision-making. And third, we're going to be talking about the ways in which we can help users make decisions within our software.

So the first question: how do humans make decisions? The prominent psychological theory on human decision-making is called the dual process theory. This was popularized by Daniel Kahneman's book Thinking, Fast and Slow, which came out in 2011 and has been on top of the charts ever since. The dual process theory breaks down human decision-making into a two-system approach. System one is like the hare shown in this photo: it's quick, it's automatic. Whereas system two is like the tortoise: it's slower and more effortful.

So first, system one. System one is what fuels our impressions and our feelings. It's fast, automatic, and intuitive. It's driven by associative memory, and it's constantly creating this impression of what's going on around us. So let's take this example. You can see this man up on the screen and you automatically assess his mood as happy, or this woman's mood as sad, or this dog with its ball in its face: it looks really excited, probably playing catch with its owner. We didn't actually intend to assess the mood of these images, but it just happened. And that's what system one is: it's effortlessly jumping to conclusions, judgments, and decisions. What I found really surprising when working on this talk was finding out that 95% of human decisions are made in this way; they're made instantaneously within system one. So system one is really automatically generating these intuitive reactions and instantaneous decisions that govern the majority of our lives.

In contrast, system two is a more effortful, deliberate type of thinking. It's used for complex math problems, exercising self-control, or performing a physically demanding task. So, similar to what we did with system one, let's try out an
exercise to get us more comfortable with what system two feels like. You can see this multiplication problem up on the board. You know that you could probably solve it with a pen and paper. You also probably know that the answer is not zero, because it's not being multiplied by zero, and that an answer of 10 million would also be somewhat impossible. But the precise solution didn't automatically come to mind. When interacting with system two, we also often feel a physical response: maybe our heart starts to race, our pupils dilate, our stomach kind of tenses up a little bit, because we weren't automatically able to come to a conclusion. And that's normal. What we found in the example was that carrying out the computation was a strain. You needed to keep track of where you were and where you were planning on going, and perhaps you even gave up and pulled out your iPhone. But the immediate answer did not automatically come to mind.

And so this is the framework that we're going to be using to think about human decision-making. What we've learned in this section so far is, first, that system one is what our gut feelings are, but it can be somewhat unreliable, and second, that computations can sometimes be limited with our own brains; we saw that we weren't automatically able to jump to that conclusion. I think it's also important to note that system one versus system two has also been talked about in the context of left brain versus right brain: left brain is more logical and analytical, whereas right brain is more driven by feeling and intuition.

So now that we all understand the ways in which we make decisions, let's dive into a few limits of decision-making. When thinking about the limits of decision-making, I think it's important to frame this conversation as a partnership, an opportunity for data science to help. And so the first opportunity to augment decisions is
that human decisions are unpredictable. Decisions are highly dependent on environment and mood, especially within system one, and our environment has substantial influence on our thoughts and feelings. What studies have shown is that, given the same set of information, we often make different judgments. One example of this is with radiologists. Radiologists are in charge of examining x-rays and determining if the x-ray looks normal or abnormal, and studies have shown that when radiologists are given the exact same x-ray twice, they contradict themselves 20 percent of the time. We've also seen this within the legal system with judges. Studies have shown that if any of you are being sentenced for a crime, hopefully not, but if you are, you better hope that you're being sentenced right after lunch, because judges have been shown to be more lenient after they've eaten some food and taken a break.

The second way that we can think about human decision-making and data science is that human decisions are driven by our own individual past experiences. Since we make decisions based off of our own personal experiences, it can sometimes limit or augment the way that we make those decisions. We're also unable to store large data sets in our heads. Even storing one Google spreadsheet of data within our brains is impossible for most people, let alone an entire database, and this causes us to make decisions based off of this limited information.

And then the third way is that human decisions are driven by our own personal views and preferences. Since most decisions are made quickly, effortlessly, and outside of our own awareness, this means that even if we know we have biases within our decisions, they don't always go away. Studies have shown that there are almost 200 known cognitive biases and distortions that cause us to think and act differently. One example of this is anchoring bias,
which is the tendency to rely too heavily, or anchor, on a past reference or one piece of information while making a decision. Another example of this is optimism bias. As you can see, this guy looks pretty happy. He has a Post-it that says "be happy" on his forehead, and his decisions could potentially be distorted by this optimism bias, which causes a person to believe that they are at lesser risk of experiencing a negative event compared to others. So what we've learned is that human judgments are often made with limited knowledge and are biased and inconsistent, which makes them prone to being risky and unreliable.

Stylists are humans too, and so we can see this within our styling organization: what are the ways that stylists are making these decisions, and what are the ways that they can be error-prone or risky? First, inconsistent judgments. When stylists see the exact same set of inventory twice, it's likely that they'll choose a different assortment every time. So we can see this first assortment, but it easily could be this assortment too. We can't expect stylists to consistently make the same decision over time. We also see that stylists find it challenging to absorb a lot of information at once. Since stylists are limited to the information that they know when they open up a profile and are expected to style a client, it takes them a long time and a lot of mental energy to gather context about the person that they're styling for. We also see that stylists only know the outcomes of the clients that they style. So if we were purely relying on gut feeling and not on data science, there would be a whole group of data that they would not have access to for predicting the outcomes of their clients. And then we can also see that stylists can be biased by their own views and preferences, and while we train stylists to hopefully understand the client, these biases don't always go away. So thankfully data
science can help, and can potentially make these decisions less risky and more predictable over time.

So in what ways can data science help augment human decisions? First, I think it's important to ground us in what data science means. There's a lot of debate within the academic community about data science: what is it, how do we define it, and how does it differ from the data analytics and analysis that companies have been doing for decades? For the purposes of this talk, I'm defining data science as the use of mathematics or statistics to answer a business question. It differs from data analysis because it's not only about analytics; it's also about the collection, modeling, and training of large collections of data.

We can use data science in two different ways: we can guide decisions with computations, and we can train decisions with feedback. We guide decisions by offloading part of the decision-making process to data science; algorithms can help suggest items of clothing to a stylist. And second, we can train decisions with feedback, either in the moment or after the fact. Before I dive in, I think it's important to note that at Stitch Fix we use data science across all levels of our styling process. We use data science to suggest an individual item of clothing, but we also use it all the way up to make important business decisions. Before the styling session even starts, the stylists are matched with clients, and we do this intelligently: we predict the likelihood that the stylist will be able to satisfy the client that they're matched with.

So first, data science can help guide the stylist in selecting each item of clothing. We do this in a few different ways. First, our data science team automatically filters inventory based off of client preferences. So if a client says that they don't want to have jeans in their fix, we automatically filter
that information out, so stylists don't have to make that initial decision and be prone to that error. We also calculate a match score for each item of clothing compared to the client's preferences: the higher the score, the higher the likelihood the client will like the items that we send. You can see up here on the image that each item has a specific score, and that is calculated for every item and every client that we have in our system. We also regulate the number of items that stylists see. We only show the top percentage of items to our stylists, so instead of overwhelming them with choice, we initially limit the items that they have to make a decision about. And we use algorithms because they're better at predicting future events than humans; algorithms are able to better identify and weigh predictors of success. But I think it's also important to note that stylists ultimately have all the power and the final say. Just like the president, they have this veto power and can override any decision that the algorithms have recommended.

The second layer of data science assisting humans is helping guide the stylist on the expected outcome of all five items together. What I mean by this is that after the stylist has selected each item of clothing that is going to go into the fix, we calculate on the fly the likelihood that the client will like all of those items together. If it's above a certain threshold, awesome, nothing happens, it's green. But if all the items are below the threshold, there's a warning sign that pops up, basically to double-check that the stylist knows that they're making a more risky decision. And again, the stylists ultimately are the ones who are making the decision and can override our algorithms within our system.

Then the third layer of this is that client feedback can help train the stylist over time. Once the client receives all five items, they fill out feedback related to each item sent and then also feedback for the overall fix. So we can see
this here. This is an example from the feedback section of our application. The client fills out a scale for size, style, quality, and fit, and that information goes right back to our stylist. We can also see this for the fix overall: they can provide ratings and then any other feedback that they want. Stylists have access to this feedback within our system and can access it anytime, and stylists are actually expected to review their feedback and are paid for an hour a week to do so. We use this information to train the stylists to make better decisions in the future.

Then the fourth layer of this is that we can train the stylist with feedback over all time or over a certain segment of time. All the information on performance is stored in the stats section of our app, and each stylist can see the stats related to what I just mentioned, with fit, with style, with price. If any of those metrics are too low, they have visibility into that and can alter their decision-making process. So ultimately what we're trying to do is use feedback to hone our expert intuition. Malcolm Gladwell made the ten thousand hours famous a while ago, but it basically talks about repetitive, prolonged practice to build our intuition, build our system one, build our automatic thoughts over time. And we do this with help from our styling leads. Each stylist within our organization has a manager, and the managers are in charge of helping train and coach our stylists.

The final layer of this is that we also have insight into the feedback for all 3,300 of our stylists across the styling organization. We use the performance metrics from this feedback to shape business decisions as well. We do this in two specific ways. The first is that we use this information to drive decisions about stylist training. As I mentioned, we have 3,300 stylists around the country. We provide training every few months to all, and
if we see that the performance metrics of certain segments of feedback are dipping low for a large majority of our stylists, we can help better train the organization. The second way we can use this feedback is to drive decisions about inventory. If we're constantly getting feedback about the quality of our items, or the size or the fit, that may not actually be related to the stylist's decisions. It could be related to our merchandising team, and we may need to reassess the inventory.

So now that we've talked about how algorithms can augment humans, I think it's worthwhile to also think about how humans can augment algorithms. Is that even possible, and what would that even look like today? Machines, while providing a lot of value in guiding and training our stylists, are also deeply flawed. They lack human experience. For example, if you want to get a sweet shirt to go to a club in, computers are really bad at interpreting what that means and predicting things based off of that. Machines are also able to predict multiple options that the client will like, so specificity is sometimes an issue as well. It's also important to note that they lack any ethical standards. This has come up in the news recently, actually. I was reading this article the other day about how Facebook is hiring 10,000 people to work on security, which is mind-blowing. They still need so many people to scour their ads to make sure that they're ethical, or to take down videos of violence or suicide attempts. They were just talking about this recently in light of the Russian investigation, and I think it really highlights the point that we still need humans to be able to assess whether something is ethical. There's also still room for improvement with modeling and training data. No training set of data is perfect, and with a constantly evolving business, our needs for our algorithms are going to be changing. And so with that, we're going
to constantly be needing humans to aid in this as well. And like I said before, stylists really are the ones who maintain this veto power within our system. They're given creative liberty to act on their intuition and gut feeling, and stylists are still able to override any computer recommendation. So if a stylist is within our system picking out items for our clients and there's a really low match score, but they really think that this item will satisfy the client, for example up here, if they think this pair of green shorts is really what the client is looking for, they can add it to the box of items and send it to them. They have the ability to do that. We also see this with overriding the likelihood that the client will like all of the items in the box; they can override this as well. And what happens is, when the stylist overrides the algorithm and their intuition doesn't match the algorithm, we can learn from that, and we will continue to learn from that. I think one of the best parts about the styling role, and why humans are continually going to be important in this dynamic, is that machines are able to find a wide variety of items that the client will like, but it's really the stylists who discover and select the specific products that our customers will love. That's why letting the stylist maintain power within our system is so important. At the beginning I showed this image of the data science team influencing the stylists, but when the stylists override our algorithms and work off of their intuition rather than the data science recommendations, the stylists are actually training the algorithm, and they're creating this really cool feedback loop back to the data science.

So that brings us to: what is the future of data science, and what does that look like? Hollywood likes to glamorize that we're all screwed by AI, and that data science is going to take over the world, and that we better well
just for footnote, but I don't think that's true. And so I think we tend to think about the left brain versus the right brain, but I don't think that's right. We think about system one versus system two, computations versus gut feelings, and I don't think that's the way we should be thinking about it either. And then we also like to think about Will Smith versus these robots, or humans versus computers, and I don't think that's right either. I think this is truly a partnership between data science and humans. When we think about humans individually and data science individually, there's a limit to where that can get us, but I think the power comes from creating this feedback loop between data science and humans. In the beginning I was talking about system one with humans and system two with computations and analytical thought, but I think we're in the process of creating a system three. What system three is, is a combination of predictive algorithms and expert intuition. This is so valuable because the relationship is mutually beneficial: we're able to make decisions with more information and more predictably, with more nuance and intuition. We're able to use this feedback from our algorithms over time, through guiding and training, to hone this expert intuition. We're also able to continually train our algorithms, so when decisions are made outside of what we predict is likely to satisfy our client, our algorithm learns from them. It's also important to note that poorly trained algorithms are just as bad as poorly trained stylists. We need this partnership to reach levels unreachable by just data or just humans. At the beginning I showed this other image of system one and gut feelings and system two and computations, but I think system three is, like I was saying, this cycle of feedback between gut feelings and computations.

And so the really cool thing about this at Stitch Fix is that we've seen a sizable business impact from
this relationship. Here are a few key statistics that I think are worth noting today. First is lower labor costs: as we offload part of the stylist's job onto data science, we see that the time that it takes for the stylist to really understand the client and select the best items for them decreases. We've also seen an increased keep rate over time. What I mean by that is that our clients keep a higher percentage of the items that we send them, so they're more satisfied with the items that they're receiving from us. We also see fewer mistakes from humans: by providing guidance and training throughout our software, we limit the opportunity for stylists to make these more risky and unpredictable decisions. And finally, we've seen greater client satisfaction, and this is really important for any company. We always want our clients to be happy.

So we talked about a lot today. We talked about how humans make 95 percent of their decisions with system one. We also talked about how human decision-making is limited by information, biases, and inconsistency, which can lead to risks and uncertainty. We also learned that data science can help guide our decisions in the moment but also train our decisions for the future, and that the partnership between data science and humans is essential, because humans lack the ability to process large volumes of information whereas machines lack empathy, intuition, nuance, and ethics, and both have limitations that can be alleviated by one another. And so this, I think, is the future of data science and humans. It's not algorithms versus humans; it's the partnership between data science and humans. Through this we can hone human intuition and algorithms to become more reliable over time, and the sum will become stronger than each individual part. Thank you.

I also have a shameless plug. As you all now know, I work for Stitch Fix, and we are hiring. So if any of this piqued your interest, or you're just interested
in working with a really awesome group of people, please find me after and I'd love to chat more. So I think that brings us to the Q&A section. Does anybody have any questions? Yes.

Oh, that's interesting. Okay, so the question was: when there are issues with system one, like when you're voting against your best interests, are there ways for that to be alleviated with system three and this kind of feedback? Perhaps. I mean, I think it's interesting to think about system three outside of the context of just technology and data science. I don't know, let me think about that. Can you come up after? Is that possible? Okay, cool. Yes.

So the question was how do we handle comments from our customers, how do we handle freeform feedback. Well, we handle it in a few different ways. Are you talking about in terms of interpreting that, or reacting to customers being unsatisfied? Okay. So first, we do use some processing to handle freeform text fields. Also, when the client is unsatisfied with items that we send them, we have an algorithm that automatically escalates issues to our customer support team, so then the customer support representative can take care of that. And then also, with Stitch Fix, one of the powers of having humans really power our system is that we develop these long-term relationships with our clients. So if clients were unsatisfied with the items or the total fix last time, the stylists are highly encouraged to address that with the client. One thing that I didn't note is that stylists also write a note to the client every time they ship a fix, and that note is usually a really good opportunity to address any concerns or issues that arose in the past. Great question. Yeah.

So the question was, correct me if I'm wrong, what's the balance between creativity and the human touch versus algorithms and more stoic
behavior? I think ultimately that's kind of the question of this interaction and this partnership. I think it's something we're constantly looking at. Like I showed in my slide, a lot of people are talking about fully moving to data science and what that would look like. I think the challenge is that it's easy to be like, oh, we're moving fully to data science, we're not going to use humans, but really honing that balance is, I think, key. One thing that we really focus on at Stitch Fix is putting our client first, and so thinking about, well, is this going to negatively impact the experience that they're having with us? And if so, then we probably shouldn't be making that decision. So yeah, I think that balance is key in just making sure that our clients still feel understood and appreciated and feel like they're getting value from our service. Yeah, and the gray shirt.

So your question was what data points go into our algorithm. I'm not completely sure I'm able to answer that question, but come up and talk to me after. I don't know if my PR team would be happy about that.

So your question is, how do you get over the hurdle of not having enough data to feed the algorithm? Okay. Yeah, so that's a really challenging problem that I think faces data science as a whole: how do you have enough data points to accurately use your algorithm to predict the likelihood of certain things happening? I think Stitch Fix is really interesting because we introduced recommendations and algorithms really soon after we started, and so the algorithms kind of grew organically as we grew as a company. But in terms of how to introduce algorithms once your company's gotten a little bit bigger, when maybe you don't have that data set already available, I think that's where humans probably play an even bigger part at the beginning, where you're trying to get that data from humans, and then
ultimately move over to some sort of balance between data science and the human. Yeah.

So, great question: how do human biases influence the algorithmic predictions? I think this is also another question of data science as a whole: how do you have non-biased data sets? To be quite frank, I'm not completely sure. I think the example of Facebook is a really good example too, where the data that they're training on isn't necessarily unbiased or isn't always ethical. So I guess I'm not quite sure, but hopefully over time, if you can notice biases within the data, you can train your model differently. I'm sure. Yeah.

So your question was, yep. Yeah, I'm sure we do. I guess short answer, yes. Do I know the technical details? Oh, gotcha. So I think your question is kind of tying into how data science captures the strength of predictive cues with an algorithm, just kind of a word that captures that, and what attributes are more important to one person versus another. I think that's where we can compare different users' behaviors: we can draw on the success of one item with a similar client and use that data to predict for the client that's currently being styled and that the match score is being calculated for. So I think that's an example of being able to use the power of having millions of clients and millions of outcomes to make those connections between the clients in our system. Yes, I think you're the final question.

Oh yeah. So the question was, for stylists who maybe like to override the algorithm a lot, and it tends to not be successful, is there any other kind of guard against that? In short, I don't think so, but maybe it's something we want to think about in the future. Great, thank you.
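
As a rough illustration of the guiding steps described in the talk (filtering inventory by client preferences, ranking by match score, showing only the top share of items, and warning when the whole fix looks risky), here is a minimal Python sketch. Every name and number in it is hypothetical. The talk does not say how Stitch Fix's algorithms are implemented, and averaging the per-item scores is only an assumed stand-in for the combined likelihood that the client will like all the items together.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    category: str
    match_score: float  # hypothetical: estimated chance (0 to 1) that the client keeps the item


def filter_inventory(items, excluded_categories):
    """Drop items in categories the client asked not to receive (for example, jeans)."""
    return [item for item in items if item.category not in excluded_categories]


def top_fraction(items, fraction):
    """Keep only the highest-scoring share of items so the stylist sees fewer choices."""
    ranked = sorted(items, key=lambda item: item.match_score, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


def fix_needs_warning(fix_items, threshold):
    """Return True when the fix as a whole scores below the threshold.

    Assumption: the combined likelihood is approximated by the mean item score.
    """
    mean_score = sum(item.match_score for item in fix_items) / len(fix_items)
    return mean_score < threshold


# Made-up inventory for one client who asked for no jeans.
inventory = [
    Item("blouse", "tops", 0.9),
    Item("jacket", "outerwear", 0.8),
    Item("dress", "dresses", 0.7),
    Item("scarf", "accessories", 0.4),
    Item("green shorts", "shorts", 0.2),
    Item("skinny jeans", "jeans", 0.95),
]

candidates = filter_inventory(inventory, {"jeans"})
shown_to_stylist = top_fraction(candidates, 0.5)

# The stylist keeps the final say: here they add two low-scoring items anyway.
fix = shown_to_stylist + [inventory[3], inventory[4]]
if fix_needs_warning(fix, threshold=0.65):
    print("Warning: this fix scores below the threshold; the stylist can still send it.")
```

The warning only prompts a double check. Nothing in the sketch blocks the stylist, which mirrors the veto power described in the talk.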