Okay, hi everyone. I hope you all enjoyed the cat screensaver. All of those cats are from a shared folder at Stripe where we all post pictures of our cats, so real people own those cats.

So, I'm based in our Singapore office out here. We're working on payments in the Asia Pacific region, and I'm going to talk to you about a survey of different ways that we do really cool things that you might have learned in your college classes, across all of our teams. We're actually going to cover quite a bit of material. I really wanted to call this the "when am I ever going to need this?" talk, because back in college I kept thinking, seriously, why are we doing this? It doesn't make sense. I just want to do real coding. And I realized all of those things I was learning in compilers and algorithms class were actually very relevant.

So let me explain who I am, just so you have a little more context on me. I am class of 2015, so I'm not that much older than you guys. I hope I don't seem that old. In high school, I was actually afraid of math. I thought I wanted to do engineering, but I was not sure if I could handle it, I guess. And then I almost didn't do computer science because I thought it was going to require a ton of math. Over time I started learning about things like the growth mindset and realizing that you really do just have to practice and do a lot of exercises in order to get better at things like this. It's not just about being smart. I failed algorithms.
I really want you all to know that, because I didn't want to sound like I was some crazy genius that was constantly acing all my classes. I failed a ton of classes, including algorithms, and had to take it twice. I think I still got a B the second time, so I didn't even do that well the second time around. But the point is I still understand all these things, and you all can too.

One time I freaked out: I had to write a breadth-first search for a work project. I was like, no way this is ever going to be relevant, why would I ever need to do breadth-first search? And then I had to use D3.js to write a visualization of a bunch of nodes and highlight some paths on it. So I wrote a BFS in JavaScript and it was great. And then I got to go email my algorithms teacher: it happened, it came true. And so, you know, again, I'm trying to make this as accessible as possible. None of this stuff is going to be so crazy complicated and over the top that you can't go get a job in industry and go do it right away.

So I feel obligated to explain what Stripe is, in case any of you don't know what we do. We're not a consumer-facing company, so I get that. It's okay that you don't know what we do. Our stated mission is to increase the GDP of the internet. I don't know what GDP is. I could not even tell you what it stands for. We throw it around a lot. I think it means how much money a nation has. I'm looking back at the Stripe people to see if they're going to nod at me. No? Yes?
No? Yeah. So I think our mission is to help companies. For example, we have this magazine. We've done a bunch of issues on security, on incident management, and a bunch of other really great topics. The one that we have out here for you all today is the programming languages one, and it's about how companies think about which languages they should be writing in. So that's my example of something we've done just to help people start and maintain their businesses.

And the whole suite of Stripe products pretty much all relate to that idea. So we have, for example, helping people do marketplaces. That's if you have a Grab or a Deliveroo: that's a customer, one of us, paying a driver or a delivery person or a restaurant. So how do you move money between two sets of people while also taking a cut, because you're Grab or Deliveroo and you want to make money off of that? Terminal would be our point-of-sale system. So, you know, you're physically in a store, you want to pay with a credit card, and you've got to punch in your PIN. Payments, obviously: we help people process credit cards with our sweet API. Issuing: we help people use an API to programmatically create debit cards that you can hand out to your drivers, for example. Sigma, that's data analysis. You can use something that's vaguely SQL-esque to analyze your business data, so it helps you get insights. Radar: we're going to talk about that a lot in this talk. It's about fraud detection and doing machine learning to make sure people are not using your platform for bad things. Then we like to call ourselves the global payments and treasury network, because we're here in Singapore.
We're going global. We're trying to help people accept payments all over the world, across any kind of currency. Which, actually, I guess we're not quite there yet. For example, if you're in Australia, you might only be able to accept Australian dollars. One day in the future we want you to accept any kind of currency across the whole planet, and lots and lots of payment methods beyond credit cards.

So I like to start with what I'm going to end with, so I'll tell you what the lessons are, and then you can reflect on them throughout the talk, and then we'll go over them again.

Algorithms: they come up in weird places. I feel like there are a lot of really strange textbook examples. You know, you do Dijkstra's algorithm and you say, oh, I guess maybe possibly Google Maps uses Dijkstra's algorithm to find the shortest path to your destination, and that's your one textbook example, and it makes sense. But I want to emphasize that it can come up in more non-obvious ways.

You're not alone. When I was in university, I had a lot of problem sets where I was just sitting there at 2 a.m. by myself doing a bunch of problems. I was pretty sad about it. And, you know, there are a bunch of anti-plagiarism rules, so you can't go and ask your friend for help, right?
I don't know what it's like at National University of Singapore, but I'm sure you all have to come up with the answer by yourself. But at work you get to collaborate. If you don't know the answer, first of all you Google it, and then you go read some research papers, and then you go and implement the thing. Or you ask your teammates and they do code review, or they just help you figure out the problem. There's a reason why the phrase "rubber duck debugging" exists. I actually find that a lot more exciting, because if I can't solve a problem, it's a lot more fun to just talk to someone about it instead of sitting there alone hoping that I figure it out before the deadline.

Time and space complexity. I'm sure you're all doing these interviews where you're like, oh, big O, it's going to be such-and-such complexity, something like O(1) space. That's cool and all, but it's not the end of the story. Sometimes if n is like 20, you don't care what the big O is. It could be n squared and you wouldn't care, right? So actually in this talk, we're going to be talking about a ton of different kinds of constraints, or things that you need to think about in the real world.

And similar to the previous point, it's a lot easier to understand the problem when there's an actual thing to be solved. There's a business need. You're saying, if I don't solve this problem, maybe I will lose a million dollars. Or, if I can solve this problem but make my software run within two hours, then that's fine too; you don't have to make it run within milliseconds. So once you know what the business needs are and what the constraints of the problem are, that makes it a lot easier to come up with a solution, rather than, I don't know, coming up with a heap-based sorting algorithm, blah, blah, blah.
I'm just throwing out words now. So those are the takeaways; we'll go over them again at the end.

Today we're going to cover three different stories about Stripe. I kind of cheated, because I said I was going to do three, but then for number two I put two. So we're going to go over four different stories, actually. If I'm talking too fast, please just, I guess, wave at me or something, and also ask questions throughout this. Just think of this as a really friendly college lecture or something: just raise your hand and ask questions.

So we're going to do rate limiting first. What is rate limiting? There are two kinds of rate limiting. First, you can think about just limiting how many requests are coming in per second. For example, at Stripe, we have a lot of merchants. They're all making requests to our API, and you might just want to say, you can have a thousand requests per second. And then the other kind would be how many concurrent requests you can have. The difference here is, let's say that you have someone who's trying to load up a page of all of their payments, and then they're staring at it, and then they're like, why won't it load? So they just keep refreshing the page. You don't actually want to spawn off a request for every one of those, because if you use too much CPU, the Stripe servers might crash, and so on and so on. So these are the two kinds of rate limiting: requests per second and concurrent requests.

I guess I want to get you all to think: how would you stop someone from making too many requests at a time, for each of these?
If you already Googled this then you might already know the answer, because that's how I found out. Yeah? So... a different... yeah.

Well, you know what's awesome is both of those sound totally right. I'm going to repeat them just to get it on the video, to be honest. So the first one you said was batching: you wait some amount of time, you take all the requests, you process them all at once. The second one would be debouncing, which I've only heard about in the context of, let's say you're doing auto-complete and then you're typing. So if I'm going to type out "Los Angeles" and I type "L", then it'll wait, and then if I type out "Los Angeles", it'll only send that one request. All right. Awesome. Love both of those answers. Then we're going to talk about a third answer. Oh, oops. No, we're going to talk about why first. That's great. I don't remember my own slides.

Okay, three reasons why you might want to rate limit, just to really drive home the point. First of all, you care about the impact of one user, and you can see that there's this one evil cat that's not a meerkat. So, you know, it's not evil, to be clear. There could be one malicious user. They might be trying to do a DDoS on you, or they actually might just be confused. They might have written a script that they thought was just going to run for a few minutes, and they're just spamming your service because they don't know how to stop the script on their own machines, or they don't even notice how many requests they're sending. So that's one.

Second is you want to prioritize critical requests. So at Stripe, you know, you might have the API endpoint that's responsible for listing all of your payments. It's not really that important. You have the one that's literally submitting payments.
That one's extremely important. So you might want to load shed everything that's not important so that you can get to the things that are important.

Finally, you personally might be broken. So if Stripe is degraded for some reason, like maybe a bunch of our boxes are getting out-of-memory killed, we might want to make sure that we are able to load shed everything that's not important so that we can stay up. And this cat is still trying to drink even though it's extremely confused, and that's why I like this GIF so much. Sorry, that was a digression.

So, a totally different way to rate limit. All of these are totally valid ideas. Pretend you have a bucket. We're going to call this the token bucket algorithm. It has a Wikipedia article, which is where I got a lot of this information from. Consider every bucket as one of your merchants, and you see how some of them are not completely filled. Every couple of seconds, try adding a drop to each bucket that isn't already filled with water. So you can see that we are going to slowly add water to all of them. And imagine there's no request coming in at all right now: eventually all the buckets will be filled. So imagine that one of these buckets is already... empty. You just take away some of the water, or a token. You could say you have 20 tokens: just take away one and then do the request. If they don't have any tokens, then they can't make a request. And you can see here.
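The bucket idea described above can be sketched in a few lines of Python. This is only a rough illustration of the textbook token bucket, not Stripe's actual implementation: the class name, the capacity, and the refill rate are all made up for the example, and the "drop every couple of seconds" is modeled as a continuous refill computed from elapsed time.

```python
import time


class TokenBucket:
    """One bucket per merchant; tokens refill steadily up to a fixed capacity."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = capacity                    # hypothetical limit, e.g. 20 tokens
        self.refill_per_second = refill_per_second  # hypothetical refill rate
        self.tokens = float(capacity)               # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def _refill(self, now):
        # Add the "drops" earned since the last check, never beyond capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now

    def allow(self, now=None):
        """Take one token if available; return False when the bucket is empty."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: one bucket per merchant, keyed by a hypothetical merchant id.
buckets = {"merchant_a": TokenBucket(capacity=20, refill_per_second=1.0)}
if not buckets["merchant_a"].allow():
    print("rate limited, try again later")
```

The `now` parameter is only there so the clock can be faked in tests; in normal use you would leave it out and let the bucket read the monotonic clock itself.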
Yes, they slowly deplete their own buckets. So I feel like it's a pretty intuitive way of saying, you know, you can only have a thousand requests per second. If you took away too many, you've got to wait for the next time that we add a drop to your bucket, and then you can have that request. So that's literally how Stripe does rate limiting.

I will give you a fun fact, and then I'm definitely going to, right after this, tell our recruiting team that I told you this. When I was interviewing at Stripe, the night before, they sent me this email that was like, hey, make sure you do some research, read our blog, check it out. And so I was like, okay, I'll go to stripe.com/blog and all that. And then the top blog post was how Stripe does rate limiting. So I read the whole thing, because I was really, really excited about this job interview. Read the whole thing, clicked on the Wikipedia article, read about the buckets thing, came in the next day. They're like, all right, you're going to do rate limiting. And I was like, you know your blog literally tells you the answer, right? I can just tell you exactly what I read last night. And they're like, we're going to do a different problem now. And, you know, maybe that's how I got the job, I don't know. It pays off to do your research on the companies that you're going to interview at.

Okay. Wait, actually, I guess I should stop: are there any questions about rate limiting before I move on to a totally different topic? If you do start coming up with questions, I know it sometimes takes me a few minutes to process and realize I do have a question, so of course just raise your hand at any point, and we'll take time at the end as well.

Compilers: one of my favorite classes in... no, second favorite. I've got to start rating the classes
I liked in college. So we have two different ways that we use some cool compiler theory in school, or, in Stripe, which is not a school.

So we have this thing called Sorbet, which is a static type checker for Ruby. Static meaning that it runs at, not compile time, it runs at commit time, before you commit things to our code base. And what's really great about it is it uses all these really big fancy words I had to learn in compilers class, like ASTs and CFGs and DSLs, and then I had to go and Google them before I came here because I didn't want to tell you all the wrong thing about what these words are. So an abstract syntax tree is where you model the entire language and make sure, you know, there's an if statement, there's a code block inside the if statement, there's the else statement, and so on. The CFG... oops. Yeah, CFG, I got it on the previous one. A domain-specific language would be a kind of language that's specifically meant to do one task, as opposed to C, C++, Java, Ruby. In this case the domain is type checking Ruby.

And what I thought was kind of interesting is that most of our code base at Stripe is not written in C++. I think this might be the only project, and it's because performance actually mattered quite a bit for this one. The reason why performance matters so much is that every single time a developer is committing code and submitting it to our gigantic test suite, it matters. You don't want a developer to have to wait around for an hour every time they save their code. So in this case they actually wrote quite a few optimizations to make sure it was running as fast as possible, because it has a huge impact on developer productivity. So that's one time that all the time complexity matters.

I took this cool screenshot just in case you didn't already know what an abstract syntax tree is.
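The screenshot isn't reproduced in this transcript, so here is a rough stand-in. It uses Python's built-in `ast` module rather than Sorbet or Ruby, which is a swap made purely for illustration; the snippet being parsed and the `outline` helper are invented for this example. It parses a tiny if/else and prints the node types as an indented tree, so you can see the if statement, the block inside it, and the else branch as separate nodes.

```python
import ast

# A made-up snippet to parse; any small piece of Python would do.
SOURCE = """
if amount > 1000:
    decision = "block"
else:
    decision = "allow"
"""


def outline(node, depth=0):
    """Return the tree as indented lines, one node type per line."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(outline(child, depth + 1))
    return lines


tree = ast.parse(SOURCE)
if_node = tree.body[0]  # the top-level statement is the if/else

print("\n".join(outline(tree)))
print("statements in the if block:", len(if_node.body))
print("statements in the else block:", len(if_node.orelse))
```

A type checker walks a tree like this instead of the raw text, which is why it can reason about "the block inside the if" as a unit.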
I took this from my coworkers. So you can imagine that we're modeling the language that way.

And then I thought that this was almost a contrived example, because how often are you going to write a type checker for a language? Are you going to go off to your job and do that? No, ideally you would go and use this one if you were writing Ruby.

One that I thought was even more exciting was Radar, which is our fraud detection system. We actually allow merchants to write out their own rules. For example, this one says: block if the card is a prepaid card, if they're trying to charge over a thousand dollars, and they're not in the United States. And it actually tells you which charges would have matched this rule. So we let people write out these rules in our own DSL, or domain-specific language, for detecting fraud. We let people write out these rules and then we parse those rules. Well, first of all, we have to return syntax errors to our users, and then we parse it and run it against their own transactions. So for that one we actually did write our own whole language. It's pretty great. So that's a cool application of all the compiler theory coming up.

I feel like writing a parser for a language is actually a pretty common task, more common than I would have expected. I thought compilers would never come up. I didn't put this in the slides, but even on my own team, earlier this year, we had to write our own compiler, basically, for converting the objects from one of our vendors. Objects meaning the way that they were encoding the data that we were sending them. We wrote a compiler for that so that we could parse it and then automatically generate the data for another vendor. So we wrote that compiler, we took all our data, moved it to the other vendor, and then we saved ourselves a ton of money. So there are a ton of reasons why you'd write a parser at work.
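To make the "parse the rule, report syntax errors, then run it against charges" flow concrete, here is a minimal sketch. The rule syntax, the field names (`funding`, `amount`, `country`), and every function name below are hypothetical and invented for this example; this is not Radar's real rule language or how Stripe implements it.

```python
import re

# Tokens: integers, 'quoted strings', comparison operators, and bare words.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|'([^']*)'|(!=|>=|<=|=|>|<)|([A-Za-z_]\w*))")

OPS = {
    "=": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
}


class RuleSyntaxError(Exception):
    """Raised so the rule author gets a readable error back."""


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise RuleSyntaxError(f"unexpected character at position {pos}")
        number, string, op, word = match.groups()
        if number is not None:
            tokens.append(("value", int(number)))
        elif string is not None:
            tokens.append(("value", string))
        elif op is not None:
            tokens.append(("op", op))
        else:
            tokens.append(("word", word))
        pos = match.end()
    return tokens


def parse_rule(text):
    """Parse 'block if FIELD OP VALUE [and FIELD OP VALUE]...' into conditions."""
    tokens = tokenize(text)
    if tokens[:2] != [("word", "block"), ("word", "if")]:
        raise RuleSyntaxError("rule must start with 'block if'")
    conditions, i = [], 2
    while True:
        clause = tokens[i:i + 3]
        if [kind for kind, _ in clause] != ["word", "op", "value"]:
            raise RuleSyntaxError(f"expected 'field op value' at token {i}")
        conditions.append(tuple(value for _, value in clause))
        i += 3
        if i == len(tokens):
            return conditions
        if tokens[i] != ("word", "and"):
            raise RuleSyntaxError(f"expected 'and' at token {i}")
        i += 1


def matches(conditions, charge):
    """True when every condition holds for the charge (a plain dict)."""
    return all(OPS[op](charge[field], value) for field, op, value in conditions)


rule = parse_rule("block if funding = 'prepaid' and amount > 1000 and country != 'US'")
charge = {"funding": "prepaid", "amount": 1500, "country": "SG"}
print("block this charge?", matches(rule, charge))
```

Running the parsed rule over a merchant's past charges is also how you could show which charges would have matched, as the slide does. A real rule language would need parentheses, `or`, type checking, and better error positions; libraries for lexing and parsing exist precisely so you don't hand-roll all of that.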
It turns out. Yes? Oh, yeah, we do use a couple of open source libraries for parsing and lexing and all of that. I actually don't know what the names of those open source libraries are off the top of my head. But yeah, there are a lot of great tools out there for writing your... sorry, okay.

Finally, I want to spend the most time on machine learning, because I feel like this is the most important and most relevant: the social side of everything you do. It's so important. I feel like there's a huge debate in the tech industry right now about how you can get through a whole computer science degree without taking an ethics class. I don't know if that's true here, but at my school you could just graduate and you wouldn't ever have to think about the implications of what you are doing. Are you all thinking about it now? Do you have to take an ethics class? Okay, I'm so glad. But please, even if it's not that good, please take it seriously. I think it's one of the most important issues of our generation. Are we all good to keep going? Yes, yes, okay.

So: why did my card get rejected? Huge question. Merchants constantly want to know: a user was trying to make a purchase, we didn't let it go through, what the heck? And so, just to frame what we're going to talk about, I'm going to first explain how Stripe tells users why our machine learning algorithms rejected them.

So, decision trees. First of all, we collect a bunch of pieces of information, or features, for why a charge would have gotten rejected. So, you know, was the card issued in the United States, and so on. Were there more charges in the last 30 minutes? You can imagine the features getting pretty fancy. You know, how many IP addresses was the card used from in the last day? It wouldn't make sense if it was used all over the world. Unless it was a travel agency.
Maybe, I don't know. And we use some statistical data we already have to figure out what the likelihood of fraud is. You can imagine we can make a gigantic decision tree and make a decision at the end on what percent we think the fraud likelihood is, and we set a threshold on that. But that isn't quite sustainable. So actually we take this and we do it a bajillion times. We have a bunch of different decision trees that are all coming up with different decisions based on the features that we gave them. So that's called a forest. I love it: trees, forest, it's a great metaphor. What we do at the end of all these decision trees is that we just average, or just aggregate, all of the decisions from all of these, and then at the end of it, that's when we figure out overall whether or not it was going to be fraud.

But the unfortunate thing about this system is that you can't explain to a user what happened. You just say, I don't know, we looked at the forest and the forest said that maybe it was going to be fraudy. I don't know. So it's not a great user experience, because we do want to be able to tell customers something about why.

So here's where we were before. Sorry, I know you can't read it back there, but it says, "There are many contributing factors to the risk level of this payment based on activity across the Stripe network." Because, you know, of course we don't use just that merchant's data, we use all of our merchants' data to detect fraud. So we didn't tell the merchant anything. Then, after we used a new system for explaining this, we actually said, oh, it was used from an unusually large number of IP addresses over the Stripe network. And so that's a much better explanation for why this happened.

So the way that we got here, to this second picture, is we actually start using predicates.
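Going back to the forest for a second: the "many trees, aggregate the answers, then threshold" step can be sketched as below. This is a toy illustration, not Stripe's model. Real trees are learned from labeled data; here each "tree" is a hand-written function, and the feature names, the probabilities, and the 0.5 threshold are all invented for the example.

```python
from statistics import mean


# Each toy "tree" looks at a few features and returns a fraud probability.
def tree_velocity(charge):
    return 0.8 if charge["charges_last_30_min"] > 5 else 0.2


def tree_ip_spread(charge):
    return 0.9 if charge["distinct_ips_last_day"] > 10 else 0.1


def tree_prepaid_amount(charge):
    if charge["is_prepaid"] and charge["amount"] > 1000:
        return 0.7
    return 0.1


FOREST = [tree_velocity, tree_ip_spread, tree_prepaid_amount]


def fraud_score(charge, forest=FOREST):
    """Aggregate the trees by averaging their individual scores."""
    return mean(tree(charge) for tree in forest)


def should_block(charge, threshold=0.5):
    """Block when the averaged score crosses the (made-up) threshold."""
    return fraud_score(charge) >= threshold


risky = {"charges_last_30_min": 9, "distinct_ips_last_day": 40,
         "is_prepaid": True, "amount": 2500}
print("score:", fraud_score(risky), "block:", should_block(risky))
```

Notice that the final number says nothing about which feature mattered, which is exactly the explanation problem described above.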
So we come up with a couple of really short rules. Once we've made a decision from our gigantic machine learning model, we run it against these smaller predicates, and then the highest priority or most obvious one... by priority I mean the one that we think is most important to share with the merchant. The first one that matches, you know, matches the same features as that charge, we'll use that one as our explanation. So that's one way of doing it in a really fast, performant way without having to review our entire model. You can imagine also that if you could somehow explain the entire model, then fraudsters would just use our information to continue to circumvent our systems. So that's why we always want to just give two sentences. Don't give a whole essay about what happened.

So I wanted you all to feel uncomfortable, so I did this Drake meme. So: no, yes. And then I felt so bad for doing this. I was like, oh, yeah, hello, fellow kids. I'm so old. I'm sorry for meme-ing you.

Okay, most important piece of this: why did it really happen? So let me give you an example. Let's say you walk into a bank and you'd like a loan, and they're like, yeah, let's look at all this data. And the human is looking at it and thinking a ton of different things, but they tell you at the end of it, you don't have enough collateral. Then you're just sitting there thinking, is that really what happened? Let's say, for example, I was a person from a minority group. Or if I'm trying to get an apartment here in Singapore, did they really say it was because my income wasn't high enough?
Or is it because they just didn't want me to live there, because maybe there's some sort of racism there? All those things can very much happen in machine learning. It can definitely get amplified. And so that's why we actually do quite a bit of work to make sure that we try to keep our machine learning algorithms as fair as possible.

So I want to explain what fairness means, too. I saw some really great quotes from Peter Norvig. I really respect him. He's a director of research at Google, and he was kind of explaining the same idea: you go in, they tell you a reason, but you don't know if that's what really was going through their head. A lot of times people just kind of jump to a conclusion, and you say why, and then they fish around and eventually justify something that was maybe more of a feeling. And the same thing can happen in machine learning. Again, same thing: it might have been your skin color, it might have been your collateral. Who knows?

Now you might be wondering, well, how could a machine learning algorithm be biased? What does that even mean? For example, Stripe doesn't know what race you are. So how could it be racist if it doesn't know what the race is? And the answer is actually that there are a ton of different factors. One really big way that you can unintentionally make a biased algorithm is by using features that are tightly correlated to features that you do not want to be making decisions on. So say people from a certain ethnic background all live in one neighborhood, and you decide that all charges from that neighborhood are fraudy. You're going to end up rejecting a ton of charges from that neighborhood. I found this concept pretty difficult to understand myself, to be honest, because I kept asking, well, if there's a high rate of fraud, then why shouldn't you just say that there's a high rate of fraud?
And actually I will explain how I kind of came to terms with that myself. I'm not sure if that's something you all are working through right now. Seriously, it's a complicated topic. I feel like I'm not doing it enough justice here.

So when I think about fairness, or when we at Stripe all think about fairness, we're usually looking at the rate of true positives and the rate of false negatives. So for something to be fair, we do want to make sure that, let's say you are a good customer, you are not committing fraud: you would hope that the probability of you getting marked as a good customer is equal regardless of things like your gender, your background, your physical location. And that's not always true. So, you know, you can imagine a good merchant in a bad neighborhood and a good merchant in a good neighborhood, and if they have different probabilities of getting through our systems, then that's unfair.

On the other end, again, same idea: we don't want to disproportionately reject people, or, false negatives... let me make sure. I always have to draw the table. So: not flagged, fraud, right? So we don't want to disproportionately allow... why do I feel like I wrote this table wrong? I went through this a lot of times. Did I do it wrong? No. Not flagged, and they are fraud. So yes, we said that they were not fraudy, but they were fraudy. I'm glad I wrote the table out for myself. This is why I failed algorithms. This is really why. Luckily, when you're at work you get a couple of weeks to do things, and in school you get a few minutes to do things, and that's probably the struggle. Yeah, same idea with false negatives: if people have different probabilities of getting rejected by your system, then that would also be a problem.

So, heavy topic. Do you guys have any questions? Oh, it's difficult. Yeah, good. I used to be a teaching assistant.
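To make the table concrete, here is a small sketch of the comparison described above: for each group, what fraction of good customers got through, and what fraction of actual fraud went unflagged. The record format, the group labels, and the function name are all assumptions made up for this illustration; it is not how Stripe measures fairness, and real audits need far more care (sample sizes, confidence intervals, and how the labels were produced).

```python
from collections import defaultdict


def rates_by_group(records):
    """records: iterable of (group, is_fraud, was_flagged) tuples."""
    counts = defaultdict(
        lambda: {"good": 0, "good_allowed": 0, "fraud": 0, "fraud_missed": 0}
    )
    for group, is_fraud, was_flagged in records:
        c = counts[group]
        if is_fraud:
            c["fraud"] += 1
            c["fraud_missed"] += int(not was_flagged)  # not flagged, but fraud
        else:
            c["good"] += 1
            c["good_allowed"] += int(not was_flagged)  # good customer got through
    return {
        group: {
            "good_allowed_rate": c["good_allowed"] / c["good"] if c["good"] else None,
            "fraud_missed_rate": c["fraud_missed"] / c["fraud"] if c["fraud"] else None,
        }
        for group, c in counts.items()
    }


# Made-up data: group B's good customers get through half as often as group A's.
records = (
    [("A", False, False)] * 4
    + [("A", True, True), ("A", True, False)]
    + [("B", False, False)] * 2
    + [("B", False, True)] * 2
    + [("B", True, True)] * 2
)
for group, rates in sorted(rates_by_group(records).items()):
    print(group, rates)
```

If the `good_allowed_rate` differs a lot between groups, good customers are being treated differently depending on the group, which is the unfairness being described, and it only works if you have group labels in the first place.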
I can stare at you for as long as... all right.

So you're probably asking, well, can't you just not be racist? I don't know, isn't it really easy? I mean, sometimes I wonder, why can't we just fix it? So there are a ton of reasons, actually. It's a pretty hard problem.

So first of all, there's the complexity. You know, we just went through really quickly the whole decision forest thing. How are you going to go and reduce bias from that? How are you going to make a performant system that can synchronously block a charge as soon as it comes in on the API while also mitigating bias?

The user experience. I remember what I was going to say about this, yeah. You might want to start collecting a ton of data about your users in order to figure out whether or not your system is biased. When you sign up for Stripe, we don't ask you about any of your demographic information, because that would be possibly illegal and also really uncomfortable for the user. So, for example, we do have a product called Stripe Atlas where we allow people to incorporate companies in the United States even if they don't live in the United States. We don't collect gender, and at some point someone was like, shouldn't we figure out if women are disproportionately not starting companies? And I was like, well, that's a good question. Too bad we can't do anything about that, because we can't collect gender. We can have people opt in to give that information, but that's not the same as just collecting it. So on the user experience side, there's the legal, regulatory, and just creeping out your users side of it.
You don't want to ask them weird questions.

Then there's the cost, and what I mean here is prioritization. Let's say you really, really care about this, and you also have some sort of pressure to hit your deadlines. You know, you want to launch that new product, and you're kind of thinking, maybe the algorithm is a little bit racist, but I can just push it over the line and make a ton of money. So there are different priorities that are being rewarded sometimes. And just to give some examples, if you're trying to optimize for growth or number of users, you might willfully not notice that a ton of your users are bots. Same thing happens here. There are a ton of reasons, beyond just wanting to do the right thing, why you might just not notice that there's something going on underneath. So I do want to emphasize that we should have some empathy for the people who are writing all these algorithms. I'm sure that they're thinking about it, but there are just a ton of other things going on at the same time.

There is actually a cool website you can Google, FAT ML: Fairness, Accountability, and Transparency in Machine Learning. They do a conference every single year. They have a ton of research papers. It's a really great community for talking about fairness and machine learning. There's a ton of even more research that you could be doing. For example, GDPR, which I think people are still figuring out right now. Oh, I should explain what GDPR is: the General Data Protection Regulation.
I hope people are nodding because I said the right words. So if you're in the European Union, you have a ton more rights around what people do with your data. With GDPR, there's actually a right to an explanation. People haven't really figured out legally what this means, but that does mean that increasingly in the EU we might be required to start giving people explanations for why machine learning algorithms made the decisions they made about them. So it's really a growing... oh, yeah. Do you have a question?

How do we get the truth values for the fraud cases in the decision tree? So we actually do get some training sets. For example, a human says, these 12 were fraudulent, these 300 were legitimate. Then we use that to figure out what features led to that probability. Which actually is kind of interesting, because that means a human seeded this data, which is another source of bias. That's a great question. Thank you for asking.

So you get to see the Drake meme again, that's great. Okay, so I will close out this machine learning discussion with my favorite Harry Potter quote. I don't know, you all like Harry Potter too? But he's like, "Ginny, haven't I taught you anything? What have I always told you? Never trust anything that can think for itself if you can't see where it keeps its brain." You can't see where machine learning keeps its brain. You have no idea what decisions it's making, and so it's really easy to just jump to conclusions and mark a ton of things as fraudulent when it might just be perpetuating the kinds of biases that we already have in our society.

Okay, so again, I told you all I was going to go over the takeaways from the beginning. Unexpected algorithms are unexpected. And also, it's strange how in the real world, you know, you think it's all going to be extreme math. I can see the whiteboard here has a bunch of graphs, but
like, there's literally an eight-step algorithm right here. I thought that was really funny. It's not like that. You do have to make a lot of business-related decisions and also figure out some pretty hard problems.

And you're not alone. You know, you're not sitting here writing on a whiteboard by yourself.

And your time and space complexity: that's just the very beginning of it. Then you do have to think about a lot of the real-world repercussions of the things that you're building.

So I hope that was a lot of fun. I thought it was pretty exciting. Stripe does a ton of this type of work. I think people really like talking about these deep and complicated issues. I feel like I