Hi, I'm really happy to be here talking to all of you today. I want to talk a little bit about scaling a payments platform.
First, a little bit about me. I'm mainly a Rails developer, but I like using Go and Swift as well. I worked at Shopify from 2011 to 2017, where I worked as a developer on product and led teams that handled APIs, payments, and data. In 2017, I decided to leave Shopify, and now I'm a contractor, still mostly doing Rails and advising engineering teams.
Today, I don't want to talk much about the future of payments or the technology behind it. Instead, I want to talk about the past. I want to talk about some of the challenges we ran into building and scaling payments at Shopify, and give you an idea of what led up to some of the payments tools and technology that we have today. You often need to know the past to understand the present. Understanding how a system got into a particular state will help you understand how to evolve it further. To know what needs to change and how to change it, you need to understand how we got here.
There are five parts of a payments platform that I want to dive into: integrating payment providers, compliance, scaling a payment system, becoming a payment service provider, and handling fraud.
First, let's take a look at what it was like in 2005 when Shopify got started. These days, one of the go-to solutions for accepting credit cards online is Stripe, which didn't exist in 2005. There were payment processors, and PayPal existed, but your options were limited and depended a lot on where you and your customers were located or what you were selling. Even if you had options for a processor, it could be very challenging to get approved, especially as a new business. You would need bank accounts, financials, tax returns, and personal guarantees. Then you needed to get online.
Your choices were Yahoo Stores, which wasn't very flexible, or building and hosting the website yourself, which meant integrating the payment providers and handling all the risk and security yourself.
So that's what things looked like when we got started, and the first thing we needed to deal with was integrating payment processors. Payment processors vary a lot by location, the types of services they allow, the fees, and all kinds of other things. So it was pretty clear that we were going to need to support multiple payment processors if Shopify was going to make it anywhere. There was a lot of variation in what was provided through their APIs. Some supported authorize and capture flows. Some only supported purchase. They might support refunds or voids, and they might not. There was also no unified spec for all of this. They all had their own XML definitions and different endpoints that you needed to interact with to make it all happen.
So, as Nathaniel mentioned, early on we worked out an abstraction that hid all of these APIs and their differences and allowed more to be supported in the future. The Ruby library Active Merchant was born from this work. Having a single interface for interacting with the payment gateways simplifies the code that needs to call them, especially if you're integrating with a lot of different gateways.
Active Merchant isn't anything crazy, though. It provides an abstraction for a credit card and exposes purchase, authorize, capture, and void operations for the gateways. This is a stripped down version of the initial gateway class that everything extends from. But really it's more of a spec than a true object-oriented abstraction. It didn't have a definition for those purchase, authorize, and capture methods. Instead, it was up to you to implement them when you were implementing your gateway. This made it really easy for newcomers to add new providers, since they only needed to implement the parts that mattered for their gateway.
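To make the "more of a spec than an abstraction" idea concrete, here is a minimal sketch of what such a base class could look like. It is not Active Merchant's actual code: the class names, the supports? helper, and the method signatures are all assumptions made for illustration. The base class names the operations but defines none of them, and a gateway implements only what its processor offers.

```ruby
# Hypothetical sketch; these are not Active Merchant's real classes or signatures.
CreditCard = Struct.new(:number, :month, :year, :cvv, keyword_init: true)

Response = Struct.new(:success, :message, :authorization, keyword_init: true) do
  def success?
    success
  end
end

# The base class acts as a spec: it names the operations but defines none of them.
class Gateway
  OPERATIONS = %i[purchase authorize capture void].freeze

  def initialize(options = {})
    @options = options
  end

  # Lets calling code ask what a particular gateway supports (assumed helper).
  def supports?(operation)
    OPERATIONS.include?(operation) && respond_to?(operation)
  end
end

# A purchase-only gateway implements just the part that matters for it.
class PurchaseOnlyGateway < Gateway
  def purchase(amount_cents, card, options = {})
    # A real integration would build the processor's request and send it over
    # HTTPS here. The length check below only stands in for the processor's answer.
    approved = card.number.to_s.length >= 13
    Response.new(
      success: approved,
      message: approved ? "Approved" : "Declined",
      authorization: approved ? "auth-#{amount_cents}" : nil
    )
  end
end

gateway = PurchaseOnlyGateway.new(login: "demo")
card = CreditCard.new(number: "4242424242424242", month: 12, year: 2030, cvv: "123")
response = gateway.purchase(1000, card)

puts response.success?             # true
puts gateway.supports?(:authorize) # false
```

The trade-off in this shape is that calling code has to check what a gateway can do before it calls it, because nothing in the base class guarantees that any given operation exists.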
This spec-like design also meant the code using Active Merchant needed to know a bit about the gateway it was calling, which isn't necessarily a bad thing when you're dealing with payments.
Active Merchant kind of blew up. It became the library to use when integrating payment providers in Ruby. We had hundreds of contributors, and over a hundred integrations were added. Gateway APIs changed a lot over time, so having the community really helped with that. These days, all kinds of businesses leverage Active Merchant to help integrate different payment gateways.
Now that we had payment providers integrated and could start accepting money through them, the second thing we needed to deal with was PCI compliance. These days you don't need to think about PCI compliance as much. Spreedly, Stripe, and many other providers have solutions that mean you don't need to worry about it. We didn't have this luxury.
So here's a bit of an overview of what's involved with being PCI compliant. The standard is super high level and leaves it up to the implementer to figure out exactly what they need to do to meet the requirements for their specific case. Basically, it boils down to only storing the card data you really need, encrypting all transmission of cardholder data, and protecting any data you do store behind a secure network and restricted access.
The trick with PCI compliance is deciding what is in scope. Any consultant will recommend putting as much of the system as possible into scope, because this is safe advice and will ensure everything is covered. It also increases the cost and complexity of any audits or security testing you might need to do, and it will likely slow down the development of your product due to the extra process that's put in place. This is pretty terrible advice. Instead, what you should do, and what we ended up doing, is reduce the scope of what needs to be compliant as much as possible, so it only slows down a small piece of what you're building.
This allowed us to iterate on Shopify the product very quickly without all the compliance overhead, since it wasn't in scope. To do this, we built a separate service to handle the credit card processing. This meant that the main Shopify application that most people interact with never sees a credit card, and those millions of lines of code are not in scope for compliance. Only the credit card processing service would be in scope.
Let's take a look at how that works. First, the user's browser posts a credit card form directly to card proxy over HTTPS. Card proxy then interacts with the processor to authorize or purchase the payment. Next, card proxy responds with a redirect to Shopify with a payment token. Now the Shopify server can use that token to do additional operations on the payment through card proxy, such as voids or refunds.
Later, we took this one step further by splitting card proxy into a service that receives the card and another that uses the card. The browser posts the credit card form to card store, a tiny Go service that stores the card in memcache. Card store then responds with a redirect to Shopify with the token. Now the Shopify server has the token and can ask card proxy to authorize or purchase using that token. Card proxy then retrieves the card details from card store's memcache and clears that data. Card proxy can then perform the operations with the processor.
With this split, we separated the system that performs the operations with the credit card from the one that receives it. Card proxy can only be reached by the Shopify servers, reducing the attack vectors to those systems.
When it comes to compliance, focus on reducing scope and isolating what needs to be controlled. Keep controlled systems simple and easy to work with, and segregate duties between systems to increase security and reduce the attack surface.
Now, the third challenge we ran into was scaling all of it.
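Before getting into scaling, the card store and card proxy split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class names, method names, and the in-memory Hash standing in for memcache are hypothetical, and the processor is a lambda rather than a real gateway call. The point it shows is that the main application only ever holds a token, and the card data is cleared once card proxy has retrieved it.

```ruby
require "securerandom"

# Hypothetical stand-in for card store: holds card data briefly, keyed by a token.
class CardStore
  def initialize
    @cards = {} # the talk describes memcache; a Hash stands in for it here
  end

  # Called when the browser posts the card form. Returns an opaque token.
  def store(card_number)
    token = SecureRandom.hex(16)
    @cards[token] = card_number
    token
  end

  # Returns the card and clears it, so the card data does not linger.
  def take(token)
    @cards.delete(token)
  end
end

# Hypothetical stand-in for card proxy: the only component that talks to the processor.
class CardProxy
  def initialize(card_store, processor)
    @card_store = card_store
    @processor = processor
  end

  def purchase(token, amount_cents)
    card_number = @card_store.take(token)
    return { success: false, error: "unknown or already used token" } if card_number.nil?

    @processor.call(card_number, amount_cents)
  end
end

# Fake processor for the example; a real one would be a payment gateway call.
processor = lambda do |card_number, amount_cents|
  { success: true, amount_cents: amount_cents, last4: card_number[-4, 4] }
end

card_store = CardStore.new
card_proxy = CardProxy.new(card_store, processor)

token = card_store.store("4242424242424242") # browser -> card store
first = card_proxy.purchase(token, 1000)     # app server -> card proxy, token only
second = card_proxy.purchase(token, 1000)    # card data was already cleared

puts first[:success]  # true
puts second[:success] # false
```

In this sketch the application code never touches the card number, which is what keeps it out of compliance scope. The payment token that card proxy hands back for later voids and refunds is left out to keep the example short.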
In 2013, when we really started focusing on scaling this, this is what the platform looked like. With the architecture we had set up, every payment request we made held up an app server process until it was completed, because we were making the payment request inline. Normally this was fine, but it wasn't uncommon for some payment gateways to have very long response times, sometimes over five seconds, which meant that with any spike in request volume we could exhaust resources or kill requests and potentially drop successful payments.
There were two issues we needed to fix: reduce the app server workload and ensure we didn't lose any payments.
When working with money, you should base your infrastructure on idempotent operations as much as possible. This way, your client can be dumb and retry operations without creating extra charges. Here's an example of where you might end up with a duplicate payment. The server makes a payment request to card proxy, which makes the request to the payment processor. Before the response returns to the server, there's a timeout or some other failure, but the payment went through on the processor. Now the user tries making another payment, which goes through as well, and now they've paid twice.
To deal with this, we implemented duplicate detection. We generated a UUID on the app side, stored it with the order, and included it with the request to card proxy. Card proxy stores the UUID with the transaction before making a request to the payment gateway. If the client disconnects and attempts the transaction again, the same UUID is used, so card proxy can check if it already performed that transaction.
This helped a lot, but depending on where your failure happens, there's still a possibility that a duplicate payment would occur: between card proxy and the payment gateway, for example, since many of the gateways didn't support duplicate detection.
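The duplicate detection described above can be sketched roughly like this. The class name, the in-memory Hash, and the lambda gateway are all assumptions for illustration, not the actual card proxy code. The UUID is recorded before the gateway is called, and a retry with the same UUID returns the stored result instead of charging again.

```ruby
require "securerandom"

# Hypothetical card proxy that records each request UUID before calling the gateway.
class DedupingCardProxy
  def initialize(gateway)
    @gateway = gateway
    @transactions = {} # uuid => { status:, result: }; a real service needs a durable store
  end

  def purchase(uuid, amount_cents)
    existing = @transactions[uuid]

    # A completed attempt: hand back the stored result rather than charging again.
    return existing[:result] if existing && existing[:status] == :completed

    # A pending attempt with no recorded outcome: do not charge again, flag it instead.
    return { success: false, error: "in progress, needs reconciliation" } if existing

    @transactions[uuid] = { status: :pending, result: nil }
    result = @gateway.call(amount_cents)
    @transactions[uuid] = { status: :completed, result: result }
    result
  end
end

# Fake gateway that counts how many charges were actually made.
charges = []
gateway = lambda do |amount_cents|
  charges << amount_cents
  { success: true, charge_id: charges.size }
end

proxy = DedupingCardProxy.new(gateway)
order_uuid = SecureRandom.uuid # generated once on the app side and stored with the order

first = proxy.purchase(order_uuid, 2500)
retry_result = proxy.purchase(order_uuid, 2500) # e.g. the client timed out and retried

puts charges.size          # 1
puts first == retry_result # true
```

If the gateway call itself fails partway, the entry stays pending in this sketch, which mirrors the gap mentioned above: the failure between card proxy and the gateway is not closed by this check and still has to be caught some other way.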
Instead of spending a huge amount of time and effort tightening the loop for these remaining duplicate cases, we put monitoring and alerting in place and had tools to void duplicate transactions automatically. No matter how much effort you put into the resiliency of systems, you're going to have failures. Having tools and automating the recovery from failure is just as important as preventing the failures.
Now that we had some resiliency between Shopify and card proxy, we needed to deal with the slow payment gateways using up our app servers. Remember that in our current architecture, when a user makes a payment, we do the payment gateway request inline from the app server, holding up that user's request as well. To speed this up, we didn't really reduce the workload, but moved where the slower parts were being processed. We have the card store part from before, which remains the same, and we also have the card proxy interaction, which remains the same as well, except we move all of it into a background job. This way, the slow part can happen outside the main request, which frees up those app servers to handle other users.
When we deployed this, it cut the app server response time in half, since we weren't doing all that work inline anymore. An added benefit of having the payment processing in a background job was that we now had more control over how payments were processed without impacting the user's experience. We could throttle payment gateways so we didn't overload them, or pause processing for a particular gateway if it was down or having issues.
When scaling a system, make operations idempotent as much as possible so they can be safely retried. Make sure you can recover from failures as well as prevent them. You're not going to catch everything. Moving processes out of the critical path can be a lot easier and cheaper than trying to optimize them.
After all of this, payments was still pretty complicated for our merchants.
It was especially frustrating to have to send them off to get a payment account, which was pretty tricky for a new merchant just starting out. This is when we started working on Shopify Payments, which would turn Shopify into a payment service provider.
Up until then, Shopify was mainly focused on being a great storefront and checkout. But there's a lot more to running a business, like payment processing, shipping, support, et cetera. Becoming a payment service provider would mean merchants wouldn't need a separate payment gateway and payment account. Shopify would handle the payment transactions and connect with the merchant's bank account to send the funds directly to them. This is great because it cuts out a big barrier for merchants and a middleman for all those transactions.
We started going down this road. We found a bank and a payment processor that we could work with. All the interactions would go through Shopify, and we could provision and manage the merchant accounts behind the scenes with the payment processor. We would also manage refunds, chargebacks, and payouts to the merchant.
All of this might sound familiar, because this is a lot like what Stripe Connect evolved into. At the same time as speccing out everything for Shopify Payments, we were also integrating that first version of Stripe Connect into Shopify. At the time, Stripe Connect had two pieces that would help us out on top of being a processor: APIs to onboard the merchants to Stripe, and access to the transaction data. There were no managed accounts and no way to manage payouts. It was pretty bare bones compared to what it can do today.
A few months into our PSP project, we pivoted to working with Stripe to white-label their services instead. The big reason for this pivot was that it was going to be a lot faster and easier for us with Stripe. They had an API to onboard customers already. Their other APIs were well designed, easy to use, and provided a lot of what we would need to build our own payments.
Looking back, the evolution of all this has been pretty cool to see. We started working on this with Stripe in 2012, and here's how Stripe Connect has changed since then. It's great that we were able to help shape it and grow along with it. It was pretty cool recently when I was working on an iOS app in my spare time, picking up Stripe Connect again for the first time, and being able to easily integrate payments and Stripe Connect into my back-end server in just a few lines of code. So it's pretty crazy how far it's come from what it was when we first got started with it.
Even though we didn't need to build all this ourselves, it wasn't perfect. Sometimes transactions and payouts didn't add up as we expected because of when they were settled, or because of the data in the APIs we couldn't display something the way we wanted to. These things were especially important when it came to reporting our own financials. Eventually, we ended up keeping our own record of the transactions we were putting through third parties so that we could spot and handle any discrepancies that might come up. Using a third party is great, but when you're depending on someone else, you need to keep your own records so that you can determine if something went wrong.
The final thing that we needed to focus on was fraud. If you have a service that is really good at processing credit cards, like Shopify is, you're going to be a target for fraud, because you're probably also a great way to test if credit cards are valid. And people did a lot of testing for valid credit cards with Shopify.
Initially, fraud detection was pretty simple. We had some basic hard-coded rules. We depended a lot on what the payment processor told us about the transaction, and we blacklisted failed attempts from the same user. The problem is that this got it wrong in both directions pretty often. Static rules weren't very accurate in many cases.
For example, using a proxy isn't a good indication of fraud these days, since lots of people use them day to day. On top of this, the cost of a chargeback or fraud was pushed to the merchant, often after the order had been fulfilled. This led to a lot of trust being lost with the merchant.
When we started looking into dealing with fraud a little more aggressively, we looked at external services to detect it. But with something like fraud, it really comes down to the data that you have, and we had lots of it already. We owned the whole pipeline, so we could get to know a lot about what the customer was doing before they completed the purchase. The timing was also great, because we were starting to invest more heavily in machine learning. And on top of this, we could make something that we knew would scale along with the rest of the platform.
When building this fraud service, we first needed to define the problem and figure out when we needed to detect it. For us, we wanted to run the checks during checkout, and that changed some of the data that we might have available and the latency requirements of the service.
Next, we needed to find the features and inputs that we could look at to indicate if something was wrong. We needed to make sure that those features were available when we needed them. For example, chargebacks and refunds are a great indicator of fraud, but that data might lag by many months.
Then, how do we measure and test it? What's the impact on the user? We wanted to be able to flag as much fraud as possible, but we also wanted to reduce the false positives and negatives so the burden of checking our work wasn't put on the merchant.
Finally, we could deploy the service and make sure it could handle the throughput that we needed. We also made sure that we could easily update the models as our understanding of fraud evolved.
This is a breakdown of the fraud service that we deployed.
First, orders and history come into Shopify and are eventually propagated to HDFS, which is our long-term storage. Using this data, models could be built and tested using Spark. The fraud model could then be pushed to the fraud service, which uses the data from the current order and other feature inputs to make a decision. Now, Shopify can ask for a fraud rating during checkout. When the fraud check occurs, all the inputs used are pushed to HDFS via Kafka, so training and testing can be done on the same data we saw in production. And if we want to update our fraud model, we can build a new one with Spark and push it to the fraud service whenever we want to.
If you build anything related to payments or money, you're going to need to deal with fraud at some point. If you have lots of data about your payments and who is making them, that can be leveraged to figure out what fraud looks like on your platform. Fraud will change, and your detection needs to be able to change with it. If you're interested in learning more about the fraud service at Shopify, Solmaz, one of the directors of engineering, did a great interview that goes into a bit more detail.
That's a pretty good picture of some of the things we went through scaling payments at Shopify. We had to deal with integrating multiple payment gateways with different features, but we were able to leverage the open source community to help us out. We talked about how dealing with compliance by reducing what is in scope and isolating those systems helped us continue moving fast on the main product. We handled scaling the platform by shifting workloads out of the critical path and making operations repeatable to reduce the impact of failures. We worked with a third party for Shopify Payments and tracked the data that was important to us ourselves, so we didn't just depend on their APIs. And finally, we leveraged the huge amount of data we had, along with machine learning, to fight fraud.
I hope this gives you a good understanding of the challenges we had and some of the solutions we came up with, so you can use that knowledge to make the next generation of payment platforms easier to build and use. There are still all kinds of problems that need to be solved with payments. Banks are still hard to work with, security and data privacy are huge points of concern, and the availability of machine learning and AI tools is really going to shake all of this up. I'm looking forward to seeing what the next 10 years of payments tech brings. Thank you.