I think we'll get started. Just want to thank everyone for coming today. For those of you who don't know me, my name is Sam Arons. I'm a senior software engineer at LendingHome. Quick plug for LendingHome: we're hiring. If you're in San Francisco, Pittsburgh, or even remote, we're hiring for all of our offices. So if you like some of the stuff in this talk, come by, talk to me after the talk, and I can point you in the right direction.

So the name of this talk is "API? We have a server": how LendingHome approaches legacy technologies. And before I dive into quite what that means, let me explain a little bit about what LendingHome is. LendingHome is rethinking mortgage from the ground up with technology. Essentially, it's a 100% online, simple, elegant way to get a mortgage. All your document uploads are handled through the website. No more faxing someone. No more sending a FedEx envelope full of your documents. No more needing wet signatures on documents, going back and forth with your bank, and not knowing where you stand. LendingHome essentially takes the entire mortgage process and puts it online. We lend off our own balance sheet, meaning you get the money much, much quicker than with, say, crowdfunding websites. And really, it's a technology-first solution.

We've built this with a huge number of technologies, as well as vendor integrations. This is only a small sample of what we integrate with and the different kinds of technologies we use. Obviously, we're a Ruby on Rails application — that's why we're at RailsConf. So you can see a lot of the technologies here, and a few of the vendors that we integrate with as well. But this is really a small slice of the overall picture of who LendingHome actually integrates with. In reality, LendingHome has hundreds of vendor integrations. We've seen everything. We have vendors for moving money, for ordering appraisals on people's homes — a literal API where you send an API request and someone leaves their office, takes pictures of a house, writes up a report, and sends it back to you. Those kinds of APIs.

And when I say we've seen everything, I really mean that in the technological sense. A majority of our API and vendor integrations are through standard RESTful JSON APIs. A lot of you will probably be familiar with this: you have a JSON request, you get a JSON response, maybe you have some webhooks if the thing is an asynchronous call. We've seen this type of thing before. This is pretty normal for us. But when I say we've seen everything, I mean everything. This is an anonymous employee at a vendor, scaring one of our product managers with the line: "API? We have a server." And this is kind of a common response that we get back from some of the vendors that we talk to.

It's easy to make fun of our vendors for this. But it's important to recognize where our vendors are coming from. In their world, an FTP file-transfer API is their definition of a modern technology that their clients will actually use. So it's important to recognize that that is what is standard in the financial services industry, and that's what we have to integrate with. Berating them or talking negatively about them won't help you build an integration any faster. It won't help them build an API. So really, you have to meet them where they are. It's important to have compassion for these vendors and understand where they're coming from.
And so this is really the hard part about what LendingHome does: integrating with these more legacy APIs that these vendors provide. And that's really what this talk is gonna be about — the learnings and understandings that we've developed over the years from integrating with a multitude of legacy API vendors. If I had to break it down into maybe three distinct points, I'd say these are the three lessons we've learned from legacy APIs. I'll talk about each of these points individually, but at a very high level: build the interface you want, plan for failure, and be efficient when reading and writing. These are very important.

So let's start: build the interface you want. And I've underlined "you" here. You can mean you as an individual, but generally, if you're working in a larger company, you means your team. Build the interface your team wants — or to put it a different way, build the interface you wish your vendor had built. And this is important for a variety of reasons. You don't wanna teach your teammates to build directly to a vendor's API. It's actually much, much simpler to build an internal API, some kind of abstraction layer — we are engineers, after all — that abstracts away the business of generating files in vendor-specific formats. So we've developed a list of four points that we at LendingHome follow every time we're integrating with a distinct vendor. Words of wisdom, as well.

Clean interfaces benefit people; machines don't care. Really, machines will integrate the way you tell them to integrate. They'll do all the messiness, they'll do all the communication. The clean interface, the internal API — that's really a benefit for you. That's really a benefit for your teammates. You're actually helping out other people on your team when you build an internal API. This can either be in code or separated out as a microservice, but you basically wanna carve out a very distinct internal API when integrating with these legacy vendors. And the reason you'd wanna do that is because it makes it easier for your teammates to understand how to interact with the vendor without needing to understand the very specific file format it requires. You're trying to build up compassion for your teammates. You're trying to see where they're coming from. They've never heard of this vendor; they don't know how to integrate with it. If you're tasked with doing this, the best thing you can do — especially if this vendor will see a wide range of use — is to make it as simple as it can possibly be, by building the interface you wish your vendor had built for you.

And then finally, make sure to use technologies that match the vendor's synchronous or asynchronous interfaces. This is something I see a lot, especially when the internal APIs are separated out as microservices. You wanna actually use technologies that match the synchronicity. And what do I mean by that? A lot of file-based legacy vendors are asynchronous by nature. We drop a file into the vendor's FTP server. They see the file. Maybe they give us a response in five minutes; maybe they give us a response in an hour, or even the next day. You don't really know when it's gonna come back at you. So you wouldn't graft a synchronous HTTP microservice interface on top of what is otherwise an asynchronous interface by nature.
You need to understand how the whole system works and actually use technologies that match it. We at LendingHome are on AWS, and SQS is a great asynchronous messaging system, so a lot of these microservices that integrate with FTP vendors use SQS to communicate.

So we took all of these teachings, and around 2015, LendingHome is funding a lot of loans. We're sending a lot of money to people. We're receiving money from borrowers. Money's flying all over the place. But we did it all by hand. We logged into the banking portal. We manually entered in numbers. We checked the numbers to make sure they were right. That was the process. And at a certain scale in 2015, that wasn't gonna fly. That wasn't gonna be something that was reliable. We needed programmatic interactions with our bank. And we went to one of our banks and we said, hey, can we set this up? And they're like, yeah, of course. Here's the thing, though: our interface is over FTP. Here's all the documentation, here's what you need to know, but we don't have an API. And at LendingHome, like at a lot of other companies, you just kind of have to bite the bullet and say, yeah, I totally understand that. That makes sense. I need to integrate with you immediately. I can work with this.

So what we did at LendingHome was build a microservice called Rainmaker. Essentially, what Rainmaker does is expose a very simple internal API for the platform: an amount, a direction — credit or debit — where's the money coming from, where's the money going to. Some very simple fields. But on the back end, inside of Rainmaker, that gets transformed into a proprietary XML-based file that we then drop into the FTP folder of the bank. And then, on a set schedule, we receive responses, and Rainmaker parses those files and sends messages back to the platform.

So this actually works pretty well. We built the interface we wanted. We have an asynchronous, SQS-based microservice. It's connecting directly to our vendor. Life is pretty good. But this doesn't scale, especially when you consider multiple FTP vendors — our bank is not the only FTP-based vendor in the entire ecosystem. So we needed to build something that actually worked with a variety of FTP vendors, and we didn't want to duplicate the logic of connecting to FTP servers, whether in a distinct microservice or in the platform itself. So we asked ourselves: well, what interface do we want here? Let's iterate on this and come up with a really good solution.

So we spun out a separate microservice, which we call Grand Central. Again: build the interface you want. Grand Central's interface is that it polls a specific folder in an S3 bucket and looks for files that the platform drops in there. If it sees a new file, it takes it, processes it, and sends it to the correct FTP vendor integration. Similarly, it polls all the FTP vendor integrations, reads files that it thinks are new, takes them into Grand Central, processes them, and drops them into the S3 bucket, which the platform is then listening on. This is the interface that makes the most sense for us. It's all file-based. We're matching the technologies that are expected of these kinds of integrations. We still have Rainmaker, but it no longer needs to know how to connect directly to FTP, and neither does the platform. We have one point of abstraction for what FTP is and what the actual underlying file connection details are. The platform, Rainmaker, and any other microservices we build in the future only have to worry about each vendor's specific proprietary format.
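To make that internal-API shape concrete, here's a minimal sketch of what a Rainmaker-style money-movement message can look like over SQS, assuming the aws-sdk-sqs gem. The queue name and message fields here are hypothetical stand-ins, not LendingHome's actual schema.

```ruby
require "aws-sdk-sqs"
require "json"

sqs = Aws::SQS::Client.new(region: "us-east-1")
queue_url = sqs.get_queue_url(queue_name: "rainmaker-requests").queue_url

# The platform only ever speaks this simple internal format. The
# microservice owns the ugly part: turning the message into the bank's
# proprietary XML and dropping it into their FTP folder.
sqs.send_message(
  queue_url: queue_url,
  message_body: {
    amount_cents: 25_000_00,       # how much money
    direction:    "credit",        # credit or debit
    source:       "ops-account",   # where the money is coming from
    destination:  "borrower-4242"  # where the money is going to
  }.to_json
)
```

Because the queue is asynchronous by nature, the bank's response can arrive five minutes or a day later and get published to a response queue the platform consumes — nothing is left holding an open HTTP request.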
And going along with the theme of build the interface you want, it's important to remember that UI and UX are interface as well. You want to build clean interfaces not just at a technical level; it's important to have relatively good design, to make these things user-focused, and to make them operable from a graphical user interface. This is a screenshot from Grand Central. Specifically, this is the page where we create what we call a file transfer, and a file transfer is just a distinct FTP integration. You can see here, it's actually really simple. We can do this now in five minutes. LendingHome actually has 15 distinct FTP vendor integrations now, and each one of them can be set up in just a couple of minutes with just a little bit of information. You give it a name. You say what protocol it's going over. A host, a port; a username is usually mandatory. Then you usually have a password or a secret. Maybe you have a proxy you're going through. And then you specify direction: is it an import, where I need to keep reading the FTP folder, or an export, where I need to keep reading the S3 bucket for new files to send to the FTP vendor? So this has cut the time of creating a new FTP vendor integration down to about five minutes, like I said. And that's really all it takes. So now, when people tell us, hey, we want to integrate with you, we only have this FTP file integration, you as engineers no longer have to say, oh man, this is going to be a lot of work, this is going to be a huge pain. When you build the interfaces you want and you build the right abstractions, this is very, very simple.

So the next point I want to go over is planning for failure. And this can come up in a couple different ways. My favorite example of this is the NMLS, essentially the national body that handles licensing for mortgage companies. Their website has office hours. And not only does it have office hours, but for some reason the office hours are different on Saturday and Sunday. I bring this slide up to illustrate that this is the industry that LendingHome is working in. Things close. They interpret regulations in a very specific way — a lot of government websites also demonstrate this — they interpret the rules to mean they can't take any visitors, they can't take any new processes, you can't file any new applications, so they shut down the entire website. And it's important to be aware of these kinds of failure modes. Sometimes the server that you're connecting to will just be off. Temporarily — it'll come back — but it might just be off, and you might not be able to connect to it. Make sure that you can catch those things. With file-based transfer APIs, there are a lot of opportunities for failure.

"They shut down the server every night before they leave." This is an engineer, who shall remain anonymous, lamenting a vendor that we integrated with. We have a vendor where every morning, when they get into the office, they turn on the server on their desk. And every night, when they leave the office, they turn off the server. And if you have files to send them, you'd better do it before they leave the office for the day. So this is also what you have to think about. You have to think about scheduling, and you have to assume that sometimes the connection will simply fail.
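As a hedged sketch of catching exactly that, assuming Ruby's standard net/ftp library (the keyword-options form of Net::FTP.open is available on modern Rubies) — the host, credentials, and retry policy here are all placeholders:

```ruby
require "net/ftp"

# The server on the vendor's desk may simply be powered off right now.
# Treat a refused or timed-out connection as routine: back off and retry
# instead of crashing the whole export job.
def with_ftp_retries(attempts: 5, base_delay: 60)
  tries = 0
  begin
    tries += 1
    Net::FTP.open("ftp.vendor.example",
                  username: "lendinghome",
                  password: ENV["VENDOR_FTP_PASSWORD"]) do |ftp|
      yield ftp
    end
  rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, Net::OpenTimeout
    raise if tries >= attempts  # give up and let your alerting take over
    sleep(base_delay * tries)   # linear backoff; they may be back soon
    retry
  end
end

with_ftp_retries { |ftp| ftp.putbinaryfile("payments_20180417.xml") }
```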
Like I said before, you might have to worry about the server being off. But remember that this is the world that we're working with. This is where we have to have compassion. This is totally normal from the perspective of the vendor, and that's totally fine. We understand that. We recognize that. We want to work with you. We just need to build our abstractions to have compassion, or empathy, for vendors like this, and be better in our systems so we can make up for perceived deficiencies in theirs. The way to sum this up: everything will go wrong, so make your system robust. And I have a couple of different points here that I think everyone could use — and LendingHome does use — to make integrating with file-based APIs much, much simpler.

Number one, log everything. Every time you connect to an FTP server, log that interaction. Every time you disconnect, log that. Every time you see a new file in an FTP folder, log the file name and time. Every time you drop a file off in an FTP folder, log the time and date. This will help you immensely with debugging. It'll also help you figure out periods of problems and periods of relative stability. Logging everything is cheap and easy, and it doesn't cost you anything. And when you're dealing with these things, which fail a little more often than we would like, this helps out a ton.

Save a copy of every file or message you send and receive. Don't expect that your vendor will save every file you send them, and don't expect them to archive the files they send to you. Really, what you want to do is save a copy somewhere — please do it in a secure S3 bucket or your equivalent object storage — but do save a copy. This is going to help your debugging enormously. I can't stress this enough. If you save copies of the files, you know exactly what was sent, and because of the logs, you know exactly when it was sent. This makes it much, much simpler to recreate different scenarios, and also to prove: no, no, I sent you guys this file. I have this audit trail, and I have this file. If you want me to send it again, I can send it again for you. Make sure you save a copy.

A static IP is almost always required. In a lot of these FTP vendor integrations, no one uses vanilla FTP that's unencrypted. Everyone uses FTPS — that's FTP over SSL — or SFTP — file transfer over SSH. And both of these are relatively secure. But a majority of our vendors actually require one more step of security, which is having a static IP address from which you connect to them. Normally, I wouldn't bring this up, but because this is RailsConf and a lot of us use Heroku or other cloud providers, we may have ephemeral IPs, and this may not always be easy. So what you want to do is make sure that you use a cloud provider, or some kind of system, where you have a permanent static IP for the system that's doing the actual import and export of files.

Learn the exact transport system required. Is it FTP? Hopefully not — no one in our industry uses just vanilla FTP, but it could be in yours. Is it FTPS? SFTP? Is it a proprietary file-based format? Is it one of those proprietary pieces of IBM software that you have to pay to license? Make sure you learn exactly what they're talking about when they say file transfer protocol. Make sure you actually dig into the exact protocol definitions, the exact port, the exact host. Do I need to connect over a proxy? Just learning that upfront, and asking these vendors these questions, will save you hours and hours of headaches in the initial integration and setup.
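Here's a minimal sketch of those first two habits together — log everything, save every file — assuming the aws-sdk-s3 gem; the bucket name and key layout are made up:

```ruby
require "aws-sdk-s3"
require "logger"

LOGGER = Logger.new($stdout)
S3     = Aws::S3::Client.new(region: "us-east-1")

# Archive every file we send or receive under a timestamped key, and log
# the interaction, so we can prove exactly what moved and when.
def archive_file(vendor, direction, filename, contents)
  key = "#{vendor}/#{direction}/#{Time.now.utc.strftime('%Y/%m/%d')}/#{filename}"
  S3.put_object(
    bucket: "lendinghome-file-archive", # hypothetical bucket name
    key: key,
    body: contents,
    server_side_encryption: "AES256"    # keep the archive secure
  )
  LOGGER.info("#{direction} file #{filename} (#{vendor}) archived at s3://lendinghome-file-archive/#{key}")
end
```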
Use good libraries, and importantly, verify that they actually work. Ruby's FTP libraries are not wonderful. They fail in a lot of mysterious ways. We know, because we tested them exhaustively. We verified whether they would actually work for our specific vendor integrations, and some of them do, but others don't, and some of them fail in mysterious ways, especially around socket timeouts and connection timeouts. We at LendingHome actually found that the Java FTP libraries are exceptional, and so Grand Central is actually a JRuby application that uses the Java FTP libraries to do its file transmission.

Remove files you've read, or let your vendor do it. This is along the lines of learning the exact transport: understand who's responsible for removing files, because every vendor has a different way of doing this. Usually, you will not be dropping a file into a folder full of other files. These drop boxes are regularly cleared out, and vendors do this in a couple different ways. As soon as you drop a file, their system might recognize it, take the file, and delete it from the server. Or they might require you to do that, because the next time you drop a file, they might take both files and see them as two new files. So be sure you understand, with your specific vendor, who's responsible for removing these files. Whose job is it? You don't want to get this wrong, or you will receive duplicates, or they will receive duplicates. This is extremely important, and a lot of people don't realize that it's actually necessary.

And finally, don't read files into memory. I'll talk a little more about this in the third part of this presentation, but we're dealing with files here; we should treat them as files. I'm trying to build this theme of compassion — we want to have compassion for the computer itself. We don't want to read a five-gigabyte file into memory, blow our memory, and have the process killed. Let's treat the file as a file. Maybe we process it in a very efficient way, but don't read it all into memory. And especially, don't store it in the database. It's a file — store it in network-attached storage, or put it in an object store. A lot of cloud providers provide those now. Make sure it's secure.

And finally, when dealing with failure, it's not all technical ability. Soft skills will definitely save you in this area. As an anecdote here: I locked us out of an FTP vendor on a Friday night at 7 o'clock. Most of the office had gone home, not only at LendingHome but at the vendor. And so it was very much a scramble for me: OK, let's think about this. I've done all I can do technically. I now need to call someone, figure out what's going on, and understand if I can get this back. So it was a concerted effort, but I called up the people at the vendor, tried to get in touch with the right people, and apologized profusely: oh, I'm so sorry, this was totally my fault. Being deferential — showing them, hey, I messed up, but I'm willing to work with you to fix this — that soft skill saved us. Just an hour later, we had the integration re-enabled and working again. Soft skills will save you at every stage of dealing with legacy vendors.

So finally, I want to go into the last part of this talk, which is to be efficient when reading and writing.
This is extremely important when you're dealing with file-transfer-based APIs, because vendors are dropping files to you that are relatively large or in very specific formats. You don't want to just read it all into memory — that's what I was saying before about planning for failure. You want to be practical and efficient, and I'll go over some tips and tools for doing that, as well as some very specific code examples.

Computers are pretty good at files. Files are a pretty good abstraction — just a blob on disk. And FTP is a great abstraction for just transferring files between servers. There's no reason to read the file entirely. There's no reason to store the contents in a database. Computers are pretty good at dealing with files in their native ways. From a lot of legacy vendors, you will receive files in either an XML format, perhaps a fixed-width format, or sometimes even CSV. There are also other proprietary binary formats, but I'm not really going to get into those. But be efficient when reading and writing.

When you're dealing with XML files specifically, you want to use what are called streaming parsers and streaming writers. Essentially, you want to read the input you're given either one line at a time or one token at a time, depending on which kind of file you get. For XML, Nokogiri actually has a wonderful streaming parser. It essentially reads one XML token at a time, yields it to you, and you build up a linear parsing view of that XML file. It's a little complicated, and it's a little bit of work, but this saves you, especially when you're given a giant file that you know will blow out your memory. And it's even more efficient on much smaller files as well. So be sure to do stuff like that when you're dealing with files that you know to be large.

When it comes to streaming writers in Ruby, it's actually very interesting. Of all the Ruby XML libraries you'll find on Google, almost all of them are parsing-oriented. Very few are write-oriented, and of the ones that are, only one has a streaming writer: libxml-ruby. Every other Ruby XML library builds an in-memory representation of the document that you're just about to serialize. It builds it up in memory, and then when you call to_s on it, it serializes it all out. And that's going to be expensive. Even just holding the in-memory representation is expensive — it may actually blow out your memory allocation. libxml-ruby just writes a token at a time, and it has a very nice interface for doing this. You're just responsible for writing the start tag, the end tag, and the text. It is a little bit more work, but again, if you're dealing with large files, it's going to save you.

And there are equivalents for fixed-width and CSV file formats as well. For fixed-width, the streaming parser could just be the each_line method. each_line essentially reads one line at a time from the underlying file. It doesn't buffer the whole file into memory, so you're not going to blow out your memory allocation. each_line will yield one specific line to you, and when you're done processing it, it'll move on to the next line. This is an extremely efficient way of reading fixed-width file formats. And for streaming writers, it's funny to think of puts as a streaming writer, but that's exactly what puts is. If you call puts on an IO object, you're passing it some data that it immediately flushes out to the file. It doesn't care how big the file was previously, and it doesn't care how big the file is afterwards; it just takes the data you give it and writes it out. Very, very efficient for writing out these fixed-width file formats.
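To make the XML side of this concrete, here are two hedged sketches. The tag names and file names are invented for illustration, and the writer calls follow libxml-ruby's XML::Writer interface as I understand it — treat both as sketches, not drop-in code.

```ruby
require "nokogiri"

# Streaming parse: Nokogiri::XML::Reader walks the document one node at
# a time instead of building the whole tree in memory.
File.open("inbound_payments.xml") do |io|
  Nokogiri::XML::Reader(io).each do |node|
    next unless node.name == "payment" &&
                node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    puts node.inner_xml # stand-in for your real per-record business logic
  end
end
```

```ruby
require "libxml"

# Streaming write: emit start tags, text, and end tags straight to the
# file, never holding an in-memory representation of the full document.
payments = [{ amount: 25_000_00, direction: "credit" }] # sample data

writer = LibXML::XML::Writer.file("outbound_payments.xml")
writer.start_document
writer.start_element("payments")
payments.each do |payment|
  writer.start_element("payment")
  writer.write_element("amount", payment[:amount].to_s)
  writer.write_element("direction", payment[:direction])
  writer.end_element # </payment>
end
writer.end_element   # </payments>
writer.end_document
```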
And then when you're dealing with CSV files — a little less common — the CSV libraries have really good abstractions for streaming parsers and streaming writers. If you call shift on an object from Ruby's standard CSV library, it'll yield one parsed CSV row at a time while reading from the underlying file one line at a time. Similarly, for streaming writers, there's the double shovel — I'm not actually sure how to pronounce the << symbol, so I call it the double shovel — but if you call that with an array of objects, the CSV writer will serialize it and write it out to disk, probably using puts internally. This is a streaming writer, and again, this won't build up any kind of in-memory representation of large files.

So, I do recognize that this is RailsConf, and I have been talking very abstractly. I want to dive down into a few code examples in Ruby, because I think they illustrate the point a little better. This is how the standard Ruby documentation teaches you to read a line-based file format: you call IO.readlines, you give it a test file, and then, if you need to process each line, you map over it with a magic function. The magic function is essentially your business logic — whatever you need to do with that line to transform it or parse it. The problem with this is the problem I've been repeating over and over: it loads the whole file into memory. If that test file is five gigabytes, this isn't gonna work right, and it's actually gonna kill your process faster than you think.

What you wanna do instead is what I keep reiterating: use something like each_line, and build an abstraction function over it — I've called it each_row here. Essentially, it takes each line from the input, reading one line at a time; you call your magic function on that line; and then, if you need to do further processing, it yields that object to the block you pass in. So you're processing the file one line at a time, doing all your magic on each line, instead of taking the full file into memory, doing all the magic on it, and then doing something else with it. This is very sequential, very iterative, and very memory efficient. This is gonna be faster and better, and sysadmins everywhere will thank you.
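Here's a hedged reconstruction of that pattern — with magic as a placeholder for your own per-line business logic — plus the CSV equivalents from a moment ago:

```ruby
# Placeholder for whatever parsing or transformation you actually need.
def magic(line)
  line.strip
end

# Don't do this: IO.readlines slurps the entire file into memory first.
#   IO.readlines("test_file").map { |line| magic(line) }

# Do this instead: read, transform, and yield one line at a time.
def each_row(path)
  File.open(path) do |io|
    io.each_line do |line|
      yield magic(line)
    end
  end
end

each_row("test_file") do |row|
  # further processing, one row at a time
end
```

```ruby
require "csv"

# CSV equivalents: shift parses one row at a time while reading, and <<
# (the "double shovel") serializes one row at a time while writing.
CSV.open("inbound.csv") do |input|
  CSV.open("outbound.csv", "w") do |output|
    while (row = input.shift)
      output << row # or transform the row first, one at a time
    end
  end
end
```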
So, just to wrap things up here, I want to reiterate that a lot of what this talk is trying to get at is having compassion, or understanding, for various parts of the business and various parts of the systems that you integrate with. For build the interface you want, it's really important to have compassion for your fellow teammates: to understand where they're coming from, to anticipate what the problems might be and head them off, and to make sure that you've actually built a good abstraction layer over those problems. Similarly, plan for failure — that's compassion for your vendor directly. You're trying to build a system that's robust to failure because you're trying to make up for things that they haven't built yet or can't build. You log everything, you save every file, because you just assume that their system can't. And when it comes to debugging and making sure that things are working correctly, it builds a better relationship if you're able to say: oh, I actually have this detailed audit log. I know that I did this at this time — what did you guys see on your system? If you have the full story there, that's gonna be a much better vendor relationship. And that's really what you want: compassion and understanding for where they're coming from. And then finally, be efficient when reading and writing — the thing we just talked about. This is really compassion for the computer. This is making sure that you're not doing too much at once, that processes don't get killed, that everything is running smoothly, and that you're not creating chaos for other people, and specifically for the machines themselves. You wanna make sure that everything goes as smoothly as possible, and being really efficient when parsing and writing files helps out a lot.

So yeah, just wanna reiterate that LendingHome is hiring — again, San Francisco, Pittsburgh, and remote. And if you guys have any questions about the talk I just gave, I can answer them. Thank you very much.

Okay, so the question was: should you log to a database, or log to your standard logging interface? That's actually a really great question, and funny enough, when we were building Grand Central, we decided to log to a database. Each interaction with the FTP vendors themselves is logged as a database row, and it's given a standardized format — all the columns are there. The reason we decided to do this is that in Grand Central itself — if I can go back to the screenshot, let me try to find it here; I apologize for this, it'll be right here — you can actually see Activities, and Activities is a user-facing representation of these logs. The easiest way to do that is to just store the logs in the database, and then when you click on Activities, you have a full human-readable log of everything that happened. So I would actually recommend logging to the database and making a standardized format for your logging. That's gonna make querying much simpler. But remember, UI and UX are interface — we wanna have compassion for our teammates, and building Activities is gonna help a lot.

That's a great question. So, how do we implement testing, when testing in a lot of these environments is hard, or we have poorly defined specs? For the engineer who primarily built Grand Central, when we were dealing with the vendor who shuts off their server every night — that's actually LendingHome's only FTPS integration — for the connection part, we were kind of testing in production, so to speak. We were just testing against that live server. And because we would work a lot later than the vendor would — than the employee who monitors the server — what we decided to do was spin up our own FTPS server. And this is actually a good strategy when dealing with poorly defined specs or poorly defined FTP integrations: spin up your own FTP servers, play around with different permutations. It's very cheap and very easy to spin these things up; use different server libraries, use different connection libraries. I can really only recommend testing by spinning up your own mock instances and doing it that way. I hope that answers that.
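As a minimal sketch of that mock-server approach, assuming the fake_ftp gem — the ports and file name here are arbitrary:

```ruby
require "fake_ftp"
require "net/ftp"

# Spin up a throwaway FTP server on localhost, run the same client code
# you use against the real vendor, and inspect what it uploaded.
server = FakeFtp::Server.new(21212, 21213) # control port, passive port
server.start

Net::FTP.open("127.0.0.1", port: 21212,
              username: "anonymous", password: "") do |ftp|
  ftp.passive = true
  ftp.putbinaryfile("payments_20180417.xml")
end

puts server.files.inspect # => ["payments_20180417.xml"]
server.stop
```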
How do you handle monitoring and remediation for failures on your end? So, whenever we experience a failure, we not only log it on that Activities page, but we also send it directly to Slack. At LendingHome, a lot of our very important error messages go directly to Slack. They hit Sentry first, and then we actually have a system that reads Sentry continuously for various keywords or new errors. Those get routed directly to a specific Grand Central Slack room, and they show up immediately: hey, this thing failed, we've never seen it fail before, this is a different error, it needs someone's immediate attention. That's the best way I can describe it. You should always be logging errors to something like Sentry. But if it's an FTP integration — I forgot to mention this in the talk, but sometimes there's a high correlation between the importance of the vendor and how legacy their interface is — when things fail on the file-transfer side, you really wanna know about it. So I do recommend something like PagerDuty or Slack: an immediate notification that actually gets the attention of an engineer. All right, thank you guys very much.