Good afternoon, or I guess good morning, everybody. My name is Brandon Harris. I'm the Director of Data Science Technology at Discover Financial Services. And with me today is Anurud.

Hi, everyone. I'm Anurud Pahath. I'm a Senior Manager of Data Science Enablement. I work for Brandon, essentially making sure that our data scientists and analysts have the latest and greatest tool set and that they're trained on the best coding practices.

So I was interested to hear what Sharad had to say. I think you'll hear some commonalities in the conversation today between what's happening in the open source space and what we've had to do from a financial services perspective. Just real quick, the lawyers win again: what you're hearing today is the opinion of myself and Anurud, and not necessarily Discover Financial Services. And speaking of Discover, I know some of you are probably familiar with us from the Discover card, our credit card, which kind of originated the cash-back rewards space. But we are also a full-service bank, so we provide banking and deposit products across the board. We've got online checking accounts, online savings accounts, CDs. We also have various loan options, from student loans to personal loans to home equity loans; pretty much a full suite of banking products to make sure our customers save smarter and can spend more wisely.

So I want to talk a little bit about Air9. Air9, very quickly, is our internal data science workbench. It was built in-house at Discover, completely from scratch, to solve a number of problems similar to what Sharad just covered; in fact, it's kind of uncanny how similar the problems are. There are some good data science workbenches out on the market that teams and organizations can buy, but one of the reasons we decided to strike out and build our own was what I like to call defragging analytics at Discover. You'll see that in the slide behind me, kind of our elevator pitch for why we built this platform. The defragging of analytics came about because, obviously, it was very fragmented to start with. Being in the financial services industry, we've had to be good at analytics and data science for a long time, even before analytics and data science were buzzwords. It's just something we had to do, like many other companies in other industries, to stay competitive. So the effort around analytics really grew up tied to specific business units. We had analytics and statistical teams focused on risk, obviously, and our financial side of things. We had teams that grew up focused on marketing. We even have analytics happening in our internal audit team; they're actually doing some really cool things with neural networks. But all these teams grew up tied to these specific business units, and they have different tools, different needs, different capabilities. We didn't want to force anybody to use the same tool and say, stop what you're doing, even though you've been doing it for a while and you're good at it; come use this one platform and forget everything you've done previously. There's strength in the diversity of these tools. So we wanted to build a common platform where all these tools could be used and shared, and that was Air9.
Some of the problems we were solving: when we sat down and talked with data scientists and data science teams about the problems they were running into, the big thing we kept hearing over and over again was storage space. Obviously the cloud helps this out quite a bit, but on premise we had quotas and different capabilities for different teams. Some teams needed hundreds of gigabytes or even terabytes; other teams only needed a gigabyte here or there. It was very difficult to forecast storage needs, and lots of teams were running out of space whenever they tried to do anything at scale. Another big one was the overall time it would take to get a model from inception, through training and development, all the way to deployment. That had become very, very long, really because we had inconsistent data between our different environments. A model might be trained in development on one set of data and then go into production, and now we're either missing a variable or a feature, or the data is just not consistent. And one of the big struggles we saw on premise was users not having the freedom to install the packages and tools they wanted to use. Again, being in a regulated industry, we had a shared, multi-tenant environment for some of the tools, mainly around Hadoop. So if a user wanted to use JupyterHub, it was on a shared edge node talking to our on-premise Hadoop clusters, and we had very strict controls around what could be installed on it. They couldn't just install a package or the latest version of H2O or something like that; it was very locked down. It would take weeks, if not months, to get tools upgraded and new packages installed. So that was what we heard, and we set out to solve at least the majority of those problems with our build-out of Air9.

And where Air9 lives today is really this intersection of code, compute, and data. To describe that, let me walk you through the journey of an Air9 user. When a user logs in, they use their Windows credentials, their Active Directory credentials, and they're presented with a window where they start to think about data provisioning. The first thing they want to do is create a dataset to use, and they do this by talking to our cloud data warehouse, which is Snowflake. They're essentially browsing metadata tables, or our Discover data catalog, with the end result of coming up with a SQL query that represents the dataset they want to work with. Most of the time they already have this and just paste it in, but they can develop it interactively if they'd like to. Or if they already have a dataset, they can skip this step entirely. Then they hit next and, similar to what Sharad showed, they choose their environment. You're choosing your tools: is it H2O? Is it SAS Studio? Is it Python or RStudio? You're defining the size of the environment as well: the number of cores you're provisioning and the amount of memory you want. Then you choose the tool set and hit next. In the background, we're creating the jobs that push out the dataset you need and the environment you need, which is just a pod-based deployment you've defined, and then injecting the dataset into that environment.
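Air9's internals aren't public, but to make that dataset-injection step concrete, here's a minimal sketch of the kind of job that could run behind the scenes: execute the user's SQL against Snowflake and stage the result where the tool container will find it. The account, credentials, query, and paths are all invented for illustration.

```python
# Hypothetical sketch of the dataset-provisioning step: run the
# user's SQL against Snowflake and stage the result as a file the
# tool container can pick up. All names and credentials here are
# invented for illustration.
import snowflake.connector

def provision_dataset(user_query: str, output_path: str) -> None:
    # In a real deployment the credentials would come from a
    # secrets store, not literals in code.
    conn = snowflake.connector.connect(
        account="example_account",
        user="svc_air9",
        password="change-me",
        warehouse="ANALYTICS_WH",
    )
    try:
        cur = conn.cursor()
        cur.execute(user_query)
        # Pull the result down as a DataFrame (requires pyarrow) and
        # stage it as Parquet on the volume mounted into the user's
        # tool container (RStudio, JupyterHub, H2O, and so on).
        df = cur.fetch_pandas_all()
        df.to_parquet(output_path, index=False)
    finally:
        conn.close()

provision_dataset(
    "SELECT * FROM TRANSACTIONS WHERE TXN_DATE >= '2019-01-01'",
    "/home/jdoe/datasets/transactions.parquet",
)
```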
So when we talk about code, what I just described was the interactive model. When we talk about code and GitHub integration, that's really how we're doing our scheduled work. We have Airflow running in this environment as well, so when a user wants to schedule something, say a data transformation pipeline that runs weekly or monthly, they can use Airflow. Airflow will read the code from GitHub and talk to the Air9 platform at the scheduled time. It'll instantiate a container, inject the code and any relevant datasets into it, the model or process will run, the pod will save the output to a log somewhere, and then shut down. So that's how we've engineered Air9 to live at this intersection of data, code, and compute.

For the technical design, we'll talk a little bit about how we've iterated. We've gone from a very complex, messy system on top of OpenShift, leveraging some AWS-specific components, to a much simpler implementation. This is a slide from the chief architect of this product. He likes to quote Albert Einstein a lot: make everything as simple as possible, but not simpler. That was the approach we took. In March of this year, we had a lot of services doing various things, all running in their own pod deployments. We had a Jenkins server with multiple Jenkins jobs for deployments, multiple Lambda functions, tons of state machines; it was kind of a Rube Goldberg machine of things plugged into each other to get this to work. We've since scaled that down. We've got the entire Air9 application into a single pod-based deployment: all of the services, everything it needs to run, and that's how it scales. It's a completely stateless version of the application. In the middle layer, we have our operational SQL database, which is just Postgres on RDS. Any time somebody takes an action in the Air9 UI, the result is stored in the SQL database, and a watchdog process watches for new records in the database to act on.

What that looks like, at a very high level, is what you're seeing here. The Air9 application box on the left is in its own namespace; there's a separate Kubernetes namespace for the Air9 application. And in the box on the right-hand side, there's a separate namespace for the tools containers. One thing I want to call out, at the bottom, is how we're doing persistent storage across all these environments, because this is really the powerful part that helps drive collaboration across the teams. And that is using PVs and PVCs. We're essentially mounting EFS and giving users home directories as well as team and shared storage directories. So they're able to open up files to work with in, let's say, SAS Studio or RStudio, do something, save their output to their home directory or to a shared folder, and then open up H2O or JupyterHub and read those same files or work with them simultaneously. Since it's EFS, all the tools are mounted with the same personal home directories and team folders, and all the files and data can flow between those environments.
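As a rough illustration of that storage pattern, here's what attaching the shared EFS-backed volumes to a tool pod might look like with the Kubernetes Python client. The claim names and mount paths are hypothetical, not Air9's actual configuration.

```python
# Sketch: mount the same EFS-backed PersistentVolumeClaims (home
# directory plus team share) into a tool container, so every tool a
# user launches sees the same files. Claim names and paths are
# hypothetical.
from kubernetes import client

def attach_shared_storage(pod_spec: client.V1PodSpec, user: str) -> None:
    # Volumes backed by pre-provisioned, EFS-backed PVCs.
    pod_spec.volumes = (pod_spec.volumes or []) + [
        client.V1Volume(
            name="home",
            persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                claim_name=f"home-{user}"
            ),
        ),
        client.V1Volume(
            name="team-share",
            persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                claim_name="team-share"
            ),
        ),
    ]
    # Mount them at the same paths in every tool container, so files
    # saved from RStudio show up in JupyterHub, H2O, and so on.
    for container in pod_spec.containers:
        container.volume_mounts = (container.volume_mounts or []) + [
            client.V1VolumeMount(name="home", mount_path=f"/home/{user}"),
            client.V1VolumeMount(name="team-share", mount_path="/shared"),
        ]
```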
So, I mentioned the container life cycle. This is a very, very simple slide and a very simple concept, but I mention it because it was one of the biggest lessons we learned from a self-service perspective: when you give users, data scientists, the opportunity to choose a large environment, they will take it. They will not provision the one-core, four-gigabyte environment you give them when you say, start small and work your way up. They start with a super extra large, and you just get a bunch of long-running environments taking up a lot of resources. So very quickly we focused on life cycle management. Our users have the ability to request environments and then see them in a dashboard at any given moment: exactly what they're using and what their environments look like, and they can stop or terminate them themselves. That went a little way toward solving the problem of out-of-control environments, but what we eventually had to implement was auto-expiration. Without auto-expiration, we just saw the number of environments skyrocketing; they would stay out there forever and never get shut down. So we tied this auto-expiration value, a time-to-live, to the amount of resources used by the environment. If somebody provisioned a one-CPU, four-gigabyte pod running RStudio, it might live for a couple of weeks before the system shuts it down. If they go to the other extreme and provision a 128-core environment with a terabyte of memory, it's going to live for about 24 hours before it gets shut down. Users have the ability to extend these environments once, in case they're doing something really important. But it very quickly built in this reinforcement of: if you want the resources, you can have them, but you'd better give them back right away, because other people need to do their work as well. That's helped us focus on efficiency, so we're not wasting too much money on long-running environments.

The last slide I want to talk about is our resource allocation and how we implemented it to make sure we're getting the biggest bang for our buck. This is very much the over-commit model of the hypervisors of the 2000s and 2010s. Using limits and requests with Kubernetes, we're actually providing users about a quarter of what they ask for up front; it's kind of like the fractional-reserve banking of compute. If they request a 16-core machine with 64 gigabytes of memory, we're probably giving them four cores and maybe 16 gigabytes of memory out of the box. If their workloads scale up and need to hit that limit, they can certainly do so, and Kubernetes resource allocation and scheduling helps with that. But this is how we're able to get a lot of density out of these pods and containers. Underlying this fleet of compute is really just i3 and r5 instances running on AWS, and we want to be efficient with our spend on that. So getting as many of these environments onto a single EC2 instance is important, and limits and requests are what make that possible and help keep the cost down.

I think that's the last slide from the technology perspective. Anurud is going to talk a little bit about the softer challenges with Air9 and what it looks like to onboard a bunch of users: now that you've built something, how do you get people using the platform, and what does that look like? So with that, I'm going to turn it over to Anurud.
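Here's a minimal sketch of that over-commit and expiration policy, assuming the quarter ratio and the TTL numbers from the talk; the annotation key, image name, and helper function are invented for illustration.

```python
# Sketch of the over-commit idea: the user asks for 16 cores /
# 64 GiB, but the pod's *request* (what the scheduler reserves) is
# a quarter of that, while the *limit* still allows bursting to the
# full ask. The TTL annotation mirrors the auto-expiration tied to
# environment size. The annotation key and TTL policy are assumed.
from kubernetes import client

def build_tool_pod(name: str, cpu_ask: int, mem_gib_ask: int) -> client.V1Pod:
    # Request a quarter of the ask; limit at the full ask.
    requests = {"cpu": str(cpu_ask // 4), "memory": f"{mem_gib_ask // 4}Gi"}
    limits = {"cpu": str(cpu_ask), "memory": f"{mem_gib_ask}Gi"}

    # Bigger environments expire sooner: a 1-core pod lives about two
    # weeks, a 128-core pod about 24 hours (illustrative policy).
    ttl_hours = max(24, 336 // max(cpu_ask // 4, 1))

    container = client.V1Container(
        name="tool",
        image="registry.example.com/rstudio:latest",  # hypothetical image
        resources=client.V1ResourceRequirements(
            requests=requests, limits=limits
        ),
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(
            name=name,
            annotations={"air9.example.com/ttl-hours": str(ttl_hours)},
        ),
        spec=client.V1PodSpec(containers=[container]),
    )

pod = build_tool_pod("rstudio-jdoe", cpu_ask=16, mem_gib_ask=64)
```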
Thanks, Brandon. So Brandon touched on a few things here. Essentially, when we build tools for internal use, we often overlook a few things: user experience, user adoption, even business value. Those get overlooked when you build internal tools. But we work at Discover, which has won about five J.D. Power awards for customer satisfaction over the past six years, so we want to make sure we have the customer at the center of everything we do internally as well. So we started asking the question: what do data scientists need? Their asks are not unreasonable, they're not irrational; essentially, they're asking for simple things. So we set out our guiding principles. First, how do we build a slick, neat, easy-to-use UI and UX, so they can get onto it without a problem, not having to spend a week or two learning it? We want them up and running right away. So we built a slick little UI. Second, we wanted to abstract away the tech complexity. Our internal customers, in this case, are analysts and data scientists; they don't have to know about Docker, and they don't have to know about containers either. All they have to do is provision the environment they want and get running with it. So we abstracted the tech complexity completely out of the application. The other two things, which are similar in other industries as well, are help and latest versions. Help is hard to find, especially in a large company where you don't know who to talk to or where to go. We wanted to centralize all of that in one single place where you can ask questions, get responses, and then go to a page to do some self-learning as well. And latest versions: as Brandon touched on a bit, the complexity in a larger company like ours, especially in the financial industry, is the difficulty of getting the latest and greatest version right away. You have to go through a series of approvals, go to procurement, and then get the latest tool. We wanted to abstract them from that as well.

And that is actually driving value right now. What we see today, in one of the examples Brandon gave you, is that you can pick and choose the tool you want, be it SAS, H2O, R, or Python. Again, a financial-industry problem: we inherited a lot of SAS usage by default, and we didn't want to just take that away. If you ask people to completely stop what they're doing, restart everything, and switch to R, H2O, or Python, it's going to take forever for them to adopt your tool. So our solution was to let them collaborate. Now what we're seeing is, if I'm a Python user, I can wrangle my data in Python, and my teammate who is an R user can come in and build a model in R, without either of us having to learn the other's language. That actually brought a lot of skill sets together. So we have SAS users, R users, Python users, and H2O users collaborating in the same tool without having to change their ways. Our eventual goal is to change how the entire company operates on coding practices, but to solve for adoption, we had to go with the approach of: let's bring them all into the same tool set and then teach them from there.

So Teams is our internal communication tool. And I know, especially in this area of the country, there's a divide between Teams and Slack.
I'll keep my own opinion in my pocket for now; we had to work with what we had. Essentially, we centralized all of our help inside Teams, in one place. And what we started seeing, and this is normal at any company, is that we have more work than people to do it. So we outsourced, or rather crowdsourced, our help community. When we brought everyone together in one single tool, we started seeing collaboration that we didn't really expect. When people asked questions, we had planned to stand up teams to answer them, but now people are actually solving each other's problems. We see different departments coming in and pitching ideas about how to solve things. People are helping people, essentially. We take all the chat we get in the Teams channel and put it together into a neat help-doc page, so we have all the FAQs sitting in one place. Now, if I have to get started on a project, I can do it right away; we have sample scripts and referenceable code to start from. That made it super easy for us to onboard users. Again, when you're talking about a 20,000-person company, there are a lot of people we need to train and onboard, and it's not simple if we do it one by one in a training class every single week. So we wanted to get everything, all the FAQs and all the help pages, together in one spot. That helped us get adoption really high.

The product launched back in March, about six to seven months ago, and we've already gotten about 60% of all the analytics users in the company onto this platform. That includes more than 90% of all data scientists. When I say analysts, I mean, again, any person that touches data to provide something of value; I call them an analyst. We've reached about 60% of analysts right now, and we hope to get to 100% adoption by the middle of next year. That means every single person will have it at their fingertips: they can provision an environment, run their code, deploy it, and visualize it in a matter of seconds. We measure ourselves on this as well, essentially how quickly we can get the platform to the maximum number of people in the company. We're getting there. Brandon and I are extremely passionate about delivering products that actually create value and empower our internal customers, so they can do what they do best, which is make customers happy.