My name's Corey Lachkowski. And just a quick poll, how many people in this audience are with the oil and gas industry? You guys are brave. Good, thanks. How about the rest of you? How many of you are in a large, complex organization? Awesome. OK, how many of you want to create a large, complex organization? OK, good. Yes, exactly. So to lead us off today, we're going to talk about how we've empowered individuals in a large organization through technology such as OpenShift and Kubernetes. And I'm going to let Audrey kick us off with that, telling us this story.

OK, so I'll tell you our story. My name is Audrey Resnick. I'm a data scientist. I work with the Computational Data Sciences Group within ExxonMobil. Probably two years ago, there were just three of us in this group, and what we were trying to figure out was, hmm, how can we get our proofs of concept over to our customers? And this actually was a challenge for us, because we'd come up with these really great ideas, and then we'd sit there and look at them on our screens and go, damn, now I have to go and change my user's environment. I have to do some pip installs. I have to make sure that they have the latest code base. And that wasn't really a good way for us to get our proofs of concept out to our users. I mean, who wants to sit there and install a lot of software when you just want to see if the problem that you have can actually be solved? Or whether it's even a problem you should be looking at?

So we started our search, and we first started with Jupyter Notebooks. And we said, well, this is a good way. We have documentation, and we can actually release the code. But we were still stuck: there's no way that our colleagues and the customers that we have were going to be patient enough to go ahead and set up an environment. And that really was a drawback. So I was having lunch with one of my colleagues, Chad Furman. We're going to have to get "Friends of Chad" buttons, because he goes around ExxonMobil handing out all these ideas, and then some of us end up here on this talking tour, talking about the stuff that we've come up with. He said, well, why don't you take a look at OpenShift? Because basically what you can do is take your entire environment and create a container. So instead of worrying about giving people local admin access, or worrying about the latest source code, or even worrying about some of the dependencies that you have, you can contain all of this in an atomic unit that you can go ahead and deploy. And we sat back and said, that's really a great idea. Actually, if this works for us, we would have something that is both reproducible and interactive, and it would save us the trouble of flying out to some of our colleagues in Calgary and setting up their laptops just so they could see the latest solution that we have for them. The other thing it does, almost inadvertently, and I'll talk about this later on, is that it ties really well into the agile process, so that we can go through our code a lot more iteratively and really quickly push out those minimum viable products.

And I'll give you a taste of that right now. At this time last year, we had pushed out two minimum viable products, because we had just started using OpenShift in earnest. This year at this time, we're already past 70. OK, that's 70, 7-0. That's huge. So my goal last year was, as a data scientist, I want a data science environment for myself, my colleagues, and my users,
one that's interactive and reproducible and gives me some sort of collaboration. So what we ended up doing is saying, let's get away from this snowflake-type thing and create some sort of workflow where we could have our code coming out of a Git repository. And, can you guys still hear? Yeah. For some of the scientists, this was a new thing. Their idea of a Git repository was their H drive. And that's a challenge. You have to get folks to determine, how are we going to work together? And OpenShift actually allowed us to do this as well, because we could then put a workflow together. We would take the code out of the Git repository, build an S2I Jupyter notebook image, push it to OpenShift, and then basically have our users or our colleagues be able to hit a URL. And that was huge for us, because that's the interactive, reproducible, and collaborative environment that we wanted.

So what we did last year is we took it one step further. We said, hey, data scientists, we're going to use agile methodology. We want to be able to talk about a problem that you're developing. We have this way that you're going to go ahead and deploy your product to your user. And by the way, we're doing this using an OpenShift environment. And guess what? Your user or your colleague has the ability, just by clicking on that URL, to go ahead and take a look at your product. For us, that's actually groundbreaking. It may not be for many of you, but for us, this was huge. Because before this, if we wanted to deploy a proof of concept, it would take us three weeks on something that we called quick app delivery. That's not quick. Three weeks. And if it went badly, it would take a month. And if we used the standard process that our IT department had, we could go from one month to three months. So being able to deploy something as fast as you could code it and give it to your users was just huge for us. The other thing that we brought with us as well is that interactive feedback: as I mentioned, it tied in really nicely with agile development, where we could then see, is this solution OK? If it's not, we can go ahead and iterate and keep on going. And for us, as I mentioned, that was just very nice.

And then the other thing that we have with this data science delivery model is that, in the process of working through agile, if we determine that we have to create some extra algorithms, or if we have to change direction a little bit, how are we going to add more packages or libraries? ExxonMobil has this thing: you don't just go ahead and download anything into our environment, because we're worried about malware and attacks. So what we have is an actual security portal, where we tell our data scientists, now, if there's a package or library that you like, or if you're working with Stanford or Berkeley, go ahead, get the package names and addresses, and give them to the security team. The security team is going to go ahead and help us download those packages and put them within a Nexus repository. Therefore, the next time you go ahead and want to actually use a package, guess what? We're not going to have 20 instances all over the place. It's going to be in one central location, where we can get the exact package that everybody would be using.
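As a minimal illustration of why that single, central index matters for reproducibility, here is a hedged sketch of the kind of check a shared notebook image could run so that everyone opening a deployed URL can confirm they are on the same approved versions. The package list is purely illustrative, not ExxonMobil's actual approved set:

```python
# Minimal sketch: report the exact versions installed inside a shared
# notebook image, so "which NumPy are we on?" becomes a one-liner.
# The package list below is illustrative only.
from importlib.metadata import version, PackageNotFoundError

APPROVED_PACKAGES = ["numpy", "pandas", "scikit-learn"]  # hypothetical list

def environment_report(packages=APPROVED_PACKAGES):
    """Return {package: installed version} for the packages we care about."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report

if __name__ == "__main__":
    for name, ver in environment_report().items():
        print(f"{name}=={ver}")
```

Printed at the top of a delivered notebook, a report like this turns the "which version are we actually running?" debate into a single, shared answer.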
Again, for us, that was another huge win, because I can attest that, as a data scientist, three years ago, with just the three of us, we had multiple versions of Python, we had multiple versions of NumPy, and we could never agree on which one was correct. Some of you are laughing, yeah, so you understand that. So having this data science delivery model, and being able to get these external packages and pull them in nicely, made this environment something that was very valuable to our data scientists and optimization engineers.

And I'll just quickly go through this, because the other thing that it did is it actually turned some of our data scientists and optimization engineers into developers. We don't tell them that. We tell them that with the OpenShift environment, we have a platform where we're helping you develop success skills. So when you see that figure with the developer, just think of that as a data scientist. We're saying, well, this way we're helping you easily store your code somewhere where somebody can get it. And it's automatically being built into a source-to-image build. And guess what? You don't have to worry about finding those images, because we have them within an image registry. And you know what? We can easily deploy it. So, unbeknownst to them, they're working in this methodology. We have over 40 data scientists at this point in time who are actually using this source-to-image process and actually are becoming developers. But as we tell them, you're developing these great success skills to make you a better data scientist by being able to release your proofs of concept very quickly.

So let me just switch gears a little bit and talk about the machine learning that we have at Exxon. One of the examples that my group particularly works with is optimization and surveillance, and the chart that you're looking at right there is just a machine learning model to predict the well flow. And within the OpenShift environment, one of the things that we looked at and kind of struggled with at the very beginning was that, if we were going to go ahead and work with the data scientists, we didn't want to have over 70 different containers. So what we quickly did, looking at the different types of problems that people worked on, was come up with three images that we use among our data scientists and that we also give out to the rest of the ExxonMobil scientists when they want to use them. The first one is just a basic image: if you're taking a look at a data set for the very first time, you're probably going to use something like pandas or NumPy, create a number of data frames, and just take a look at your data and make sure that it's clean, see if it needs to be cleaned any further, or even see if that data has anything to do with the problem that you've been assigned. The second container that we built, or standardized, is something that we call an intermediary container. That's one where we've already curated the data, but now you're probably going to do a couple of small machine learning problems with it, create a couple of models, kick the tires on your models, see how well they work. And then finally we have an advanced container that we're building, which only a very few of our data scientists are using right now, and that's because we use this container with some of the GPU work that we're doing, which is still a proof of concept.
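To give a flavor of that third, GPU-oriented image, here is a small, hedged sketch of the kind of device check a PyTorch notebook running in it might start with. The toy model and batch are invented for illustration; nothing here is the actual ExxonMobil workload:

```python
# Sketch only: confirm whether the container was scheduled onto a GPU node
# before doing any PyTorch work, and fall back to CPU if not.
import torch

def pick_device() -> torch.device:
    """Prefer a CUDA GPU when the image exposes one."""
    if torch.cuda.is_available():
        print(f"Using GPU: {torch.cuda.get_device_name(0)}")
        return torch.device("cuda")
    print("No GPU visible to this container; falling back to CPU.")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(8, 1).to(device)   # toy model, stands in for a real one
batch = torch.randn(32, 8, device=device)  # toy batch of 32 samples
print(model(batch).shape)                  # torch.Size([32, 1])
```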
So for those who are using PyTorch or TensorFlow and taking a look at some of our more advanced models, they're going to go ahead and use that third container. So while I'm talking about machine learning: right now, we're doing our final setup. I'm really hoping this is the final setup, because we've been having fun with this since May. We have our final GPU cluster, using some NVIDIA V100s, and we also have some internal services with our high performance computing center, where they also have an NVIDIA V100 cluster set up. And there are really two proofs of concept that we're working with, and I'll talk about them on the next slide. But at this point in time, those proofs of concept are mostly there to vet the containers that we have, to make sure that the containers are going to work well for a number of the other machine learning and AI problems that we have throughout the company.

So these are the two proofs of concept. And again, we're trying to create a model where the data scientist basically takes any of the algorithms that they develop, and we're trying to encourage them to go ahead and put them within a Git repository. We would have a number of containers, such as the GPU container that I was talking about that we're developing, and we have a number of database containers for SQL Server and for Oracle to actually access our on-prem databases. And from that, you can then again produce a proof of concept where you can send a URL to any of your colleagues or any of your customers. Of the two problems that we're working on, one is just a natural language processing problem: we have one of our manufacturing sites getting a lot of reports in, and they just stack the reports on the desk because they're paper, and the stack keeps getting bigger and bigger, and soon you open the door and the paper slides out, because they can't get through the paper fast enough. So we're seeing how we can actually take that process and digitally transform it. The second proof of concept that we're working on is more of what we call a petrophysical process, where we're just looking at petrophysical data. For those of you who are not geologists, and I think I'm the only geologist in here, plus I guess I'm a data scientist, software engineer, jack of all trades, whatever: that's rock data. So we're looking at the porosity and the permeability, and we're determining, can we take a look at a number of our reservoirs from all over the world and see if we can make matches and say, based on this type of reservoir, we see these types of characteristics? You know what? When we had a field in this type of reservoir, we produced X amount of hydrocarbons. This other reservoir looks similar in terms of the permeability and porosity and some of the other characteristics. We might take a bet on that one and see if it turns out the same. (A simplified sketch of that matching idea appears below.) So those are the two proofs of concept that we're working on right now.

With that, I'm going to hand it over to Corey. I'll just mention that last year there was myself and just one Red Hat contractor, and as of two months ago we were actually able to create an actual enablement team for our computational data scientists, and Corey is the team lead for that team.

Thanks, Audrey. How are you doing on time? Good. All right. So yeah. One of the things that I guess Audrey sort of tricked me into was to head up this team that's called the enablement team. Thank you.
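Picking up the forward reference above: here is a simplified, purely hypothetical sketch of the petrophysical matching idea Audrey described, using a standard nearest-neighbors query over features like porosity and permeability. The feature set and the numbers are invented for illustration, not ExxonMobil data or the actual model:

```python
# Rough sketch of the reservoir-analogue idea: given petrophysical features,
# find the historical reservoirs that look most like a new one.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: porosity (fraction), log10 permeability (mD),
# net-to-gross (fraction). Rows are previously studied reservoirs.
known_reservoirs = np.array([
    [0.22, 2.1, 0.75],
    [0.08, 0.3, 0.40],
    [0.18, 1.6, 0.65],
    [0.27, 2.8, 0.80],
])
new_reservoir = np.array([[0.20, 1.9, 0.70]])  # the candidate to match

# Scale features so porosity and permeability contribute comparably.
scaler = StandardScaler().fit(known_reservoirs)
model = NearestNeighbors(n_neighbors=2).fit(scaler.transform(known_reservoirs))
distances, indices = model.kneighbors(scaler.transform(new_reservoir))
print("Closest analogues (row indices):", indices[0], "distances:", distances[0])
```

In practice the real feature set and similarity measure would come from the petrophysicists; the point is only that "find reservoirs that look like this one" maps naturally onto a standard neighbors query.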
So one of the things our purpose is to do is to create appropriately awkward conversations with the data scientists. We do peer reviews, since they generally operate by themselves, focused very specifically on their machine learning or their models. So one of the things that we talk about with them is creating a pipeline, i.e. automation. One of the things that we specifically go over with them is that one size does not fit all. We tried to actually give them a solution, on the theory that if we build it, they will come. They did not like that; it didn't fit their needs. And we found out that a lot of these things are very quick and iterative. There's a huge wastebasket of ideas, and so we simply enabled them to use webhooks with S2I. That was the simplest solution, and they were very happy with it.

Another big thing that we learned: we had one project with a month's budget that was burned through in about three days using GPUs and a cloud provider. I won't tell you which one. But basically, this is one of those things where, when you're looking at GPU training or any kind of GPU use, you have to decide whether or not it makes more sense to actually just buy a rack of GPUs every two or three months instead of putting it in a cloud space. That was one hard lesson that we learned. Another one is that a lot of the data scientists are very focused, again, on specifically solving a very niche problem. So we've had to sort of bring them out of that mindset, and we usually do it by asking really simple questions. These are some of the questions we ask. Where is your data? We talk about data gravity and we help them be more aware of that. We also ask them, where are your customers? Are you working with internal customers within the company, or are you working in a hybrid of external and internal? And also, what is the bandwidth or latency between these various elements in your systems?

This is sort of an overall architecture that we're helping them to think about. This is sort of our stack, I guess you would say. And up in the top left we're talking about cloud-ready applications. So we try to help the data scientists think as developers. A lot of times it's a single Python script, like 2,000 lines of code or whatever, and we're trying to break that out, make it more modular, and help them think about collaborating with their peers. Again, we create that awkward conversation of, well, how are we supposed to support that? And that really helps them to sort of get out of their own heads with this.

So specifically on the team, one of our personal focus areas is not necessarily trying to do something perfect. That is an ExxonMobil thing. We've always wanted to do something flawless, but this is a cultural thing we have to sort of break out of. So success is not about doing things perfectly. It's about willingness to change and be honest about where you are. Ultimately, this is far more important than anything else that you'll do. So in the upstream enablement team, our focus area, specifically around what we're delivering, is consulting. That's probably 80% of our time right now, just because we're trying to bring people up. This is technical debt, but in the people and skill set area. Education: a lot of these consulting engagements become education. We do workshops that have been really helpful, just teaching them the various layers of working in Git.
So we teach them how to use Git as an individual, how to use Git as a team, and then how to use Git to collaborate externally, and to just look at the bigger picture. The other thing is that all these things we're doing lead either to collaboration or to partnering with internal organizations. Ideally, hopefully we'll get to the point where we can collaborate externally as well. We're working on that. One of the things is JupyterHub: we're looking at Open Data Hub and JupyterHub as one of those enablers for self-service. And then bringing GPUs to the masses. So right at the bottom here, I'm going to turn it back over to Audrey. She'll talk about sort of why we ended up where we are, and this is the user story.

Right, so again, I mentioned that last year at this time we had two proofs of concept, and these were for our clients up in Calgary, specifically within the Kearl mine. They had a number of trucks that would deliver material around the mine, and you can imagine, if you have 30 or 40 trucks on one road, and these trucks weigh a couple of tons, that the road is going to degrade. So one of the problems that we were given was: if we give you a starting time for these trucks, and we say when they're picking the loads up and where they're supposed to go, can you create some sort of system that optimally routes the trucks along different roads, to make sure that, one, they get to the dump location or to an actual crusher location, and that they're actually taking ore to that location? So we either get rid of the ore or we crush the ore finer. Can you actually devise some sort of model so that we can go ahead and get these trucks to where they need to go? And as I mentioned, we were able to do that. (A simplified sketch of that dispatch idea appears below.) Now, some of the savings from that, I'll give you an example. In one instance they said, well, we need more trucks, because we see that we're not getting a lot of the material delivered quickly. And we said, OK, well, let's go and run through some of our analysis. And what we found is that they didn't need more trucks, because a lot of the trucks were actually waiting in line, burning fuel, going ahead and increasing our carbon footprint, if you want to say that, well, not really that, but you get the idea: we were able to say, no, don't buy more trucks, we're going to have a better way of actually telling the trucks which location to go to. Another example I'll give is with graders. With some of the roads that we looked at, we said, you know, you say that you also need more graders, but actually we can show you that 60% of your graders are sitting in different locations and not working as efficiently as they have to. So those were some of the items that came out of the proof of concept.

One of the things that I think was really important about this is that, as data scientists in the research center, we also tend to intimidate people. So when we give our users and our colleagues the ability to take a look at a proof of concept like this on their own time, so that they can go back into their office, or go into a meeting room with a number of their other colleagues, and step through the application that we've created for them, we're probably going to get a lot more feedback as well. For me personally, with this entire journey, at this point in time I'm actually very happy, because I don't have to go and install software on somebody's machine, or create a server and hide it in my office away from our normal IT, just so that I can deploy a solution for my customers.
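And here is the simplified dispatch sketch referenced above. It is not the actual Kearl model; it is a toy assignment problem with invented cycle times, meant only to show the flavor of routing trucks so they spend less time queueing, rather than buying more trucks:

```python
# Toy sketch of the truck-dispatch idea: assign each truck to the destination
# (dump site or crusher) where it wastes the least time. All numbers invented.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical estimated cycle times (minutes) for 4 trucks to reach and
# service 4 destinations, including expected queueing at each one.
cycle_minutes = np.array([
    [27, 34, 41, 30],
    [25, 29, 38, 33],
    [31, 26, 35, 28],
    [29, 37, 24, 32],
])

# Solve the assignment problem: minimize total fleet minutes.
trucks, destinations = linear_sum_assignment(cycle_minutes)
for t, d in zip(trucks, destinations):
    print(f"Truck {t} -> destination {d} ({cycle_minutes[t, d]} min)")
print("Total fleet minutes:", cycle_minutes[trucks, destinations].sum())
```

The real problem adds time windows, road wear, and crusher capacity, but even this toy version makes the "more trucks versus better routing" trade-off concrete.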
The other thing that's really great about the platform that we have is that it is interactive. It really allows us to be more collaborative, and if I don't have to worry about the architecture or delivery mechanisms that, as a data scientist, I shouldn't have to worry about, I can just do my job better. So at the end of the day, I think that's what we think of, anyway, as democratizing data science. And here are my colleagues. I made them pose for this picture here. They're very happy now, because we don't have to cram everybody into a room; instead, we can deliver that proof of concept so that maybe some of our other colleagues in Alberta, or elsewhere, or in India can actually gather around a computer and take a look at a proof of concept that we deliver. Questions?

Well, I think I'm totally inspired, and I don't know if you realize, my Twitter handle is @PythonDJ. So it's just JupyterHub, and seeing you guys use this just makes my heart sing. So thank you both very much for coming and inspiring us, in closing. Wow.