So, you've got my name: Corey Lachkowski, and I'm with what's called the Upstream Integrated Services Technology Enablement Group at ExxonMobil. A quick disclaimer: these are recycled slides that were approved by legal, so this first part is going to be a little painful to get through, and I'm going to tell you a few stories that don't necessarily have slides. I'm grateful to be here, grateful to Exxon for letting me come and share some of these experiences, and to Red Hat for inviting me. If you haven't heard of ExxonMobil, we're a fairly large organization. Some people would say that at our core we're actually a risk management company that happens to deal in oil and gas. We take safety very seriously, and that builds a culture around it.

A brief intro to me: I've been with ExxonMobil for about 11 years, maybe 12 now; I've lost track. I've moved around a bit. I was an Active Directory domain admin for a research company for a few years, then eventually moved into high performance computing (HPC) at ExxonMobil, where I was focused on large-scale data processing. I became an RHCE, and ironically I don't think I was actually managing any Red Hat subscriptions at the time. Then I moved to cybersecurity, where I was an SME for internal digital forensic cases, did a log aggregation project, and worked with Hadoop, Splunk, and some other technologies. Looking back, one data breach analysis we pulled in would have been a really interesting machine learning use case; that would have been fun.

Two years ago I was pulled in as the platform architect for OpenShift. Why did I leave cybersecurity? It sounded really cool to work with Kubernetes and OpenShift. Just curious: how many people in here started their journey with OpenShift before version 3? Anybody? Man, you are a brave soul. We started out around version 3.5, and I'm going to talk about this a little more later, but that's where we began. I came onto an agile team that was part of a digital transformation effort, and we did everything across the full stack, from hardware to onboarding of customers.

One of the big wins I want to share: in an organization this large there's a lot of overhead and process, so we stood up an OpenShift instance in a cloud provider alongside GitLab, used GitLab as the authentication provider, and said, if you have a company email address, you can use this. That was a huge one. It made things really easy for early adopters, and people were happy because it didn't require the approval levels they usually had to go through. Another big win was partnering with Red Hat: we didn't have the experience internally to pull this off, so we partnered with Red Hat and also pulled in contractors to build the team. I guess you could say we had some accidental wins too; we got lucky and got some really good people who worked really well together. And as an architect and lead engineer I got to see the onboarding process, so that iterative, agile DevOps approach was extremely valuable for seeing all the use cases.
There were also some less successful onboarding attempts, and that's where the story starts as far as machine learning and data science. Let me explain: there are actually a few data science groups within ExxonMobil, and the one I'm talking about today is in what's called the upstream organization. They were integrated during some restructuring, and a data scientist named Audrey Riznick partnered with a Red Hat contractor, a full stack developer, to help with some of this work.

In this picture you'll see a Jupyter notebook running. This was what I like to call the beginning of the Snowflake factory: you get a Jupyter notebook, you get a Jupyter notebook, everybody gets one. That did not make for a very consistent experience, and when you're doing machine learning you want reproducible results, so it created a few challenges. The goal of the work was to create an interactive, reproducible, and collaborative environment for the data scientists, and Jupyter notebooks were selected. As you see here, people were running these in the HPC environment, on Linux and Windows, on port 8000 or whatever; it was all over the place. So the goal was to move from that local PC environment onto OpenShift, and this standardization was a huge win for the data scientists. It forced a lot of different things and inherently led to more of a DevOps approach. I'll share another example of lessons learned there later, but this accelerated a lot of the proof of concepts that were being done.

This is the model they went through, a more agile model: pushing code, using S2I (source-to-image) to deploy the proof of concepts, having a way to demo them, and then gathering feedback. One big win here was that because OpenShift had already been risk assessed and a lot of the controls had already been documented, these data scientists didn't have to go through that process, which was traditionally a large overhead. RBAC access to certain data and certain models was also simplified through OpenShift. Security was a big one as well: for the dependencies going into your model, we had a security pipeline that brought in the Python artifacts, and the base images for the Jupyter notebooks also came through that pipeline into our Nexus repository.

So we did a few things here. We took situations that would generally take months to deploy, using waterfall and the overhead of our internal procedures, and brought that down to minutes. Before this effort started, between the data scientists and the developer, I think they were producing one or two proof of concepts that actually got to customers. Right now I think we're around 70-plus POCs being produced by the same group, and one big reason is source-to-image: not only for the Jupyter notebooks but also for deploying these POCs. Reusing these images, and the connections to data built into them, was also huge. A lot of the connections to data were through SQL Server or Oracle, so having those drivers already built into the images was very helpful.
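To give a feel for what that looks like from inside a notebook, here's a minimal sketch of querying SQL Server using a driver that's already baked into the image. This is an illustration, not our actual code: the server, database, credentials, and table names are all hypothetical placeholders.

```python
# Minimal sketch: querying SQL Server from a notebook whose base image
# already bundles the ODBC driver. Server, database, credentials, and
# table names below are hypothetical placeholders.
import pyodbc
import pandas as pd

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=wells-db.example.internal;"
    "DATABASE=production;"
    "UID=readonly_user;PWD=example-password"
)

# Pull query results straight into a DataFrame for analysis in the notebook.
df = pd.read_sql("SELECT well_id, flow_rate, recorded_at FROM well_flows", conn)
print(df.head())
conn.close()
```

The point is that the data scientist never installs or configures a driver; the image they launch already has it, so every notebook connects to data the same way.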
Another thing we looked at recently, working with Red Hat and Will Benton, was the idea of doing source-to-image (S2I) model training: actually doing your training during the build process. I'll sketch what that can look like in a minute. We're also seeing a CI/CD pipeline maturing.

We learned a few things trying to solve this. One of the biggest problems is: where's your data? We have on-prem databases in various countries, and certain agreements limited where we could move or access data. Development and deployment in Jupyter notebooks, while much better, also came with lessons learned about context that hadn't been captured. For example, here's a horrible story: somebody was using Jupyter notebooks on their local machine and moved to OpenShift. They started working there, had done a lot of work, and their pod got scaled down. Nobody had told them about persistent volumes. You can imagine how painful, but good, a lesson that was. So understanding that context was a journey in itself.

Also, one size does not fit all. We found that data scientists are very special, and one size does not fit all. We tried it anyway and found out that, if you're thinking of the MVP model, some of them just wanted shoes; they didn't want a skateboard, a bicycle, or a car, and we were trying to get them all onto the bicycle. So again, we focused on basic fundamentals, like webhook integrations between Jenkins and OpenShift.

Here's some of what I can legally share with you. This image is some of the flow modeling that was done with machine learning: understanding what the well flows are going to be over the lifetime of the well. We're using TensorFlow, PyTorch, scikit-learn, all those libraries and dependencies, building them into the Jupyter notebooks and other Python base images. Currently we're looking at using GPUs on OpenShift and seeing the benefits there; I've seen way more interesting presentations about it, but I can just say we're looking at it and currently working through it. We're also looking at RAPIDS and a lot of the work NVIDIA is doing on efficiency there.

Some of the proof of concepts we're working through are petrophysical models: we look at the physical attributes of various layers in the Earth's crust and process those into a 3D model. That's one of the use cases. And the efficiencies we're finding here are not just slight optimizations; these are millions of dollars, if not billions, that we're finding we can save, or at least avoid in cost, in certain areas. With natural language processing, we're taking a lot of technical text and trying to create a repository or library around it, and also, as was talked about earlier, sharing those machine learning models as APIs through Open Data Hub and things like that.

One lesson learned with GPUs, which may or may not be applicable to you but hopefully is: because GPUs are billed at a premium in the cloud, you may want to do an analysis to decide whether you should just buy that rack of GPU servers, because at cloud rates you can end up paying the equivalent of a new rack every month. We had several budgets that were burned through by data scientists running their models in the cloud.
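Coming back to the source-to-image training idea from a minute ago, here's a rough sketch of the kind of training script an S2I build could run, so the built image ships with a trained model instead of retraining at runtime. This is my illustration under assumptions, not the actual project code: the CSV path, features, and model choice are made up, and the exact build hook depends on your builder image.

```python
# train.py -- a sketch of a training step that an S2I build could run,
# so the resulting image ships with a trained model baked in.
# Data path, feature columns, and model choice are hypothetical.
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Training data vendored into the build context (or fetched during the build).
df = pd.read_csv("data/well_flows.csv")
X = df.drop(columns=["flow_rate"])
y = df["flow_rate"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"holdout R^2: {model.score(X_test, y_test):.3f}")

# Serialize the trained model into the image; the serving process loads
# this file at startup instead of retraining.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```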
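And on the earlier point about sharing models as APIs: Open Data Hub has its own model-serving components, so treat this as a bare-bones Flask illustration of the model-as-API idea, loading the model.pkl from the sketch above and exposing a hypothetical /predict endpoint with a made-up payload shape.

```python
# app.py -- a minimal sketch of serving a trained model as an HTTP API,
# the way a notebook experiment graduates into a shared service.
# Endpoint path and JSON payload shape are hypothetical.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model that the build step serialized (see train.py above).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[...], [...]]}: one row per prediction.
    features = request.get_json()["features"]
    predictions = model.predict(features).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```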
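To put rough numbers on that GPU cost analysis, here's the back-of-the-envelope math we mean. Every figure below is a placeholder, not an ExxonMobil number; the point is simply to compare monthly cloud spend against a one-time hardware purchase.

```python
# Back-of-the-envelope break-even: cloud GPU rental vs. buying hardware.
# All prices are made-up placeholders; plug in your own quotes.
cloud_rate_per_gpu_hour = 3.00   # USD, on-demand cloud GPU instance
gpus = 8
hours_per_month = 730            # roughly 24 * 365 / 12
utilization = 0.60               # fraction of the month the GPUs are busy

monthly_cloud_cost = cloud_rate_per_gpu_hour * gpus * hours_per_month * utilization

server_purchase_cost = 90_000    # USD, one 8-GPU server, hypothetical quote

breakeven_months = server_purchase_cost / monthly_cloud_cost
print(f"monthly cloud cost: ${monthly_cloud_cost:,.0f}")
print(f"break-even on owned hardware after {breakeven_months:.1f} months")
```

With these placeholder numbers the owned hardware pays for itself in well under a year of sustained training, which is exactly the kind of result that had us rethinking where to run things.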
You also learn to train models locally if you have the resources, which saves you some cost, and then run your trained models in the cloud as spike workloads. Some of this is going to seem really straightforward and simple, but some of our data scientists are very focused on the problem and don't always step back and look at the big picture, and hybrid cloud requires you to step back a little and look at it. So we broke these discussions down into some really basic questions, because what the data scientists were asking was, where do I put this? Do I put it in the cloud? Where do I put my data? Where do I run my application? We broke it down to three questions: What is your data, and what are the data sovereignty constraints around it? Where are your customers: internal, external, or a mixture? And what bandwidth or latency do you need between these elements of your system?

So here's our safe word-cloud slide. We all want to be doing these things; this was just part of the deck that was approved, so we're going to skip it. But how do you actually use those technologies effectively? This is one of my personal focus areas. I believe success is not about doing things perfectly; it's about willingness to change and being honest about where you are. Ultimately that's far more important than your current abilities.

One of my favorite recent conversations: we were in a room of very intelligent people, and I'm pretty sure I was the only one without a PhD. Someone asked, why is it called a cloud? And I went, okay. I looked around and said, does anyone else want to field this question? And I realized nobody in the room actually knew the answer. I said, we should just Google this, guys. But it actually turned into a really good discussion. I talked about doing network diagrams back in the 90s, how the cloud symbol abstracted away the details, that it was all about abstraction and trust, and also that it's really easy to draw. It sparked a very helpful conversation, and a realization that we're never too smart to learn more.

Talking specifically about the effort and the lessons learned that came out of this first part of the journey: I'm now part of the upstream data science enablement team. I had been the platform architect, and I was moved over because of this unique set of skills and experience, to try and add context. My team is specifically there to fill in gaps in knowledge and in culture. These are big gaps, and we have a lot of non-IT engineers at ExxonMobil who will tell you that gaps in machines create failure; we want to avoid those gaps in our culture and in these data science efforts. Consulting with the data scientists takes probably 50 to 60 percent of our time right now. In doing that we also get a lot of really good feedback, which turns into education, which turns into building what we've termed success skills, which are really just developer practices. A lot of data scientists at ExxonMobil did not come from a development background; they didn't grow up in that world, so they're not familiar with Git branching. Some of them aren't even familiar with Git, and these are very smart people. It's a challenge; there's a wide spectrum of people we're working with.
Another big one is collaboration and partnering. Listening to one of the internal talks at ExxonMobil, we looked at the number of patents released over the last decade, and it was very low. We realized that collaboration is something we don't do well, so we're focused on it. Our team's purpose is not to be a linchpin in collaboration but to be an enabler, to get out of the way and help it happen organically.

We're also focusing on self-service. That was a big one: we want enough of a paved path for our data scientists to be able to use these tools, and that's why Jupyter notebooks came in for collaboration with other people. Someone asked us the other day, so what do you guys really do? And we said, well, we force awkward conversations. That's literally what we do. We come in and people say, can you help us? And I say, sure, but can we do a peer review of your code? And they say, well, I usually don't share my code with anybody. And I say, we're going to look at it. We're going to see if you're using modules, if you're using best practices. We force a lot of these awkward conversations that help change the culture. In addition, we have awkward conversations with other groups in the organization, to provide features like bringing GPUs in, or to say, we need OpenShift 4 for some of this work: tell me about operators. So we force these awkward conversations with other organizations within the IT org as well.

In retrospectives we've discussed what actually builds a successful enablement team. One big thing is that we leave our egos at the door: if we don't know something, we say so and go figure it out, or figure it out together. We're also full stack developers, so we work as a team on making others successful. We demonstrate what's called healthy disagreement: we all have very strong opinions at times, but we know how to disagree appropriately, and we model that for data scientists who don't inherently collaborate; they're scared to, or it just doesn't come naturally to them. So we try to be a good example in that area. To give you an idea, there are four people on this team, and we're supporting about 90 data scientists. In no way is that the perfect ratio, so don't take that number home with you. We definitely have a lot of work to do, but we've seen a lot of success even with a small number of people.

And here, this is basically the legally released picture of some of our data scientists collaborating around a Jupyter notebook. One of the best things is seeing when they really understand these things, when we can answer those really simple questions like, why is it called a cloud, and talk about the fundamentals and context around OpenShift. It has been a huge enabler for our data scientists, and I'm glad to be sharing that with you. If you have any questions, I hope we'll talk later. Thank you.