My name is Mike Piech. I'm Vice President and General Manager of the Cloud Storage and Data Services Business Unit at Red Hat. I am super excited to be here with you all today. I'm going to talk a little bit about how you can accelerate your artificial intelligence and machine learning projects, efforts, and initiatives with Open Hybrid Cloud. So let me jump right in.

I always try to situate a talk like this within some current context, and sometimes that can be a little difficult, a little contrived, let's say. In this case, we actually have a really fascinating series of events underway right now that is just absolutely rich with inspiration and starting points to talk about data, data engineering, data science, artificial intelligence, and machine learning. Unless you're just completely off grid, you can't have missed that a very large ship was stuck in the Suez Canal for a number of days last week. It is called the Ever Given, stuck here in the Suez Canal in Egypt. This ship is more than 1,300 feet long, almost 200 feet wide, 220,000 tons. It carries up to 20,000 containers, each ranging from 20 to 40 tons. A very, very big ship.

Now, it happened to be stuck in a very important, very strategic location. The Suez Canal connects the Indian Ocean to the Mediterranean, and as is very well depicted in this graphic, it cuts nearly a third off of the journey between Europe and Asia, if we pick two significant ports such as Rotterdam and Singapore. Let's look at a couple of quick statistics. 80 percent of the world's import-export volume, the trade in the world, goes by ship; 50 percent by value. That's 1.5 tons per year per person on the planet. A lot of stuff moves by ship. The Suez Canal is 120 miles long and owned by the Egyptian government; 50 ships a day go through this, again, strategic location, carrying $9 billion worth of goods per day. 13 percent of the world's trade goes through it.

Now, as I was learning about this over the last couple of days, this particular image just struck me, among other reasons because the sheer difference in scale between old technology and new technology is often just mind-blowing, overwhelming. This particular incident ended reasonably well. Yesterday, the ship was in fact refloated through a lot of amazing engineering, as well as a little help from Mother Nature, with some spring tides over the weekend on Sunday.

But a very important takeaway here, and really this is the setup, is that data can help. Data did help, data will help, data is involved and relevant and critical, both for preventing things like this and for helping deal with them when they inevitably do happen. Data engineering and data science are all critical. Now, if we think about this whole scenario, what happened with this ship and its impact on world trade, on world business, and we start to think about some of the different kinds of data that came into play, it's mind-blowing, right? Just the ship itself: it got stuck because a sandstorm with 70-mile-an-hour winds blew a 220,000-ton ship slightly off course in a very narrow, very strategic canal and blocked shipping, blocked the flow of $9 billion worth of goods a day, for on the order of six days. You had tidal and oceanographic information. You had ship scheduling and ship routing, as different transport companies around the world were scrambling to figure out how to deal with this thing.
You've got fuel considerations, both for the ships themselves and for the fuel that the ships were carrying from various places around the world. You have so many different kinds of specialists, and their scheduling and availability: specialists to pilot the ship, specialists to dredge the ship out, specialists in re-optimizing and rerouting various trade routes, and so on. You've got raw materials supply and demand at a local, or sort of more tactical, level. You've got macroeconomic factors, and so on and so forth. It just becomes quickly overwhelming how many different kinds of data come into play in situations like this. And it doesn't matter whether you're the operator of the ship, the seller of goods being transported by the ship, or the operator of a port. There are just so many ramifications, so many implications, so many repercussions of an event like this, all of which can be significantly improved and aided with the right use of data.

So clearly, data is a critical asset, and depending on what type of industry you're in, it can be used in different ways. It can improve a customer experience. It can allow businesses to gain competitive advantage. It can be about P&L, profit and loss, cost savings. It can be about automation, et cetera. And in different vertical domains, in different types of business, there are clearly even more subdomains and specific ways, along the lines of some of those things I just mentioned, where data, and more recently machine learning, can be brought to bear in very helpful and very important ways: faster, better diagnosis in healthcare, risk analysis in financial services, optimizing network routing within telecommunications, how insurance premiums are calculated, et cetera.

Now, an important consideration to keep in mind is that operationalizing the use of data, the employment of data, whether it's for really basic artificial intelligence, let's say simple rules-based systems, or for more advanced, modern, sophisticated learning algorithms, is not trivial. There's a lot of limelight, a lot of discussion, around specific algorithms and specific technologies. The true sciency stuff is very exciting for sure and certainly has a great impact here, but we also must not forget all of the seemingly more boring stuff that it takes to get all the right kinds of data to the right places at the right time: so that models can be trained, so that trained models can be deployed, so that there are feedback loops, so that there is the right, call it, plumbing, so that a fast, iterative cycle can be set up and learning can really do what it needs to do in a timely manner.

So if you just look at the roles of the stakeholders involved and some of the phases of a machine learning effort, and more generally of artificial intelligence and the employment of data, it is quite complex, right? At the executive level, you're setting high-level business goals. You've got data engineers gathering and preparing data, putting the right platform, the right infrastructure, in place to get, again, those various kinds of data to the right place at the right time, so that data scientists can then actually sit down and work with models, work with new algorithms, train models, et cetera.
Then, in addition to data scientists, you've got folks in big companies, small companies, medium companies who actually take that output from the data scientists and, again, put it into production, put it to work. You've got the work of the machine learning engineers and the data scientists being pulled together into larger applications that are being built by more general software engineers, not necessarily artificial intelligence or machine learning experts per se, but all of these different types of application-building stakeholders need to be able to collaborate to get an actual full-blown application out and in production. And then ultimately, you've got the ops folks. Whether you're running an application in a public cloud or on your own premises in a data center, you've got folks that need to keep the lights on, keep everything up and running, handle backups and restores, and all of that good stuff.

So we've talked about a couple of challenges already, but let's just highlight a couple of additional ones. The data itself: as we already touched on with the shipping example, the volume, variety, and velocity of different kinds of data are at a scale like never before, and that is overwhelming old ways of handling data. To do modern machine learning, one needs different architectures from what one needed in the past. As with anything new, you've got a scarcity of expertise to deal with this. Many of the tools are brand new, they're nascent, so there isn't, let's say, established learning and training around these things.

So let's look at the architecture and some of the technologies it takes to address the challenges we just discussed. If we take that workflow of setting goals, gathering and preparing data, developing models, et cetera, we have tools that many of you are familiar with, such as TensorFlow, Spark, Jupyter Notebooks, et cetera. We have more, let's say, infrastructural augmentation of and support for that, with data pipelines and various data constructs such as data lakes. Now, at the center of all of this, the next layer down, is the platform layer that is critical for enabling the kinds of speed, scale, and reliability we need to do this kind of application development and to run machine learning-enhanced applications in production. This is where a cloud platform comes in, where containers and the architecture that containers enable, such as microservices and fine-grained modularity, come in. This is what is fundamentally enabled by a technology like Kubernetes and its instantiation, its embodiment, in the commercial product OpenShift. That is itself enhanced and augmented with technologies at an even lower layer, such as graphics processing units, field-programmable gate arrays, tensor processing units, various new types of hardware, basically, to accelerate the right kinds of elements in the kinds of algorithms we're talking about here. And then all of that is available to enterprises in various infrastructure models, whether it's physical on-premises, virtualized, completely private, public cloud, or hybrid. And a theme that you'll see in some of what we talk about here is that everything is increasingly hybrid.
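To make that platform layer a little more concrete, here is a minimal sketch, using the Kubernetes Python client, of how a training workload might request one of those hardware accelerators, a GPU in this case, when scheduled onto an OpenShift or Kubernetes cluster. This is not from the talk: the image name and namespace are hypothetical, and it assumes the cluster has the NVIDIA device plugin installed.

```python
# Minimal sketch: asking Kubernetes/OpenShift to schedule a training pod
# onto a GPU node. Assumes the NVIDIA device plugin is installed on the
# cluster; the image and namespace below are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

container = client.V1Container(
    name="trainer",
    image="quay.io/example/model-train:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}  # extended resource exposed by the device plugin
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-training-job", namespace="ml-demo"),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)

client.CoreV1Api().create_namespaced_pod(namespace="ml-demo", body=pod)
```

The point is less the specific API than the pattern: the accelerator is just another declaratively requested resource, which is part of what lets the same workload move between on-premises and public cloud.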
So now, with all of that background, let's take a look at a couple of examples, three in fact, from three different industry verticals. We'll start with financial services.

Royal Bank of Canada: a top-10 bank in the world, been around for a fairly long time, 86,000 employees, lots of branches, a big bank. We started working with them about five years ago. They were looking to create a general machine learning capability, a data science capability, to enhance a number of different areas of their business: fraud detection, risk analysis, even marketing. They made some initial attempts, and they experienced some challenges as they were first setting up a team on the order of a hundred folks. They were finding that projects took two months to get off the ground. The platforms were hard to build; just the sheer wiring together of the various technologies at the engineers' disposal was itself very time-consuming and distracting, taking away from the time to actually build the applications. And security and compliance were challenges as well.

They had a goal. They wanted to set up tools and processes for a hundred developers and engineers, and they obviously wanted to take that two-month project cycle down significantly. They realized that part of their challenge was actually culture: the old waterfall, planning-heavy ways of doing application development were not well suited to the rapid, iterative type of development one needs to employ in training models and in using machine learning effectively as part of applications.

So they worked with Red Hat and NVIDIA, in addition to some other technologies. They employed Red Hat OpenShift and NVIDIA GPUs to accelerate machine learning models. Their architecture, taking advantage of Kubernetes and the container architecture of OpenShift, was very fine-grained, deploying machine learning models in containers so that they could get that rapid, iterative structure to their process. Particularly because of regulatory constraints and so on, they wanted to set this up on premises, in their own data center. The NVIDIA technology was significant in speeding up what were initially some performance challenges in the applications there.

The results: they've already run on the order of a thousand models with the setup they've built over the last couple of years. They were able to do 10 times more experiments per unit of time than with their earlier setup. They took that two-month window, that two-month life cycle for projects, down to a number of days. And in one particular project, they were able to analyze the records of 13 million of their Canadian customers in 20 minutes, and given the complexity of the calculation going on there, that's actually a pretty phenomenal number.
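To illustrate the one-model-per-container pattern RBC used, here is a generic sketch, not their actual service, of the shape of the deployable unit: a trained model wrapped in a tiny HTTP service that gets built into a container image. The model file, route, and payload format here are all hypothetical.

```python
# Generic sketch of a containerized model-serving endpoint (not RBC's code).
# A serialized model plus a small HTTP wrapper is the deployable unit.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # hypothetical serialized model artifact
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [0.1, 3.2, ...]}
    features = request.get_json()["features"]
    score = model.predict([features])[0]
    return jsonify({"prediction": float(score)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Because each model lives behind its own small service, swapping in a retrained model is just a new image and a rolling deployment, which is the mechanical basis for running ten times more experiments per unit of time.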
So let's jump to a different vertical, a different domain, with certainly different drivers and different constraints: healthcare. HCA Healthcare is a private healthcare company based in the U.S. It's also been around for 50-plus years, with a couple hundred hospitals and 2,000 care sites across the U.S. as well as in the U.K., 50 billion dollars in revenue, number 67 on the Fortune 500. So a big healthcare organization here in the U.S., 280,000 employees. Healthcare, in terms of employees per unit of work or per unit of revenue, et cetera, tends to be a very people-intensive business. They set out to address a particular challenge.

Now, as we've all seen in learning about machine learning and artificial intelligence in general, diagnosis of medical conditions is a use case that comes up fairly frequently. And it's fairly intuitive, fairly easy to get one's head around, how one can throw machine learning at the basic problem of: input, a set of symptoms; output, a set of possible medical conditions. They were addressing in particular the condition called sepsis, in which a person's immune system overwhelmingly reacts, essentially over-rotates, in response to an infection, to the point where that immune response starts to actually do more harm than good. It literally damages organs in the body. So it's a condition that, among other characteristics, spreads and does its damage very quickly. Time to diagnose is absolutely critical. In HCA hospitals and facilities, the diagnosis of sepsis was a very manual process, literally nurses with clipboards. And the knowledge about how to diagnose it was also spotty; there was better knowledge in some places than in others, and that needed to be addressed.

So, again, HCA set out to address this very specific problem: to automate and normalize this diagnosis across all of their vast properties and give every diagnosis instance the benefit of the best possible diagnostic technology and knowledge, right? To smooth out that spikiness and not let some patients be worse off than others because they happen to be in a place with less knowledge. They employed OpenShift and set up an environment where their data scientists could gather their existing data, set up an initial model, and roll it out to an application that nurses and doctors would use instead of those clipboards and the previous, much more manual process. And they significantly sped up and improved the results of the diagnosis of sepsis.

Here's a quote from the Chief Data Scientist at HCA: they provide a five-hour head start. There's a great video linked, an interview, that talks about how every hour of delay in diagnosing sepsis increases the risk of death by 4 to 7 percent. So hours really are the difference between life and death here. And this is just a fantastic example of how, with the right kind of infrastructure supporting data science, it can be rolled out on a mass scale and really, really help humanity.
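Just to make that "symptoms in, risk out" framing concrete, here is a toy sketch of that kind of classifier on entirely synthetic vitals data. The features, thresholds, and model choice are invented for illustration; this is in no way HCA's actual model.

```python
# Toy illustration of the "symptoms in, risk out" shape of the problem.
# Entirely synthetic data and features; in no way HCA's actual model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)

# Hypothetical vitals: heart rate, temperature (C), respiratory rate, WBC count
X = rng.normal(loc=[85, 37.0, 16, 8.0], scale=[15, 0.8, 4, 3.0], size=(5000, 4))

# Synthetic label: elevated vitals loosely raise the simulated sepsis risk
risk = 0.02 * (X[:, 0] - 85) + 1.5 * (X[:, 1] - 37.0) + 0.1 * (X[:, 2] - 16)
y = (risk + rng.normal(scale=1.0, size=5000) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)

print("held-out accuracy:", clf.score(X_test, y_test))
# Probability of the positive class for one hypothetical patient
print("risk score:", clf.predict_proba([[120, 39.5, 28, 15.0]])[0, 1])
```

The real value in HCA's case wasn't the algorithm so much as getting a model like this deployed consistently across a couple hundred hospitals, which is exactly the platform problem.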
Okay, third use case here. Let's look at automotive manufacturing. BMW Group: everybody, I'm sure, has heard of and seen BMWs on the road. The eighth-largest automaker in the world, it's been around for over a hundred years, and they roll out two and a half million cars a year. Now, BMW has always prided itself on its image of innovation; they made their first electric car in 1972. So as an innovative, technology-minded, technology-oriented company, it's clear that this is a company that's going to want to make the best use of emerging data capabilities, data science, and machine learning.

Today on the road, there are 1.4 billion cars, and 250 million of them are connected. Every major car company in the world is working on autonomous driving. Basically, auto manufacturers are becoming internet-of-things manufacturers. It's not just about commuting or transportation or getting from A to B; it is about an experience, a connected experience. Now, BMW has a program called ConnectedDrive, and if you happen to be an owner of a late-model BMW, you're probably familiar with their ConnectedDrive application. It allows you to do everything from navigation to scheduling maintenance to ordering a pizza while you're on the road. There are a billion ConnectedDrive requests per week. And of course, BMW wants to constantly roll out new services, new capabilities, to stay ahead of the competition, to provide an awesome experience for their drivers, and so on.

So they have put an OpenShift-based infrastructure in place to enable their application builders and their data scientists to develop, again, these new services, constantly iterating in that rapid-innovation, rapid trial-and-error type of approach to rolling out new services. And they also realized that they had to adopt a more DevOps type of culture, right? In transforming from that pre-connected world to the assumption, the expectation, that every car out there is going to be connected, their whole software development organization had to itself be transformed. So, again, a solution based on OpenShift: they developed the D3, data-driven development, platform to work through the massive amounts of data already being generated by their cars out there. This was all built using the latest cloud-native architecture, microservices, et cetera, on top of OpenShift. And with a development partner called DXC, they have really taken this whole initiative to an amazing new level over the last five years. There's a quote from the chief architect at that partner, DXC, the gist of which is that achieving this level of analysis manually would take literally millions of years of effort. When you start to think about what gets short-circuited by throwing machine learning at a problem versus having humans figure it out, it's pretty amazing.

Okay, so we've walked through three different example use cases of how a hybrid cloud infrastructure has helped companies and organizations in three different industry verticals significantly improve or augment their offerings, their customer experiences, et cetera. Everything I've gone through has been developed on OpenShift, again, that Kubernetes-based cloud platform. What I want to talk about real quickly here in my last couple of minutes is a project called Open Data Hub. This is an open source project; if you go to opendatahub.io, you'll see what it's all about. In short, as said here, it's a data and AI platform for the hybrid cloud, built on top of OpenShift. Basically, it is an OpenShift operator. An operator is a special construct that installs and, sort of, monitors the runtime of workloads on OpenShift. So it is an operator, a meta-operator if you will, that pulls together different open source projects that are part of the data science workflow, enabling a much easier, let's say, wiring together and setting up of a data science environment. It allows companies to do the kinds of projects we just went through, but much more easily, right? It gives the folks setting up environments for data scientists a leg up; it takes a lot of that configuration and installation, both the time and the risk of error, off the table. So here, in a super, super simplified nutshell, is what that workflow looks like and some of the technologies that have been incorporated, right?
You've got data in, let's say, an object-type store such as Ceph or S3. You've got data scientists working in Jupyter notebooks, perhaps using Spark or TensorFlow, and they'll run experiments. The Kubeflow technology marries that with the underlying Kubernetes in an efficient way, so that jobs can be run in that containerized Kubernetes environment. The workflow to deploy models as a service on OpenShift is part of this, either in a simple way or in a more advanced way with technologies such as Seldon. Also incorporated are technologies for gathering metrics and storing the results of those metrics, so that's your Grafana and Prometheus, technologies like that.

So this is what Open Data Hub is doing. It's bringing together these open source projects into a coherent, relatively seamless environment to empower data scientists, data engineers, machine learning engineers, all of the stakeholders involved in creating these intelligent applications. It gives them the environment they need so that they can work rapidly, with high performance, at scale, without spending too much time doing all of the tedious and error-prone wiring together themselves.
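As a hint of what that model-deployment step looks like in practice, Seldon's Python server wraps a plain class exposing a predict method and turns it into a REST/gRPC microservice on the cluster. Here is a rough sketch under that assumption; the model path and class name are hypothetical, and the exact contract is documented by the Seldon project.

```python
# Rough sketch of the Seldon-style serving pattern: a plain Python class
# with a predict() method that Seldon Core wraps in a microservice and
# deploys on OpenShift/Kubernetes. Model path and name are hypothetical.
import pickle

class RiskModel:
    def __init__(self):
        # Load the trained artifact baked into (or mounted in) the image
        with open("/mnt/models/model.pkl", "rb") as f:
            self._model = pickle.load(f)

    def predict(self, X, features_names=None):
        # X arrives as an array parsed from the request payload;
        # return per-class probabilities as the response
        return self._model.predict_proba(X)
```

The appeal of the pattern is that data scientists hand over an ordinary Python class, and the platform, not the scientist, supplies the HTTP plumbing, the scaling, and the Prometheus metrics just mentioned.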
So let me end with a couple of takeaways that I hope you've sensed, as we touched on them in some of these case studies as well as in some of the initial setup. The data opportunity will force practically everyone to be hybrid, right? The notion of a walled garden is a fleeting fantasy. If anybody is imagining, "oh, I'm just going to go buy a data science environment off the shelf, set it up, and away I go": things are moving so quickly, and enterprises and organizations out there already have so many different technologies in their data centers, that any kind of monolithic approach to data science is just doomed to disappointment. So hybrid is fundamental here.

With that in mind, anybody setting up infrastructure, or any data scientists out there specifying requirements to your infrastructure providers for such environments: you want to ask for the power of flexibility and adaptability. You want the ability to pull in new technologies and to connect things in different ways. So whereas in Open Data Hub, as I just discussed, some of that wiring together is taken off the table for you, it shouldn't be walled off, it shouldn't be hidden, it shouldn't be completely black-boxed. You need flexibility.

And related to that, there should be a balance of opinionated constraints and freedom, right? Basically, there's that phrase: make the simple things simple and make the hard things possible. That's what you really want to get to here. There is no perfectly handheld, can't-hurt-yourself type of environment, but with the right kind of opinionation, the right kind of guardrails, you can be made much more efficient and be able to roll out data models and machine learning-enhanced applications that are scalable and reliable.

I talked about cloud, containers, microservices; that stuff is here to stay for a while. So you can very confidently bet on a technology like OpenShift. The sheer growth and the rate at which it is being deployed out there, that's not a "hmm, is this going somewhere or not?" It is a very dependable foundation. And when you have the right platform, the right foundation, in place, that's going to accelerate your efforts.

And then last but not least, as you saw in a couple of the case studies I went through, in order to be successful with these projects, these organizations underwent not just technology transformations but cultural transformations. They had to change the practices, the behaviors, literally the organizations of their people, to make the best use of these new technologies and new ways of doing things.