Welcome everybody to another OpenShift Commons, and today we're really excited to have the IBM Cloud Pak for Data team. This new release has been eagerly anticipated by all of IBM's customers and Cloud Pak for Data users. We're here with Clarinda Mascareñas, offering manager for IBM Cloud Pak for Data, as well as Clay Davis from Tech Data, a very important partner, and we love Tech Data. Travis and Partha are also here from the IBM Cloud Pak for Data team. Please take it away, we'd love to hear more.

Thank you so much, Karina, it's really a pleasure. It's been a great release for us this year, and I'll give you a quick overview of what we'll be covering in our agenda today. In today's session we will showcase the highlights of the Cloud Pak for Data version 3.5 release, with a quick demo of deployment using our operators, which is one of our new capabilities, and how it ties into Red Hat Marketplace. We've also onboarded this 3.5 release onto our global distributor Tech Data's marketplace, and we'll hear from Clay on why Cloud Pak for Data is important to them, followed by a quick end-to-end demo that Travis will walk us through.

Now, we've come a long way over the past two and a half years. We're on our ninth release, version 3.5, and in today's presentation we'll learn more about the enhancements in this release. We just had a successful GA, as Karina said, prior to Thanksgiving week, on November 20th. I wanted to give some background on what exactly we did a couple of years ago with our Data and AI portfolio across data management, governance, and analytics. We tried to build the best tools, point solutions for the different use cases, but clients who wanted to build a more comprehensive, use-case-driven platform had to go through the pain of piecing these services together. So for the last two years our positioning has been from a platform perspective, with Cloud Pak for Data. Many of you must have heard about Cloud Paks themselves, which target pre-defined use cases; we have six other Cloud Paks. The goal is to deliver an end-to-end, pre-integrated, unified experience to end users.

I also wanted to quickly give you a feel for what our Data and AI platform is. We start from our foundation, which is based on OpenShift; Cloud Pak for Data is truly a hybrid offering that can run on any public cloud or on premises, avoiding vendor lock-in. As you can see in the three boxes here, we have data management services. There is always a need to use data from diverse sources, and data virtualization allows you to manage your enterprise data through a single pane of glass no matter where it lives. Our main differentiating factor, data governance, sits in the center, in the Organize realm. It's important to understand which data is actually required for AI, and it needs to be trusted so that you can then analyze it to build self-service analytics. The last box is Analyze, with our data science and analytics support for best-in-class tools and open-source frameworks that let you run your models across a variety of environments; think of it as build once, deploy anywhere. And of course we have the different personas that you can see across the top of the platform.

Now, quickly for version 3.5, I wanted to cover some of the foundational specifications. 3.5 supports OpenShift 3.11 and 4.5.
Besides the different deployment options I just called out, we are also introducing support for IBM Z this release. We run on several storage options, including OpenShift Container Storage, Portworx, and NFS. And we'll see in a bit, with our growing ecosystem and the onboarding on the Tech Data marketplace and so on, how Cloud Pak for Data is growing not just with IBM and third-party services, but also open-source services.

The next thing I quickly wanted to cover is an overview of the latest packaging and where the capabilities lie in version 3.5. We have some base capabilities, as you can see here, and then we also have extensions. A simple analogy: similar to your iPhone, we have default apps, which are like the base services, and premium services, which are like extensions. All these services are pick-and-choose and pre-integrated; it's a land-and-expand model based on your needs. This release we are introducing new services in the base, highlighted here, such as the Data Management Console, and we'll see details of that in a bit. In the AI portfolio we have WMLA, the Watson Machine Learning Accelerator, for deep learning use cases, as well as data privacy enhancements. From an extensions perspective, we are introducing knowledge accelerators for business vocabulary across different industries; OpenPages, which is one of our GRC solutions; and an oil and gas solution that is also new this release.

Now, quickly, to summarize the high-level themes in our portfolio this release, given the times we are in: we are seeing that companies are either in a survival mode with the new normal, or in an accelerated growth mode. Our two high-level themes, catering to both types of needs, are a cost reduction strategy and an innovation strategy, and we will cover the details of each of these themes in a bit. From a cost reduction perspective, businesses are looking to optimize their costs, primarily through automation or AI, or by moving to the cloud to optimize their infrastructure; return on investment is a very important factor there. When it comes to innovation, companies are in more of a growth mode, trying to keep up with the increased demand on their business and investing more in resiliency, risk management, data security, and advanced AI.

So from a cost reduction perspective, there are two main areas I want to highlight. One is improving the user experience for more productivity; the other is simplified platform management and enhanced automation, to improve time to value and efficiency. We'll focus first on user experience. The first thing to call out is on the left-hand side: there are many pain points when you use a platform. Data is located on many different servers and public clouds, there are many different user interfaces for different users, and it's painful for end users to get their job done seamlessly. On the right-hand side is our unified user experience, based on job role and permissions. It's simplified from a persona perspective, and the experience is built around our users rather than the services that we have on the platform.
Our design team has done a ton of user research studies and defined how the navigation appears, to make it more intuitive and easier to use for our end users.

The next capability I wanted to cover under our unified experience is for data engineers. We wanted to give them a unified way to manage their databases in one place, and this tool is called the Data Management Console. Without it, you might need multiple consoles to manage the native databases running on the platform. With this unified tool, you can manage data virtualization, connecting to sources on public clouds, on premises, and so on, as well as your Db2 databases on the platform: run your queries, monitor performance. The new console is built on a full set of open RESTful APIs, so anything you can do in the interface, you can also do through our open APIs. In short, from receiving alerts, monitoring hundreds of databases, and optimizing their performance from one screen, giving you a single view across the enterprise, to creating, altering, and managing your database objects through a single interface, this is a great value add on our platform.

The next important capability is platform connections. There are two main goals here: a common mechanism of connectivity across all our services on the platform, and a common set of connectors across those services. The full set of connectors is listed in our Knowledge Center; please take a look. It includes IBM and third-party connectors of all types, as well as custom JDBC connections that you can define. The goal is that you define a connection once and make it available in a catalog where it can be used from anywhere; the main problems this solves are reusability and streamlining the use of data sources across the platform.

That covers some of the highlights from a user experience standpoint. The next theme is our unified platform management capabilities and enhanced automation. We've seen in the past that system administrators and end users often have a lot of difficulty operationalizing and managing their data and AI workloads; this has been one of the pain points. This release we've introduced a couple of capabilities. One is platform management: system administrators on Kubernetes platforms have many services deployed, with different resource consumptions and entitlements, which are very complex to manage on your own. Besides the capability to drill down from the service to the pod level to debug and correlate issues, administrators also need visibility and control over the compute and memory resources consumed by users, services, and the platform, and over the workloads across the platform, including all the services that are deployed. So this release we are also giving you the capability to configure resource quotas on CPU and memory for the entire platform as well as for individual services. That way you can monitor your thresholds and receive email alerts when usage exceeds the configured quotas. Optionally, you can also configure a scheduling service to enable soft enforcement of these quotas.
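To make the monitor-and-alert pattern concrete, here is a minimal sketch of the kind of watchdog an administrator could script against the platform's REST APIs. The endpoint path, response fields, and thresholds below are hypothetical placeholders, not the documented Cloud Pak for Data API; the real paths are in the IBM Knowledge Center.

```python
import smtplib
from email.message import EmailMessage

import requests

BASE_URL = "https://cpd.example.com"  # hypothetical cluster URL
TOKEN = "..."                         # platform bearer token
QUOTA_VCPU = 64                       # quota configured for the platform

def current_vcpu_usage():
    # Hypothetical monitoring endpoint, for illustration only.
    resp = requests.get(
        f"{BASE_URL}/zen/api/v1/monitoring/usage",
        headers={"Authorization": f"Bearer {TOKEN}"},
        verify=False,  # demo clusters often use self-signed certificates
    )
    resp.raise_for_status()
    return resp.json()["vcpu"]

def alert(usage):
    # Mirrors the platform's email alert when usage exceeds the quota.
    msg = EmailMessage()
    msg["Subject"] = f"Cloud Pak for Data vCPU usage {usage} exceeds quota {QUOTA_VCPU}"
    msg["From"] = "cpd-monitor@example.com"
    msg["To"] = "admins@example.com"
    with smtplib.SMTP("mail.example.com") as smtp:
        smtp.send_message(msg)

usage = current_vcpu_usage()
if usage > QUOTA_VCPU:
    alert(usage)
```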
That way you won't end up exceeding what you've actually allocated, so this is one of the great capabilities this release. The other important capability from a management perspective: we've often seen that data science workloads running in production need to be easy to monitor and manage over time. So we've introduced enhanced dashboarding in deployment spaces, where you can see an integrated operations view of the workloads you're running, depicting the runs, the failures, and so on, so that you can quickly find your issues and get a view across all the different spaces. When we say spaces, think of the place where we do our production-level deployments on the platform, so your apps can access your machine learning models through a REST API. This also paves the way for us to build queuing and capacity planning for these production workloads in phase two.

Now, the next important capability, and I won't speak much to it because Partha is going to walk us through a demo, is our Cloud Pak for Data operator. It's an OLM-based operator for faster deployment and configuration, allowing you to install, uninstall, patch, and scale in an effective, automated, and scalable way. So let's see it in action. Over to you, Partha.

This is the first time Cloud Pak for Data has adopted the Operator Framework for installation and upgrades, which makes it easier for customers to adopt the platform and get started quickly, and makes installs and upgrades easier. Historically we have used a tool-based installation, and this is the first release where we have adopted the Operator Framework. In this demo we'll use the Red Hat Marketplace way of installing on the cluster. I have registered the OpenShift cluster in the Red Hat Marketplace console; let me show you the experience. When I click on the cluster console, it takes me to the OpenShift cluster. When that opens up, we can go to the software I have already installed on my Red Hat Marketplace dashboard. You see all the listings as usual, one of which is IBM Cloud Pak for Data, and you can install the operator from this console directly. What this does is give you a mechanism to install the operator, pulling it from the IBM operator catalog dynamically. So I just click Install Operator, and it takes me to a page where I can select the OpenShift project I want to install into using the OLM mechanism. I select the OpenShift project I created for this Cloud Pak demo; the installation starts immediately, and in a couple of minutes the operator is installed and ready for use. This is the project where I'm installing the operator; you can see the Cloud Pak for Data operator being installed, and as soon as it finishes, it is ready for use.

I'll quickly show how we can install the control plane directly from this console. I click on the Cloud Pak for Data record, and in the details I can see all the important services we've been talking about in this session, all the main services highlighted for the customer. It also links out to the IBM Knowledge Center for the various storage and resource requirements, where users can look at what resources are required and what security constraints the platform uses. So I'll go and create the control plane, where I need to specify the service name I'm interested in; the control plane, in technical terms, is called lite. I specify the storage class, and then I just accept the license terms and conditions. What this does is install the control plane, which sets up the Cloud Pak for Data web client, from where end users can get started easily. In the same cluster I have another project where I have installed a couple of other Cloud Pak for Data services; here you can see we have installed the important services we listed, namely Watson OpenScale, the Watson Machine Learning service, Db2 Warehouse, and WKC. That's all I have to share. Thanks, Clarinda, and for any questions feel free to reach out to me.
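For teams that want to script this flow rather than click through the Marketplace console, the same OLM install can be driven from code. Below is a minimal sketch using the Kubernetes Python client to create an OLM Subscription; the package name, channel, and catalog source are assumptions for illustration and should be taken from the actual Marketplace listing. The console's "create control plane" step similarly reduces to creating a custom resource that the operator then reconciles.

```python
from kubernetes import client, config

# Assumes a kubeconfig with admin-level access to the OpenShift cluster.
config.load_kube_config()
api = client.CustomObjectsApi()

NAMESPACE = "cloudpak-demo"  # hypothetical target project

# An OLM Subscription asks the Operator Lifecycle Manager to install an
# operator from a catalog source and keep it updated. The spec values
# below (name, channel, source) are illustrative assumptions.
subscription = {
    "apiVersion": "operators.coreos.com/v1alpha1",
    "kind": "Subscription",
    "metadata": {"name": "ibm-cpd-operator", "namespace": NAMESPACE},
    "spec": {
        "name": "ibm-cpd-operator",       # operator package name (assumed)
        "channel": "stable",              # update channel (assumed)
        "source": "ibm-operator-catalog", # catalog source (assumed)
        "sourceNamespace": "openshift-marketplace",
        "installPlanApproval": "Automatic",
    },
}

api.create_namespaced_custom_object(
    group="operators.coreos.com",
    version="v1alpha1",
    namespace=NAMESPACE,
    plural="subscriptions",
    body=subscription,
)
print("Subscription created; OLM will now install the operator.")
```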
Thank you so much, Partha. And a reminder to everyone: if you want to try this out, the operator is going live on the Red Hat Marketplace on December 10, and we have a trial as well. Maybe, Travis, why don't you show us a quick demo of the end-to-end platform. Travis, do you mind sharing your screen?

Hey, good afternoon everyone. My name is Travis Chinaret, I'm a senior architect with IBM focusing on our Data and AI portfolio, and today I'm going to walk through a quick 15-minute demo of Cloud Pak for Data. I'll start with a couple of slides to set the stage.

Let's talk through what you need in a data and AI platform. From an IBM standpoint, we have a very prescriptive approach: we break it down into four overall domains, collect, organize, analyze, and infuse, and you can read through the details here. If you start with the collect side, it's about how you access data, where the data is, bringing the data forward, pushing workload down to the data: how do you make data access simple and repeatable? From an organize standpoint, think of that as DataOps: the ability to discover data, understand your data quality, and capture and publish that information out to an asset repository for reuse, with the goal of setting up shopping for data for your data scientists, your data analysts, and other folks. The analyze side is all about providing the right tools to the right people at the right time. This may be where everyone wants to start, but without those first pieces around collect and organize, your analysis just isn't quite as valuable. You also want to democratize that ability, so whether it's a coder, someone who likes to drag and drop, or someone who likes to click, they can access the right tools for their skill level and get their work done. A big piece with that as well is the ability to collaborate and reuse. And a piece I love to talk about is infuse. The biggest part there is that a lot of organizations manage to get the data, and get some good, skilled data scientists or others who can then extract some insight, and then they fall down on how quickly, or how not quickly, they can actually infuse those pieces of insight and knowledge back into the business to get value. And so what is a platform that does all that?
That's the purpose of Cloud Pak for Data: to be the deployment platform for multiple analytical and AI-based microservices that fulfill those requirements. The great part is that it's very much part of IBM's hybrid cloud strategy, so it fits whether it's on IBM Cloud, AWS, Azure, or Google Cloud, deployed to the edge, installed within your own private network, or on a pre-built system that can house it for you.

Let's take one quick deep look under the covers of Cloud Pak for Data before we go into the demo. At its base there's a control plane layer built upon Red Hat OpenShift, which is now part of IBM. On top of that sits a small Cloud Pak for Data-specific control plane, a common framework for backup and restore, authentication, workload management, and so on. Then the magic on top happens first in the base area of Cloud Pak for Data. Within those same four domains, collect, organize, analyze, and infuse, there are various microservices, and each microservice can be deployed independently: you can run just one of them within your environment, or all of them, or any combination thereof. Under collect are things such as a streaming engine; data virtualization, which is very popular; a data warehouse; or a Spark engine. Under organize is one of the industry-leading platforms for data governance, our Watson Knowledge Catalog solution. Under analyze, it could be as simple as an embedded dashboard for really quick and easy visualizations, or you may want to jump into the Watson Studio tools, where you can use our AutoAI functions, or jump into a Jupyter notebook or a Data Refinery data-wrangling job, for example. Then on the right there's OpenScale, which monitors the models you've deployed, and Watson Machine Learning, which ends up being the runtime environment to deploy models and do that work.

On top of that we have a whole set of extensions. Depending on your project and its needs, we can add third-party tools such as Postgres, or run Db2 Advanced on the platform. There are also pieces around master data management, virtual data pipeline, and ETL with DataStage components, et cetera. Then there's Cognos Analytics and Planning Analytics, including our Watson Studio Premium pieces, which add the SPSS visual modeler onto the palette for data scientists, as well as the Decision Optimization engine, known as CPLEX. And then obviously our natural language capabilities, such as Watson Assistant, speech-to-text, text-to-speech, and Watson Discovery, plus Watson Financial Crimes Insight, another popular piece that goes on top. Under the covers, those are all various microservices that are available and accessible through Cloud Pak for Data.

Now let's go into a demo where we can see some of those pieces in action. Let me set up my demo scenario: a fictitious telecommunications company looking at a marketing campaign. We have a new phone release coming up pretty soon, but we also have competitors poaching our customers. Our goal is a better-performing, quicker-to-deploy propensity-to-churn model, and in this scenario I'm going to do it all in the next 15 minutes, end to end. And so here's what you're going to see.
Here is the plan for today's demonstration, against that same Cloud Pak for Data picture. The first phase would be performed by a data engineer or a data steward: we'll use the data virtualization technology to show how it can connect to multiple data sources, then look at the results of running discovery and data profiling on those sources, and you'll see how they would be published for use within the data catalog. In the second swim lane we'll take on the role of a data scientist or business analyst: we'll shop for data, then use AutoAI, a function added to Cloud Pak for Data within the last couple of releases, to build a predictive model. We'll promote that model out to a deployment space, a dedicated production-ready place for deploying models, and finally deploy it as an online or batch service and show how it would be infused into applications.

All right, let's get started. Let me get into a web browser. Here is my Cloud Pak for Data instance. Like I said before, let's first talk through the collect piece. Just to navigate the screen: I'm logged in as an administrator, so I have access to everything, and I'll play all the roles on my team today, including the data engineer, the data scientist, and the person who deploys the model. The first screen shows a set of tiles and interactions that can be modified and customized on a per-user basis, so I can see a bunch of different activities going on within my environment. I'm going to go into data virtualization and take a peek there first.

I went ahead and did some pre-work, since I have 15 minutes for this demonstration. You can see right here a whole set of databases and data repositories for which I already have pre-built connections: Mongo, MySQL, Oracle, Db2, Postgres, and MariaDB, for example, are all connections I've configured. Now I have my own central data virtualization node that can reach out and connect to each of these in a constellation kind of view, where I can set up and expose a view of this data to users of this platform, or to users of external platforms.

So let's take a quick peek at some of the virtualized data I have established, looking at just a few of the tables. For the customer churn demo I'm going to build, I'll need customer satisfaction data, customer billing, and customer profile data: separate tables that could live across multiple different database platforms, in multiple places within the organization or in the cloud. If I want to show how quickly a data engineer can take, say, two tables from two different databases, join them together, and expose them as one single view to an end user so they don't have to do that work, I can simply come in, notice that these are the two ID fields, and drag and drop one onto the other. It has the key fields.
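Under the covers, a view like this compiles down to ordinary SQL against the virtualization engine, which exposes a Db2-compatible endpoint. As a rough sketch (the connection string, schema, and column names here are hypothetical stand-ins, not the demo's actual setup), querying such a join from Python might look like this:

```python
import ibm_db_dbi  # IBM Db2 driver; data virtualization speaks the Db2 protocol

# Hypothetical connection details for the data virtualization service.
conn = ibm_db_dbi.connect(
    "DATABASE=BIGSQL;HOSTNAME=cpd.example.com;PORT=32501;"
    "PROTOCOL=TCPIP;SECURITY=SSL;UID=user1;PWD=secret;"
)

# Roughly the SQL a drag-and-drop join view generates: profile and billing
# tables, possibly in different source databases, joined on the customer ID.
sql = """
SELECT p.*, b.*
FROM   DVDEMO.CUSTOMER_PROFILE p
JOIN   DVDEMO.CUSTOMER_BILLING b
  ON   p.CUSTOMER_ID = b.CUSTOMER_ID
"""

cur = conn.cursor()
cur.execute(sql)
for row in cur.fetchmany(5):  # preview a few rows, like the console's preview tab
    print(row)
conn.close()
```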
If I were an SQL expert, I could dive into the SQL code and build out my own piece here by hand; I'll just use the editor, which already has those pieces there. I hit Next; I could change column names if I so desired, but no, I'll hit Next again. Now I have the option to publish this new view either as part of an individual project within the Cloud Pak for Data environment, to fulfill a data request, or to just save it into my own virtualized data, which is what I'm going to do. I'll call this a demo customer join view, hit Create, and go take a look at it.

So what did that do? It created this new view I have right here as part of my demonstration. If I look at the view, there are multiple options: I can set up who can access it, or submit it to a centralized catalog for multiple uses. Let's look at a preview of the data. I have authority, via my ID and password, to view it, and you can see there are now 16 columns: a combination of profile data and billing data, things such as marital status, number of children, estimated income, whether you are a car owner, just some basic information associated with some subscribers. If we look at the table structure, we see the metadata for those 16 columns; you can see this comes from two different table sources, 16 columns total, with a custom SQL view across all that data. And now I can either assign this view directly to someone's individual project, or submit it to the catalog, where it becomes part of an asset repository that all users can see and use. For a quicker demo I've already put those pieces out there, so I won't do it again now.

One last piece I'll mention around data virtualization, which is very powerful, is cache management. I can see what types of queries have been running against my data virtualization layer over, say, the last seven days or the last 24 hours. Looking at the last 60 days, I can see quite a few queries, 35 of them, that aren't using caching and are taking between one and ten seconds. I can go in, understand what those queries are, create a new active cache for those particular queries or tables, and then control the storage and everything else about it. So as a data engineer, I can make the platform handle those queries and take pressure off some of my backend systems.

All right, so what would I do next with this data? Usually I would discover it: maybe profile it and look at its data quality. I went ahead and kicked off some data quality jobs and already ran them through the system, and here are some of the results: one for the customer SAT table, one for the customer profile table, and one for the customer billing table. As you can see, it shows the data quality, which is high here; this one has one note with it, and six different terms assigned to it. What does that mean? Let's take a look. If I open the columns, here are the six columns associated with that data.
Using AI and machine learning capabilities, it went through and said: we have a bunch of dictionary terms, and based on the column title and/or the data itself, the models assume that "dropped calls" matches the dropped-call business term we have out there. So part of the analysis was matching terms to columns, but it also went through each individual column and gave it a quality score. There are hundreds of pre-built quality metrics, which you can use as is, or copy and customize to your heart's content to set your baseline for data quality. For example, complaints per month: I can click on it and dive in a little more. I can look at its data quality and at the frequency distribution of the data, shown in graphical form. It does the analysis for us, and at the end it gives me the ability to publish these data quality results back to my data catalog for data science teams to use.

All right, let's continue. Now I'm going to change roles. As a data engineer I created some data connections via data virtualization, did some discovery and profiling of data, and published that out to my enterprise catalog. Now I'm coming back in as the data scientist. For my project of building customer churn models, I first want to find some data to use, so I go to the data catalogs and look in my customer data catalog. And guess what: Joe, my data steward on the DataOps team, and Amy, my data engineer behind the scenes, took those same pieces of data we were looking at before and published them to the catalog. What does publishing to the catalog mean? You take the metadata and information associated with the data and publish it to an asset repository, where it can be matched up with a data dictionary, assets, and other pieces, and end users get a nice, simple web UI to search for that data and use it directly within a project. For example, I can see what Watson recommends based on my profile and what I normally do. I can also go into the highly rated assets and see which ones have reviews. Let's look at this customer profile data right here. Given my authority, it first shows me a quick view of the data itself, and I can see its details. I want to look at the review that was done: Susie, a member of my data science team, left a comment a couple of weeks ago saying this is the data set she uses for customer history, which would be good for the predictive churn model I want to create. Here's the profile of that data. As a data scientist, without having to dive into code, I can see the distribution of this data and whether it makes sense for me to use it. For example, marital status is pretty evenly distributed across a couple of options; estimated income has a decent distribution, with a min, max, and mean; and there are other fields as well, such as age, months as a customer, membership date, et cetera.
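The same kind of quick profiling can of course be done in code once you have the data in hand; the platform's value is doing it automatically at catalog time. For intuition, here is a minimal pandas equivalent of the checks above, using a hypothetical extract of the profile table with assumed column names:

```python
import pandas as pd

# Hypothetical extract of the customer profile table.
df = pd.read_csv("customer_profile.csv")

# Frequency distribution of a categorical column, like the catalog's
# marital status chart.
print(df["MARITAL_STATUS"].value_counts(normalize=True))

# Min / max / mean style summary for numeric columns, like the
# estimated income profile.
print(df[["ESTIMATED_INCOME", "AGE", "MONTHS_CUSTOMER"]].describe())
```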
I can also see the lineage of that data, which shows me some interesting things: here is when it was first published to the catalog, here is when the first data profile was created, and, by the way, this asset has been used in other projects multiple times. I can see that, and I can even contact those people to ask about their experience using this data in the past.

All right, so I'm shopping for data, and this data is good to go. I want some individual data sets, and I also want this joined data set: Amy created a single joined view across customer profile, billing, and SAT for the project, so I'll take that as well. It's as simple as clicking Add to Project; I can pick a project from a list, such as churn, and add it in. I already added them earlier just to speed the demo up, so I won't do it again now, but that's the quick and easy way to take data assets and add them to your project. Think about the amount of time that saves you, and the ability it gives you to shop for data.

So as the data scientist, I have the data I want and I've added it to my project. Let's go look at that project; I'm going into churn. What is a project? A project is a scoped space on the server, specific to whoever created it and whoever they have added as additional collaborators. Here, Susie, Clarinda, and Amy are some of the collaborators on this particular project. A project is a collection of assets that only we can see and that is protected, and any work I do stays within the scope of the project, though I still have the ability to publish results back out to, say, the original data store or the data catalog. In this scenario, here are all the data assets: customer satisfaction, customer profile, customer billing, plus that extra data set Amy created for me, which is a combined single view using data virtualization. I can take a look at it: this runs a real-time query back out to that database and pulls the information back for me, and I can see a profile and the same kinds of things I was able to see before. But now, within the project scope, I can see what has been done with this data inside the project: whether it was published to the catalog, added to a data flow, and so on. The lineage of what the team has done with it within the project space is pretty impressive.

So a project is a collection of assets; what kinds of assets can I put into my project space? Let's look at Add to Project. I can import new data scoped to my project; make a new data connection scoped to the project; create a new AutoAI experiment; create a new modeler flow, which is a graphical view for building models; use Watson Machine Learning to build a detailed model or deploy things out for runtime; make visual dashboards without writing code; or create a new notebook. Data Refinery is a self-service data-wrangling tool I can use within my project to update some data. There's also a decision optimization piece.
I've already used Data Refinery: I took a combination of those three tables, customer billing, customer profile, and customer satisfaction, and combined them with a separate CSV of customer churn history that I received. That created the new data set you can see on the left, called merged customer churn. I want to use that to create a new predictive model quickly, before all my time expires. So I'll make a new churn demo experiment; I can pick the configuration settings, eight CPUs and so on, but let's make it four CPUs to start with, and hit Create.

So what is AutoAI and what does it do for me? Suppose you're not the whiz-bang data scientist type who knows how to code everything in Python, or perhaps you don't understand modeling much at all. What if you could use AI from a point-and-click perspective and have it build a model for you from scratch? That's exactly what I'm going to do. I look inside my project, and here is the merged customer churn data I want to use. I select that asset, it reads the data set, and it suggests all the potential columns: which one would you like to predict? I want to predict churn, and since that field is represented as true and false, it suggests what's called binary classification, which is just a type of algorithm that predicts between two distinct categories, true or false in this scenario. I could leave it as is and run the experiment just like that, but I'll dive a little deeper for those interested in what's happening under the covers of AutoAI.

As you can see here, it will do a 90-10 split of my data: 90% used for training, 10% held out for testing and evaluation afterwards. I can see all the columns that will be part of the feature set for my model, and I'll keep them all for now. I could also sample a larger data set down to a smaller group to speed up the results. Going into the prediction settings, it suggests once again that this is binary classification, which is the right choice; I could override it to do multi-class classification, or a regression algorithm type if the target were different. One thing you do want to decide is how it should judge what is and isn't a success, that is, the best model it can find for me. I'll have it rank by accuracy, a reasonable choice for binary classification; I could pick the other metrics too, and it actually shows the results for all of them, but I want accuracy. There's a whole set of algorithms it can test, and I can decide how many to put through the paces: I'll use four algorithms, which will generate 16 separate pipelines of work for me. All right, save that and hit Run Experiment.
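To ground what Data Refinery and the AutoAI setup just did, here's a minimal sketch of the equivalent data preparation in pandas and scikit-learn; the file names and the CUSTOMER_ID join key are hypothetical stand-ins for the demo's actual assets. (The next sketch, after the experiment discussion, reuses these variables.)

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical extracts of the three virtualized tables plus the churn CSV.
profile = pd.read_csv("customer_profile.csv")
billing = pd.read_csv("customer_billing.csv")
sat = pd.read_csv("customer_sat.csv")
churn = pd.read_csv("customer_churn_history.csv")  # contains the CHURN label

# The Data Refinery step: join everything into one "merged customer churn" set.
merged = (
    profile.merge(billing, on="CUSTOMER_ID")
           .merge(sat, on="CUSTOMER_ID")
           .merge(churn, on="CUSTOMER_ID")
)

# AutoAI's 90-10 split: 90% for training, 10% held out for evaluation.
X = merged.drop(columns=["CHURN", "CUSTOMER_ID"])
y = merged["CHURN"]
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y
)
print(X_train.shape, X_holdout.shape)
```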
So what's that going to do? It goes through a set of activities; let me swap into this tree-style view. It reads the data set, takes the 90-10 split, reads through all the training data, and starts building the pipelines it needs. It does some pre-processing: cleaning up some of the data, working out what's categorical and what's numerical, all that kind of work so you don't have to. It picks the best four algorithms based on the data set and the type of inputs, and then for each of those it runs through several stages: first a straight test using that algorithm to get a baseline result, then hyperparameter optimization on that result to see if it can improve the model, then feature engineering, and then one more pass of hyperparameter optimization on top of that. It does this across all four of the algorithms it selects, so it can take 10 to 15 minutes to run. I'll let it run in the background, and let's look at the same experiment I ran earlier so you can see the results.

That one is still running; this one completed a while back. Let me open it up and show the result set, which the other run should be able to reach as well. You can see the four algorithms here: the XGB classifier, gradient boosting, random forest, and LGBM. It ran through each of those, and the starred one right here is the one it ranked as the number-one result from all the work that was done. I can also swap to a different view of the results, which shows, for example, the LGBM classifier model it built, along with the feature transformations and hyperparameter optimization it performed, so you can go through and see the details of every pipeline. Here's the comparison of those 16 pipelines, with accuracy and area under the curve; accuracy is the metric it was judged on. Let me narrow that down: here are the top five or six pipelines by accuracy, pipeline 3, pipeline 4, and pipeline 15, which shows as the best result.

Instead of that view, let me go back down below: here are all 16 pipelines that were run, in ranked order, along with the accuracy that came out of each. Pipeline 15 uses the LGBM classifier with hyperparameter optimization plus feature engineering. I can open it up and dive a little deeper; I'm a data scientist and I want to see what was behind the covers. There's the initial accuracy, all the measures in the result set with the normal holdout and cross-validation scores, and I can look at what's called the confusion matrix to see the false positives and false negatives, and this turns out to be an extremely accurate model.
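Continuing the sketch from above, this is roughly what a pipeline like number 15 amounts to if written by hand: fit an LGBM classifier, score it on the holdout, and inspect the confusion matrix. The hyperparameter grid is a hypothetical stand-in for what AutoAI actually searches, and it assumes the categorical columns were already numerically encoded (AutoAI's pre-processing stage handles that for you).

```python
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV

# A small, hypothetical hyperparameter search standing in for AutoAI's HPO stage.
search = GridSearchCV(
    LGBMClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "num_leaves": [31, 63]},
    scoring="accuracy",  # the metric the experiment was told to rank by
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate on the 10% holdout, like the pipeline leaderboard does.
preds = search.predict(X_holdout)
print("holdout accuracy:", accuracy_score(y_holdout, preds))
print(confusion_matrix(y_holdout, preds))  # false positives / false negatives
```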
I can also look at the model information itself, which shows an LGBM classifier with 40 different features and over 1,100 evaluation instances. I can look at the features it created, a combination of, say, estimated income, months as a customer, and late payment charges, and at feature importance, which tells me which features mattered most in the model. Estimated income actually had the biggest overall impact on whether or not a person was going to churn. An interesting thought; who would have known that before? But it does make sense: it may put someone in a different socioeconomic class, where they have the funds or the ability to change carriers more easily, or maybe not.

So those are the results, and I can now save this off as a model back into my project space. It becomes a standalone model I can deploy as an online model; this is my demo model, and I'll save it into my space. But before we look at that, let's say I'm a data scientist who is a coder. I love jumping into Python; I don't know if I'll trust this, and I think I can always do better. Maybe, maybe not, this is a really powerful tool, but I can also export my AutoAI model as a notebook. If I let that generate, it produces an entire notebook, written in Python, showing exactly what the tool did behind the scenes, and I can tweak it and rerun it; there are all kinds of things I can now do within this notebook to reproduce the same result as the model. It's very powerful, especially the ability to see under the covers what the AutoAI feature built for you.

Where are we now? Let me go back to my churn project overall. Here's the new AutoAI experiment, still running; here's the new notebook I just created from it; and here is the new model I saved to use later on. My next step is to promote this model up into what's called a deployment space. A deployment space is where you actually deploy models as online or batch models, and you can do it through the tooling or through an API, so you can use Jenkins or other tools to automate the whole MLOps process and promote models out to the deployment space. Let's go look at that deployment space. There are two assets: one is the model I created previously and already promoted out there, and the second is the model we just created.

Now let's deploy that model as an online runtime model; let me show you how quick and easy that is. I can choose whether it should be an online or a batch model; I want online. This is my demo churn model; hit Create. What does that do? It takes the model, packages it up within its own container on the Cloud Pak for Data platform, deploys it out as a pod, extending the Kubernetes environment, as a new online model, and then returns the details about the model and how I can access and test it. I was going to go back to the one I'd already deployed while this one finished, but actually it's already done and deployed, so that was quick and easy.
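Once a deployment like this is online, any application can score against its REST endpoint. Here is a minimal sketch of that call from Python; the host, token, deployment ID, API version date, and field names are hypothetical placeholders, and the payload shape follows the Watson Machine Learning v4 scoring format that the deployment's own code snippets illustrate.

```python
import requests

# Hypothetical values; the deployment details page shows the real endpoint
# and ready-made code snippets in several languages.
BASE_URL = "https://cpd.example.com"
DEPLOYMENT_ID = "demo-churn-deployment-id"
TOKEN = "..."  # platform bearer token

payload = {
    "input_data": [{
        "fields": ["GENDER", "MARITAL_STATUS", "ESTIMATED_INCOME",
                   "CAR_OWNER", "COMPLAINTS_LAST_MONTH", "COMPLAINTS_LAST_YEAR"],
        "values": [["M", "Married", 130000, "Y", 0, 1]],
    }]
}

resp = requests.post(
    f"{BASE_URL}/ml/v4/deployments/{DEPLOYMENT_ID}/predictions",
    params={"version": "2020-11-12"},  # API version date (assumed)
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    verify=False,  # demo clusters often use self-signed certificates
)
resp.raise_for_status()
# The response carries the prediction and class probabilities, e.g. churn
# False with probability ~0.999 for the sample customer above.
print(resp.json()["predictions"][0])
```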
So now the model we just created is online as a usable model, but I'll use the one I created earlier, because I already have some sample data ready to test with it. The first thing I see here is that my model is deployed and online, with one copy running. I can change that: say I want higher availability and higher throughput because multiple things will access the model at the same time. I can create multiple instances, or copies, of it in my environment, simple and easy to do, just by changing that number and hitting Save. Here is the direct endpoint link, a RESTful interface to my model, so I can infuse it into other applications. And, by the way, here are some example code snippets for accessing the model from your own application: a curl command, some sample Java code, some sample JavaScript you can copy and paste, Python, Scala. It gives you examples of how to quickly infuse this into your existing applications.

Now I want to do a quick test using the built-in test harness. I could go through here, fill in the different attributes and fields, and test out the results, but to speed that up I've already saved some sample data in JSON format, so let me grab that. In this data, it's a male, married, 130,000 dollars a year estimated income, has a car, has the unlimited plan, and you can see he's had zero complaints in the last month and one complaint in the last year, so an average kind of customer. I hit Predict; that tested my model and came back with the prediction and its probability. It comes back false, meaning very unlikely to churn, with a 99.9% probability of not churning. Now let me make some quick tweaks: what if he actually had three complaints in the last year and two in the last month, a telltale sign of someone who's unhappy, with a decent income, married, and the ability to change carriers easily? Let's see what happens from this very accurate model. With those attributes, you can quickly see that this person is likely to churn, with a 98.9% probability of actually churning.

So this concludes my demo. I just wanted everyone to see how quick and easy it is to move through the entire lifecycle: collecting data, organizing that data, using AutoAI from a data science perspective to quickly generate a predictive churn model, and then using the tooling or the APIs to promote and deploy that model into a highly available runtime to actually get its use out there for the business. Thank you again.

Thank you, Travis, that was a really good overview of the platform itself. We'll quickly move on to another of our great achievements this release: we've onboarded onto Tech Data's StreamOne marketplace, and I'd like to showcase what we're doing with our global distributors and partners. So Clay, why don't we start with you telling the audience about your role at Tech Data, and before that with IBM.

First, let me say it's really a pleasure to be here with you and the folks here. I've been looking forward to this for some time, and to be virtually sitting with someone as smart and talented as you is a pleasure. I'll start with my time at IBM.
I spent eight years at IBM, all within the Data and AI organization, working with great people like you and Travis and others. I held a number of roles during my time there, but my final role was working directly with Cloud Pak for Data as a sales leader in North America; my team was responsible for driving sales and helping shape product direction for this new Cloud Pak solution within IBM. Then, earlier this year, I began a new chapter in my career when I moved over to Tech Data. But I didn't stray far from IBM; I still work with IBM almost every day, and a lot of it is around Red Hat and Cloud Pak for Data. Tech Data is a global distributor, and there I'm responsible for leading our data, IoT, and AI practice globally. I work with vendors like IBM and Red Hat as well as with our business partners and resellers to optimize the impact we can have through the channel ecosystem. It's a really interesting space to be in, as I now have a broader view of the market and how best to help our vendors and our partners.

We're glad to have you, Clay, and it's been an amazing ride; this partnership between Cloud Pak for Data and Tech Data has definitely been building some buzz. Do you want to tell our audience a little bit about how it can change the game for customers?

Yeah, I'd love to. As you know from my background, Cloud Pak for Data is near and dear to my heart, so I really love what IBM is doing with OpenShift through the Cloud Paks, even beyond just Cloud Pak for Data. So much so that when I arrived at Tech Data earlier this year, one of my highest priorities, if not my number-one priority, was to ensure that the channel ecosystem knew the power of the Cloud Paks, and especially Cloud Pak for Data. We found that in order to effectively absorb the power of Cloud Pak for Data, and you saw Travis go through just a very brief demo of its robustness, the channel ecosystem, our resellers and partners, was definitely going to need some assistance. Thanks to the power of OpenShift, Cloud Pak for Data can be deployed on any cloud, which is a huge thing for our channel and for our clients. As a distributor, we work with so many partners, and we work with all these cloud vendors, so we set out to build the most effective way for Cloud Pak for Data to be consumed. That's what we did together, our team at Tech Data and your team at IBM: we built a Cloud Pak for Data solution that we term a click-to-run solution. What that really means is we make it really easy for our partners to sell Cloud Pak for Data, and therefore to get IBM and OpenShift into more end users' hands and help deliver business outcomes.

Awesome, awesome, Clay, that's really cool.

Yeah, and a similar question to the one you asked me: I'd love to ask you to comment on what our announcement means to IBM, and especially to IBM business partners.

It's really exciting. Tech Data has over a thousand local vendor partners, as we know, operating in more than 100 countries, and onboarding Cloud Pak for Data onto StreamOne, this global IT marketplace that streamlines the buying, selling, and other services automated and offered to global partners, is awesome. Additionally, as you're aware, with our hybrid cloud ecosystem strategy, just like Travis showed, customization is very key.
And Tech Data, as a value-added distributor, definitely meets our customers where they are, with solutions that are more innovative yet less costly, offering comprehensive services to foster wider adoption. You provide that expertise and help both our business partners and customers not only deploy large-scale solutions from technology providers, but also customize them to their specific priorities. Not to forget the click-to-run automation we developed to deliver this on Tech Data's StreamOne marketplace, which is definitely going to be a unique value for our partners: simplifying some of the most time-consuming and complicated parts of deployments, and automating complex processes such as infrastructure, platform, and software-as-a-service deployments, building connections, configurations, and integrations, is something I feel is really going to cater to our business partners and to our clients.

So Clay, coming back to you: why do you think Tech Data selected Cloud Pak for Data among the other solutions?

Wow, great question. We kind of have our pick, honestly; we work with so many vendors, and even partners that have their own solutions. I'd narrow it down to two reasons. First, as I mentioned earlier, we work across all the cloud vendors, so we wanted a solution that would work not only with the vendor's own cloud, in this case IBM's, but with Azure and AWS and others, and obviously Cloud Pak for Data allows this through OpenShift. Second, we know that more clients are looking for that all-in-one solution to drive business outcomes, and Cloud Pak for Data accomplishes this through the aspects Travis went through. To really simplify it, and this is how IBM has effectively marketed the solution: it lets users collect data, organize that data, and then analyze that data, before infusing it into their organization to use in the most effective way possible. It's a short answer, but for those two reasons it really made Cloud Pak for Data a no-brainer for us to pursue, to build this market-ready solution, put it on our ecosystem platform, and get off and running.

Interesting. And, I mean, you mentioned it already, but I know that you're already seeing a lot of value from the integration with Red Hat OpenShift on StreamOne.

Yeah, you're right, Clarinda. We probably can't say it enough, but it really speaks to that first reason I gave, where we can work across cloud vendors seamlessly; it speaks to the power of OpenShift, and this is such a big deal for our channel ecosystem. We know that we live in a multi-cloud world, but especially when you think about the channel, there are still a lot of organizations and resellers working that out, figuring out where they land and where their customers want to be, trying to work through a business-outcome landscape. We know it's a multi-cloud world, we know that Kubernetes is the future, and being able to effectively expose that to the partner ecosystem is really, really important. So the seamless integration of OpenShift, the solution it enabled, and what we're exposing our partners and end users to is much needed, and frankly it's just really exciting.
And what's interesting is, I gave a little bit of my background: I've worked with Cloud Pak for Data extensively in the past, but I've been out of the day-to-day for the last nine to twelve months. So I'd be really curious to hear how it's going recently. You covered the 3.5 release already, but maybe we'll start with: what's your favorite new feature that customers can use, especially when we think about this click-to-run solution we have?

Yeah, definitely, that's a very good question, so let me quickly showcase what would be my favorite capabilities in Cloud Pak for Data. I think innovation is definitely one of the areas that has been very attractive. One of the capabilities we're bringing in this release is Watson Machine Learning Accelerator in the base, which lets everybody use deep learning on GPUs. It makes things much easier for data scientists, with a distributed deep learning architecture that simplifies training deep learning models across the cluster for faster time to results, as well as powerful model development tools: real-time training visualization, runtime monitoring of accuracy, and some of the hyperparameter optimization we just saw in Travis's demo, for faster model deployment. So I think this is one of the great capabilities coming in Cloud Pak for Data.

Another capability, which is in its early stages from the IBM Research team, is definitely a cutting-edge technology and a new concept I think everybody should try out: our federated machine learning capability, which enables multiple organizations to train ML models collaboratively without having to share data. You can imagine what this really means. The driving factors behind it are data privacy, confidentiality, regulations, and even the cost of moving data. It's machine learning without moving your data: you might have data on AWS, on IBM Cloud, and on premises, and without moving the data from those locations, a centralized aggregator can iterate and build, bringing ML to where your data lives. So those couple of capabilities are definitely highlights of this release, Clay, and folks should try them out.

Those are really, really neat. The federated learning especially, we'll have to dive more into that at some point, because it sounds really neat and addresses a lot of the data privacy issues that we definitely see in the market.

Definitely. Thank you so much, Clay; it's been amazing to have you on this webinar, and we'll continue our partnership going forward.

Yeah, I look forward to it, Clarinda. Thank you.

So quickly, before we wrap up, there are a couple more capabilities I wanted to cover that are coming to Cloud Pak for Data. One of them is data privacy. Many times we have seen the need for strong data protection: you sometimes want to de-identify your data so that data science, business analytics, and testing can work with the same quality of data that you put into production and train your models with. This capability is tightly integrated with our Watson Knowledge Catalog, covering everything from data subsetting to data fabrication for end users, and most importantly it aligns with our governance strategy. You can even use it to provision test data for your models in production with the same level of security. This capability is very useful.
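The de-identification idea is easy to picture in code. Here is a toy sketch of two common techniques, pseudonymizing an identifier and perturbing a sensitive numeric column while keeping its distribution roughly intact; it illustrates the concept only, not the Watson Knowledge Catalog implementation, and all file and column names here are hypothetical.

```python
import hashlib

import numpy as np
import pandas as pd

df = pd.read_csv("customer_profile.csv")  # hypothetical extract
rng = np.random.default_rng(7)

# Pseudonymize the identifier: consistent and irreversible without the salt,
# so joins across de-identified tables still line up.
SALT = "rotate-me"
df["CUSTOMER_ID"] = df["CUSTOMER_ID"].astype(str).map(
    lambda v: hashlib.sha256((SALT + v).encode()).hexdigest()[:16]
)

# Perturb a sensitive numeric column with small multiplicative noise so the
# overall distribution (and hence model quality) stays close to production.
noise = rng.normal(loc=1.0, scale=0.02, size=len(df))
df["ESTIMATED_INCOME"] = (df["ESTIMATED_INCOME"] * noise).round(2)

df.to_csv("customer_profile_masked.csv", index=False)
```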
The other capability I quickly want to highlight is knowledge accelerators. In our governance portfolio we have data quality, self-service data consumption, and data governance, and it's often important to understand the business vocabulary behind your technical data. Building a business vocabulary is more than creating a word list; it takes time to create a usable vocabulary with definitions and business context. So, to get you up and running quickly, this release we're bringing in the IBM knowledge accelerators, which scale out the business vocabulary quickly, out of the box, for industries like healthcare, insurance, financial services, and even energy and utilities. Thank you, everybody.

And congratulations again on this great new release. Look for it on the Red Hat Marketplace on December 10th; I wanted to reiterate that because it's very important, and we're very excited that you'll be able to try it out. Until next time, thank you, everyone.