Open Data Hub

This video describes how the Open Data Hub project started internally at Red Hat, dives deeper into what Open Data Hub is and the ecosystem around it, and finally covers where Open Data Hub is used today and how to contribute and get engaged in the Open Data Hub open source community project.

How Open Data Hub started

The Open Data Hub project started internally within Red Hat as a data store for our own data engineers and data scientists, hence the name Open Data Hub. Early on, we realized that data scientists' and data engineers' requirements for tools and AI/ML components are different from DevOps requirements. They are mostly UI driven, avoid using terminal commands, and expect the tools to include all the favorite AI/ML libraries they are accustomed to frequently using. Collaboration and sharing is also an important requirement for their workflows to successfully deliver to production.

As our data scientists started using Open Data Hub internally, and we started building the Open Data Hub project based on their requests and the challenges they faced, we realized that some of these challenges are faced by many of our customers with their own AI/ML teams. For AI/ML projects, there is always a team of data scientists, data engineers, DevOps, product owners, and business developers that need to collaborate and work together. Sharing and collaborating around AI/ML development is difficult; it is mostly manual and error prone. Another important challenge is compute resources. AI/ML workloads are compute heavy, and CPU, memory, and storage are not unlimited resources on clusters in any development or production environment.
A final challenge, one that is very critical, is delivering to production and the production development life cycle. Most companies that are just starting to add AI/ML features to their products start with small teams tasked with investigating AI/ML tools and platforms to use. The easiest path would be to use the well-known large proprietary cloud platforms. These platforms have most of the tools needed and have simplified the point of entry for users. This works for the initial AI/ML prototyping phase; however, it proves to be very expensive when moving to the AI/ML production phase. Users also find their work and process locked into the specific cloud they have chosen. One of the guiding principles of Open Data Hub is to give users more flexibility by using open source tools and allowing users to install Open Data Hub in a hybrid cloud.

What is Open Data Hub?

The Open Data Hub project is a meta project that integrates open source tools to provide an end-to-end AI/ML platform on OpenShift. A meta project integrates multiple open source projects into one project that is easily deployed by users. The AI/ML workflow starts with prepping and ETL-ing the data into a data lake or storage and making it accessible to data scientists. The next phase is model development, which includes feature selection, model creation, training, and validation.
The last phase is moving the model to production and serving it. This phase is not a static, one-stop model serving phase, but a constant optimization phase. The cycle of monitoring, optimizing, and serving is a constant cycle that happens for the lifetime of the model and is a collaboration between DevOps, data scientists, data engineers, and business developers. For more information on Open Data Hub, please visit OpenDataHub.io. To install Open Data Hub today, you will find it in your OpenShift cluster, listed in OperatorHub under Community Operators.

To help enable our customers, we built an ecosystem around Open Data Hub that provides them with a faster go-to-market strategy. This ecosystem provides tools for tight integration with Red Hat products such as Red Hat OpenShift, Red Hat Ceph Storage, Red Hat Decision Manager, Red Hat OpenShift Service Mesh, and Red Hat 3scale. To showcase these integrations, we built multiple industry use cases of Open Data Hub integrated with Red Hat products, such as fraud detection with Open Data Hub and Red Hat Decision Manager. We also work with third-party vendors to get them certified to use UBI images and certified operators. These partners become certified partners that provide support for their tools integrated with Open Data Hub, such as Seldon for model serving, NVIDIA for GPU, and CognitiveScale for trusted AI. As part of the ecosystem, we also have a team dedicated to AI/ML consulting services to help our customers succeed in their digital transformation plans and accelerate development and time to market.

Open Data Hub has an upstream and downstream relationship with many open source projects. The main project that Open Data Hub downstreams from is Kubeflow. However, Open Data Hub also downstreams from other open source projects such as Seldon, Kafka, Spark, Grafana, and Prometheus to provide a comprehensive end-to-end AI/ML platform on OpenShift. In many cases, enhancements and changes made in the Open Data Hub project are also
upstreamed back to the original open source communities, such as the many changes upstreamed to Kubeflow for OpenShift platform support.

As part of the Open Data Hub project, we see potential and value in the Kubeflow project, so we dedicated our efforts to enabling Kubeflow on Red Hat OpenShift and integrating Open Data Hub with Kubeflow in one operator installation. Kubeflow is now integrated into Open Data Hub and runs on OpenShift. Kubeflow brings multiple new AI/ML capabilities and features. For model training, we have TensorFlow and PyTorch. For model serving, we have Seldon and KFServing, and for pipelines, we have Kubeflow Pipelines based on Argo. As part of our goal, we upstream all enhancements made back to the Kubeflow project. We also worked with the Kubeflow community to add OpenShift as one of the supported platforms for Kubeflow, as shown in the menu on the right, and we are releasing the Open Data Hub 0.6.x operator with Kubeflow integrated.

Open Data Hub 0.6.x will include components from both the previous Open Data Hub operator and Kubeflow. For the data analyst user, Open Data Hub will include integration of data lakes, such as the S3 interface to Red Hat Ceph Storage; SQL databases such as PostgreSQL and MySQL; and data streaming using Kafka Strimzi. For data exploration, Open Data Hub will include Superset and Hue. For data processing, Open Data Hub will provide Spark and Spark SQL Thrift Server. Metadata tools such as Hive Metastore and Kubeflow Metadata will be included.

For the data scientist, access to the data is through the storage interfaces we just described. JupyterHub is provided as a development environment integrated with OpenShift authentication and GPU resources. Multiple tools for model training and verification are provided, such as TensorFlow jobs, PyTorch jobs, and Spark. Open Data Hub will also provide pre-trained models as part of the AI Library tool. For creating pipelines, data scientists can use Argo and/or Kubeflow Pipelines.

For DevOps engineers, monitoring tools such as
Prometheus and Grafana are provided for monitoring all components in the end-to-end AI/ML platform. Model serving tools such as TensorFlow Serving and Seldon are also provided. Pipeline tools are also provided, as mentioned before.

Where is Open Data Hub used?

As mentioned earlier, Open Data Hub is used by many internal Red Hat teams to run AI/ML workflows. Some examples are anomaly detection on runtime logs, AI/ML development on operational metrics from OpenShift clusters, and storage for customer insights data such as SOS reports and customer feedback. We also have Open Data Hub installed in the Massachusetts Open Cloud project, where it is used by many data science students for their research work. Led by Boston University, the Massachusetts Open Cloud project is a collaboration among Boston University, Harvard, UMass Amherst, MIT, and Northeastern University, as well as the Massachusetts Green High Performance Computing Center and Oak Ridge National Laboratory.

To conclude our introduction to Open Data Hub, we would like to reiterate that Open Data Hub is an open source project. Support is only provided by Red Hat for Red Hat products and by ISV partners for their own certified products. Customers are responsible for the upstream open source software. If you are interested in collaborating and contributing to the Open Data Hub project, please visit OpenDataHub.io. Thank you very much for listening to our Open Data Hub introduction video.