Hello, everybody. Welcome. We're going to get started in about two minutes, so thank you for joining in. We'll give everybody a few minutes to log in, get set up, and get your microphones and your coffee for the day, and then we will get started.

All right. Well, we're going to start right on time today, hopefully, and hopefully everybody is logging in and realizing they're in the right place. You're always in the right place when you're at OpenShift Commons. Today, I'm really thrilled to be hosting a gathering on data science and want to welcome you all to the Commons and tell you a little bit about what the Commons is. My name is Diane Mueller. I'm the director of community development and the person behind the screen today, driving and hosting this event along with some of my colleagues, who will also be in the chat and helping to answer any of your questions. You can follow us on Twitter at OpenShiftCommons, without the S, and we're using the hashtag OSPG2021 today because this is our first OpenShift Commons gathering of 2021. We're really pleased that you're here with us today. So just a word about Red Hat and about how we view the world and all the different pieces and parts that today's talks are about. This is the community side of OpenShift and one of the spaces we've created for the entire ecosystem of projects and products and communities that are part of the Red Hat OpenShift ecosystem. Really, what today is about is making connections that'll help drive continuous innovation into all of your projects and all of your efforts at your organizations. It's really all about you, so ask questions in the chat. We'll talk a little bit about today's agenda. What I like to say often, and you'll see it on the t-shirts and swag when we're in person, is that we have more in common than you know. Today, I hope you all will discover that. The whole purpose and goal of Commons is to create connections, and today, please expect one of my new favorite words: entanglements. We're going to be doing a lot of talks today that introduce new initiatives and new communities and where they overlap. There will also be some fluidity, as I like to say. A lot of these talks have time allotted right afterwards for some live Q&A, and we expect some of that to run over or run under, so we're going to try and keep everything on track today. That's my job. We do hope that you all respect the code of conduct and respect each other, and you can use the Q&A tag in BlueJeans to ask questions. The question that everybody always asks right up top is: will the slides and recordings be available, when, and what are the next steps? So yes, the slides are available, and we'll post the GitHub repo where I put them all in the chat shortly. All of the recordings of all of the talks will be on YouTube, except for the one that's going to be live today, Marcel Hild's Beyond AIOps talk; I will have to edit that and upload it later this afternoon. So if you go to youtube.com/OpenShift shortly after this, give me 30 minutes or so after today's session ends, they will be live there, and you can watch them and review the slides there. So yes, we tried to take care of that. Next steps, there are lots of next steps, so let me talk a little bit about what OpenShift Commons is and what our schedule is today. It's pretty beefy. And Commons is always about crossing ecosystem lines.
Our first speaker will be an academic, which is wonderful. Kate Saenko is coming from Boston University and the MIT-IBM Watson AI Lab, and we're thrilled to have her today. We have folks on the enterprise research side from Ericsson Research. We have folks from América Móvil who are going to be talking about a new initiative they're kicking off, the Enterprise Neurosystem. I love the neurosystem metaphor because, again, it goes along with the entanglement and the overlapping of things across silos. There's another new, or reorganized and rebranded, community, MLCommons, that we're going to have a panel on. We're going to talk a lot about open source today and the Open Data Hub, and talk about the toolbox that we're working on here at Red Hat to support data scientists. Then Verizon Media is going to come on and talk about doing some really cool stuff with their project, Leo. And then we have a couple of NVIDIA GPU Operator talks. So, you know, we're really trying to give you some of the building blocks that you need to run the workloads that you're looking at with data science, whether it's on the enterprise side, the research side, or just trying to drive open source projects. Please note, at the bottom there is also the link to the agenda, which will be up all day, and we'll try and stick to it. So, OpenShift Commons truly is all about you. It's about the different communities, the different projects, and all of the stakeholders, whether they're end users or partners and contributors to all of these projects. Really, we do better when we all work together, and we'd love to have you join Commons or any of the other organizations that are going to talk today; they will hopefully put their resource links up, too. If you're interested in joining Commons, just go to the Commons website, fill in the join form, and we'll add you in happily. So, really, just a word about what Commons is. Rather than focusing on one open source project like Kubernetes or OpenShift itself or OKD, we recognize that OpenShift is really an ecosystem-based universe now. We've tried to create a new community model that brings in all of the different communities and the upstream projects, and to really promote peer-to-peer interactions. So, today's job is really to try and help you connect with each other, to make some connections, and hopefully join us and help foster some innovation. And it really is across multiple ecosystems. We play a lot in the CNCF, the Cloud Native Computing Foundation, and every one of the projects there is interlinked and interconnected. Whether you're using any Kubernetes distribution, or our open source side, which is OKD, or OpenShift itself in its many variations, if you're in data science you are going to be touching all of the top projects. These are just a few of the upstream data science initiatives that you'll hear about today. And hopefully, you'll be enticed to take a look at Open Data Hub as well, which is one of the wonderful projects that we've been pushing, creating a reference architecture around, and using to support our data scientists; you'll hear a deep-dive talk about that. And then there are the MLCommons folks, who've just rebranded and taken in MLPerf. So, there are lots of different places where we touch these different ecosystems. And this is how Red Hat sees AI: it represents a workload that is definitely a requirement for our platform support across hybrid clouds.
It's applicable to Red Hat's existing business. We use the tools that we're going to hear about today internally to increase our open source development and production efficiencies. We know that it's really valuable to our customers for specific services. And we're really trying to create that intelligent platform experience and help you all build intelligent apps using our products as well as the broader partner ecosystem. And underneath all of that, data is the foundation. So, you'll hear a lot about all of these things today. But one of the things I was going to ask everybody who's here today, to kick this off while I queue up the first video, is in the chat: how do you see AI? Because a whole ton of people have registered for this event, and I've been really amazed at the range of people and the roles that you have. There are AI leaders, researchers, analysts, application development managers, big data engineers, data modelers, all kinds of folks, right down to container platform engineers and data scientists. So, in the chat, while I queue up the first keynote speaker, take a minute, and if you don't mind, tell us where you're from. Where in the world are you today? And what role do you have in terms of what you're doing in AI? Are you a data scientist? Are you a platform engineer? Are you an upstream person? Are you in compliance or risk management? What are you doing in AI? How do you see AI? Just take a minute, test out the chat there, and see if we can rock and roll. So, the other thing I wanted to do today: our first speaker is going to be Kate Saenko, and I'll queue her up in a minute. But I really wanted to thank some of the other folks from Red Hat who've helped me build out today. You'll see our faces on the screens every once in a while: Sherard Griffin; Chris Short, who's hosting this and doing the live-streaming side; Audrey Reznik, who will be joining us in a little bit; and Bill Wright, who's been on the partner side, helping us get all of this working. So, I'm going to stop sharing my screen for a minute, queue up the video, and try to keep us all on time; I'm only one minute behind. Thank you again for joining us today, and I hope you enjoy the day. I really enjoyed interacting with all of the different speakers. I'll be sharing my screen now to queue up the video.

Hi, my name is Kate Saenko. I'm a professor at Boston University and at the MIT-IBM Watson AI Lab, and I'm very happy to be here today to talk about my research on dataset bias. So, I'm going to start by talking about the success of AI and computer vision. Computer vision is AI technology that can analyze visual scenes, and you can see here an example of it applied to detecting cars and buses and pedestrians in images. And it's quite good, and getting better. Here's an example of computer vision for object detection in a different scene. We can also train computer vision models to classify other objects, maybe even cartoon characters. And we have quite accurate models for face recognition and emotion recognition. A lot of this is becoming a product. So, we're seeing computer vision being used as a product, maybe in your phone: you might have a Face ID that verifies your face against a database to unlock your phone. So, that's very exciting.
However, we also have some problems with this technology, with computer vision, and it applies to machine learning in general: the problem of dataset bias. That's what I want to talk about today. So, what do I mean by dataset bias? Well, suppose that you're training a model to recognize pedestrians, and you collected a dataset that looks something like this. You train your neural network, and it seems to work really well on your held-out test data from the same kind of data that you collected. And now you deploy your model in your product, on a car. But now this car is in New England, whereas your training data was collected in California. So, immediately you see a very different visual domain, with different weather conditions like snow, for example, which you didn't have in your training data because there's not much snow in California. But also, pedestrians will look different because they're wearing heavy coats and so on. So, all of a sudden, the model that worked really well on the source data that you trained it on does not work so well anymore. We call this problem dataset bias. We also call it domain shift. So, the problem of dataset bias is essentially the issue that the training data looks different from the test data that you're actually faced with. It's different in terms of the distribution of the data; that's the more general way of putting it, but you might qualify it, for example, as the difference between one city that you trained on and a new city that you're testing on. Or it could be a dataset biased toward images collected on the web, whereas at test time you are getting images from a robot, which also look different: they have different backgrounds, different lighting, and different poses. Another common dataset shift that we see in machine learning is from simulation to real images. Here, for example, if you're simulating something for robotics and training your machine learning algorithm on the simulated data, it's not going to generalize very well to real data. This could also happen with demographics: if your training data is biased in such a way that light-skinned faces are overrepresented, but then at test time you are applying the model to darker-skinned faces, again, you will have a dataset bias issue, and the model will not work as well on the test data. This could also happen with different cultures. Let's say you're classifying weddings and you trained on weddings from Western cultures; then at test time, if you get an image of a wedding from a different culture, your classifier will not generalize very well and will not be able to recognize it. So there are lots of different ways that dataset bias can happen, which is my point. Now let's look at what this actually means in terms of the accuracy of the machine learning model. Here is a very simple example on the very famous dataset called MNIST. Everyone knows what MNIST is; it's just 10 digits that are handwritten. If we train on this dataset, we know that with modern deep learning we can get very high accuracy, more than 99%. However, if we train on the same 10 classes of digits, but our training data looks like this, this is the Street View House Numbers (SVHN) dataset, then this model, tested on the MNIST dataset, achieves much lower performance: 67% accuracy. That's really, really bad for this problem.
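As a concrete illustration of this kind of experiment, here is a minimal sketch of training a digit classifier on one domain (SVHN) and testing it on another (MNIST), using PyTorch and torchvision. This is a hypothetical toy setup, not the speaker's actual model or protocol, so it won't reproduce the exact 99%/67% figures, but it shows the same qualitative gap.

```python
# Sketch: train on SVHN, test on MNIST, to expose the cross-domain
# accuracy drop. Toy model and settings, not the speaker's setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Make MNIST look like SVHN input: 32x32 pixels, 3 channels.
mnist_tf = transforms.Compose([
    transforms.Resize(32),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
])
svhn_tf = transforms.ToTensor()

source_train = datasets.SVHN("data", split="train", transform=svhn_tf, download=True)
source_test = datasets.SVHN("data", split="test", transform=svhn_tf, download=True)
target_test = datasets.MNIST("data", train=False, transform=mnist_tf, download=True)

model = nn.Sequential(  # small CNN over the 10 digit classes
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 8 * 8, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):  # train on the source domain only
    for x, y in DataLoader(source_train, batch_size=128, shuffle=True):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

@torch.no_grad()
def accuracy(ds):
    hits = total = 0
    for x, y in DataLoader(ds, batch_size=256):
        hits += (model(x).argmax(1) == y).sum().item()
        total += len(y)
    return hits / total

print("in-domain accuracy (SVHN test):", accuracy(source_test))
print("cross-domain accuracy (MNIST): ", accuracy(target_test))
```

The second number comes out far below the first, which is the dataset bias the talk describes: nothing about the model is broken, the test distribution simply isn't the training distribution.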
And by the way, even when the dataset bias is not as extreme, for example if we train on the USPS digits, which to the human eye look quite similar to MNIST, the bias in the data still leads to a similar drop in performance. And if you're curious, if we swap and train on MNIST and test on USPS, we see similarly poor performance. So that's just an example of how dataset bias can affect accuracy, even in the simple case of digit classification. Okay, now what about real-world implications of dataset bias? Have we seen this in the real world? Well, yes, I believe we have. This is one example that's quite famous now: in face recognition, or gender classification, some researchers have actually evaluated how well existing commercial systems from Amazon, from IBM, from other companies work, and what accuracy they achieve on different demographics. And you can see here, according to one study, they work a lot worse on African-American and female faces than on Caucasian and male faces. So again, that's in large part due to dataset bias. Another very sad example of potential data bias is the accident that a self-driving vehicle was involved in a while back, the Uber self-driving car, which according to some reports did not recognize the pedestrian because it was not designed to detect pedestrians outside of a crosswalk. So if that's your dataset bias, that in your dataset all the pedestrians are on a crosswalk, then yes, your machine learning algorithm will not be able to recognize them as well if they're not in that context of a crosswalk, but maybe, as in this case, jaywalking without a crosswalk. So you might be wondering: wait a minute, can't we just fix this by collecting more data? If we don't have pedestrians outside of crosswalks, let's just collect more data like that. Well, there are a few problems with that. The first is that some types of events just might be rare; jaywalking pedestrians might be very rare events, and we may not necessarily want to force people to jaywalk so we can collect more data. So that's one problem. But another really big problem is the cost of data collection. Imagine that we wanted to label images from cars; the example you see here is from the Berkeley BDD dataset. Labeling 1,000 pedestrians with the per-pixel segmentation labels that you see here, where the labeler has to identify each pixel that belongs to the pedestrian, is quite expensive: it costs maybe about a thousand dollars per 1,000 pedestrians. And now imagine the sheer variety of visual data that we want to cover in our dataset: we want multiple poses, multiple genders, ages, races, clothing styles, and so on and so on. And somewhere in there we want people riding bicycles, maybe not riding bicycles, or maybe riding tricycles. So if you think about how many different factors of variation we would have to cover, this very quickly becomes untenable and just too expensive: we can't collect labeled data that's balanced across all of these variation factors. So what actually causes the poor performance? You might be wondering that as well. Can't my deep learning algorithm just get better? Maybe I just need a better algorithm that will generalize and do better on test data. Well, there are a couple of problems caused by dataset bias that current models cannot handle. The first problem is that the training and test data distributions are different.
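The talk illustrates this next with a t-SNE scatter plot of features from the two digit domains. As a rough sketch of how such a plot can be produced, reusing the toy `model` and datasets from the sketch above (the layer choice and sample counts here are illustrative, not the speaker's):

```python
# Sketch: embed source and target features side by side with t-SNE.
# Uses the penultimate layer of the toy model as the feature space.
import numpy as np
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from torch.utils.data import DataLoader

# Drop the final classifier layer to get feature vectors.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])

@torch.no_grad()
def embed(ds, n=500):
    x = torch.stack([ds[i][0] for i in range(n)])
    return feature_extractor(x).numpy()

src = embed(source_test)   # domain the model was trained on ("blue points")
tgt = embed(target_test)   # shifted target domain ("red points")

pts = TSNE(n_components=2, init="pca").fit_transform(np.vstack([src, tgt]))
plt.scatter(*pts[:500].T, c="blue", s=4, label="source (SVHN)")
plt.scatter(*pts[500:].T, c="red", s=4, label="target (MNIST)")
plt.legend()
plt.show()
```

If the two clouds barely overlap, that is the distribution mismatch being described.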
So here you have an example of two digit domains: the blue points and the red points are from the two domains. You can see that when we visualize this data, and we do this by extracting features from the images using the deep learning model that we trained and then plotting them in a t-SNE visualization, this is what we get. You can see that the distribution of the blue training points is clearly very different from the distribution of the red test points. And this is actually a theoretical problem when these distributions are different: we can show that there is theoretically a bound on how well our model will generalize. Another problem is that a model trained on the blue points is not as discriminative: the features it learned are not as discriminative for the red target domain. You can see that because the blue points are much better clustered into different categories than the red points, right? So you just may not be learning good features for the test points in the target domain. Fortunately, there are quite a few techniques that we can use to alleviate this; I've listed a bunch here. What I want to talk about today is the technique of domain adaptation. But, you know, there's always data augmentation, there's always batch normalization; some of these techniques can help in the case of dataset bias. But let's talk about domain adaptation. In domain adaptation, we design a new machine learning approach that tries to adapt the knowledge from the labeled source data to the unlabeled target domain. Okay? Our goal here is to learn a classifier that achieves a low expected loss under the target distribution. And importantly, we assume that we have a lot of labeled data in the source domain, but we also get to see unlabeled data from our target domain. We just don't get to see the labels, right? Because labels are expensive to collect. So we assume that we do at least get to see some unlabeled data from the target domain. So what can we do? Well, the first technique is fairly common now and fairly standard in the literature: adversarial domain alignment. Here, we take a neural network, which I'm showing here as this encoder convolutional neural network, because we're dealing with images, so we always use convolutional networks. We have some training data with labels. If we train this using a regular classifier loss, we can generate features from our encoder CNN, and here I'm just showing it for two classes for clarity. The last layer will be our classifier layer, so we can visualize the decision boundary that it learns between one class and the other. Now, we also get to see some unlabeled data from our target domain. Let's say we put a camera on the robot and it can explore its environment and snap some photos; now it has some data, it's just not labeled. And if we apply the encoder CNN trained on the source directly to that data, we already know that we'll see a dataset shift like this: the distribution of the target points will be shifted with respect to the distribution of the blue source points. So in adversarial domain alignment, our goal is to align these two distributions, the blue source distribution and the orange target distribution. How can we do this? Well, a very standard approach is to add another piece to the neural network, which we call the domain discriminator.
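As the talk explains next, training then alternates between this discriminator and the encoder. For reference, here is a simplified PyTorch-style sketch of that alternating loop, in the general spirit of adversarial adaptation methods; the module shapes and hyperparameters are invented for illustration, and this is not the speaker's exact implementation:

```python
# Sketch of adversarial domain alignment with a domain discriminator.
import torch
import torch.nn as nn

encoder = nn.Sequential(  # stand-in for the encoder CNN (32x32 RGB input)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 16 * 16, 128),
)
classifier = nn.Linear(128, 10)  # label predictor, trained on source labels only
discriminator = nn.Sequential(   # predicts: source (1) or target (0)?
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1),
)

cls_loss = nn.CrossEntropyLoss()
dom_loss = nn.BCEWithLogitsLoss()
opt_enc = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
opt_dis = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def adaptation_step(x_src, y_src, x_tgt):
    # 1) Discriminator step: learn to separate source from target features.
    f_src, f_tgt = encoder(x_src).detach(), encoder(x_tgt).detach()
    d = dom_loss(discriminator(f_src), torch.ones(len(f_src), 1)) + \
        dom_loss(discriminator(f_tgt), torch.zeros(len(f_tgt), 1))
    opt_dis.zero_grad(); d.backward(); opt_dis.step()

    # 2) Encoder step: classify source data correctly AND fool the
    #    discriminator, pushing target features toward the source ones.
    f_src, f_tgt = encoder(x_src), encoder(x_tgt)
    g = cls_loss(classifier(f_src), y_src) + \
        dom_loss(discriminator(f_tgt), torch.ones(len(f_tgt), 1))
    opt_enc.zero_grad(); g.backward(); opt_enc.step()
```

When the discriminator's accuracy falls to chance, the two feature distributions have become hard to tell apart, which is the "domain invariant" outcome described below.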
The domain discriminator is just a classifier that tries to assign a domain label to these input examples. And if we train it with a GAN loss, an adversarial loss essentially, then we iterate between the domain discriminator trying to separate the distributions, and then, in the next step, updating the encoder in such a way that it fools the discriminator, so the discriminator's accuracy goes down. In the process, the encoder learns to align the two distributions so that, if everything goes well, the discriminator cannot tell the difference between the domains and the features have become essentially domain invariant. So that's adversarial alignment. And here's an example of it working for those two digit domains that I showed you earlier. You can see that, in fact, after adaptation with adversarial alignment, the two distributions of the red and the blue points have been aligned almost perfectly. And in fact, classification accuracy also goes up considerably; it's not just that the distributions are aligned, it actually does improve classification accuracy. Another technique that I want to mention is alignment in pixel space. What I mean by that is: suppose again we have source data with labels and some unlabeled target data. Now, instead of just doing adaptation with feature alignment, like I just showed you, what if we first translate our source data in image space? So we're generating new images from the originals, but these new images look like they come from the target domain. This is a similar idea of aligning the two datasets, but now we're aligning them in pixel space, because we're actually generating the images themselves and not just generating features. The advantage is that once we've done that, if we're able to train this generative adversarial network that can translate from the source to the target domain, we now have data that looks like it came from the target domain but has labels, because the original data is from the source. So it's labeled with the categories that we need for training. And by the way, we can still add alignment in feature space in this overall architecture, and in fact we have experimented with that in our paper, which is cited at the bottom; if you're interested, you can take a look. If we do both feature and pixel space alignment, that can further improve performance on the target domain. Okay, well, that's great; this pixel space alignment seems pretty neat. But so far, we've been assuming that we have unlabeled target data. In fact, what I didn't tell you is that in order for that method to work, it needed to see quite a lot of unlabeled data from the target domain. But what if we only get one image, or a couple of images, from our target domain? Well, unfortunately, the existing methods like CycleGAN, or the CyCADA method that I showed you, don't quite work. So instead, what we need to do is take a source domain image, which is the content, essentially, and translate it to a new visual domain, where we only have one example, let's say, of that domain. In this example, our content is a dog, and we want to preserve the pose of the dog, but we want to change the style, or the domain, of the dog into this other breed. Unfortunately, I don't know what breed of dog this is; maybe you do. But anyway, we just have one example of this new breed. And we actually propose a method that can do this. And this is the result.
So you can see that in the generated image, we took the original source image and added the style of the target image while preserving the pose of the dog, right? So the content is preserved. This method, which we call COCO-FUNIT, was published recently at ECCV 2020. I'm not going to go through the details because I don't have time, but essentially, the model takes a content image and a style image, encodes them using a content encoder and a style encoder, and then combines these two encodings using an image decoder to generate the output image. Here are more examples. We have the style image on the top and the content image below it, and then the resulting image generated with our COCO-FUNIT approach at the bottom. You can take a look and see that we're able to do this even using just a few, sometimes just one, images of the target domain; here, the domain is the breed of the animal. So the pose stays the same as in the content image, but the breed is taken from the style image. You can see that this is working quite well. And if you're curious, compared to the previous approach, which is called FUNIT and which we're actually building on, we're improving on it quite a bit, because as you can see, FUNIT is not able to translate images using just a single style image; it generates fairly poor results in this case. And on average, when we evaluate on a large dataset, we also see significant gains using our COCO-FUNIT approach. So that's another example of pixel domain translation. One other example that I want to show you really quickly is using this idea for adaptation in robotics. Here we have a robot that's trying to insert an object into another object, let's say a peg into a hole, or, more generally, we can apply this to other manipulation tasks. Our input data is coming from a depth sensor, so it looks like this: there's an RGB image, but what we're using is actually the depth image, which you can see in the middle here. We want to train a computer vision model, a neural network, that will control the robot arm to perform the task. But to train it, we want to use simulated images. So we simulate this kind of problem, generate fake depth images, and train the neural network. The problem that we run into is, of course, that we have a gap between the training domain of simulated data and the target domain of real depth images. So what we tried is using pixel-level domain translation to solve this dataset bias problem without collecting any labeled data in the real target domain. You can see here an example: a real depth image, and then a similar simulated image. And in the last one, we take the real image and translate it into the simulated domain, and you can see that it now looks a lot more like the simulated data. So we're closing this domain gap. Okay, great. I'm going to wrap up here and just recap what I talked about. Dataset bias is a pretty major problem for machine learning in general, but I focused on computer vision specifically, because that's mostly what I work on. I showed you a couple of ways that we can mitigate this problem, using either feature space domain alignment or pixel space domain alignment. I also think, you know, we could discuss, if we have time after this, some even more general ethical issues related to datasets.
For example, recently there was a paper that's generating quite a lot of interest that looks at the dangers of large language models, and it points out that language models are being trained on progressively larger and larger datasets. So it's almost the opposite of the problem that I talked about: here we have a huge dataset that we're training on, and the problem they point out is that this dataset might contain all kinds of bad data, like offensive data, or even private data. And by training the model on it, we don't know what kinds of biases or undesirable things it's learning, right? So that's a related but different ethical issue. And this paper, by the way, one of the co-authors is Timnit Gebru; you might have heard that she was actually forced to leave Google over this specific paper. So yeah, there are quite a few ethical issues, and I'm happy to discuss that or anything related to what I talked about. And thank you very much for your attention.

Well, thank you. Thank you, Kate, and hopefully everybody enjoyed that archived talk. If you could just say hi, Kate, so I know I can hear you. Hello. Hi, Kate. There are a couple of questions in the chat, and we have about five minutes, and some of them look a little deep. Alex is asking: when using techniques like COCO-FUNIT, doesn't using generated data reinforce the model to recognize content that it may never actually encounter? I think we talked a little bit about this in the talk, but the research is proving a benefit to filling in the gaps, is that right?

Yeah, I'm still trying to understand the question. So the question is: when using techniques like COCO-FUNIT, doesn't using generated data reinforce the model to recognize content that it may never actually encounter? I'm not sure I'm following the question, so I guess one issue with using generated data is that it's not realistic. It doesn't always look like a real image, even though some generated data is fairly photorealistic. And the other downside could be not achieving a high degree of variability: GANs tend to suffer from mode collapse, so they tend to learn just one or two modes of the distribution. So you could have low variability in the generated data. Those are definitely downsides of training on that data. But the upside is that you are at least generating more diverse data than you had before, as you're augmenting your data with this generated multi-domain data. Of course, it's always better to train on real, diverse domains, but if you don't have access to real diverse domains, then I think this is a close second.

A close second, yes. I love the example you gave about the crosswalks and the cars, the automated car driving example. That was a good one about not having enough data, and maybe some of the generated data would help with that. There's one more question, from Herbert, asking if you have any examples of using this on structured data.

Yeah, so there are some applications of domain adaptation techniques to text, which I guess you could consider more structured than images.
And for applications like image captioning, for example, where your task is to take an image and generate a descriptive caption that says what is in that image, or perhaps visual question answering, where you answer questions about an image: those are the academic tasks I know of where there's more structure in the data, because I work primarily in computer vision and natural language processing, so that's what I mostly encounter. I think you could also apply this to documents, like PDF documents, because they have a lot of similarities to images; you can actually treat them as images, but they also have more structure, because they are often generated according to some rules. So yeah, I think we could also apply this to documents.

All right. Well, Kate, if you can, hang out a little bit for the online chat in the background. I really wanted to thank you for coming and talking about this, because you're doing all this groundbreaking research on the academic side of things, and a lot of the folks who are here on the call today are probably coming from the enterprise, and maybe we don't think enough about how to understand dataset biases and how to rectify them. So I think this has really been awesome for inspiring us to look beyond the datasets that we usually just use without thinking. I totally appreciate you coming here today and doing this talk. So I'm going to now queue up the next talk, which is more on the enterprise side: Paul McLaughlin is coming from Ericsson Research, and he's going to talk about, well, he got every buzzword into this title: Sustainability, Machine Learning, AR, VR, 5G, and AI for Good. So we're going off from dataset bias to the next stage of applying it. Thanks again, Kate, for coming and joining us today, and again, we really appreciate everybody's questions, so keep those questions coming. So here, let me queue this up. Thanks for having me; it was great to be here. All right, thanks so much. And here comes the next one.

Good afternoon. I'm Paul McLaughlin. I'm an AI research leader, and I'm part of Ericsson Research, based in Santa Clara, California. Today I'm going to be talking about how Ericsson is using AI to help address sustainability and climate change, because we know that climate change is real and is having devastating impacts now. Humans have caused one degree centigrade of global warming above pre-industrial levels, and NASA and NOAA stated that 2020 was the second hottest year on record globally. Climate change is causing extreme weather events, which are its most visible effect, and the frequency of extreme weather like wildfires, droughts, hurricanes, tornadoes, and thunderstorms is increasing in the United States. In 2019, extreme weather cost $45 billion in the United States alone. This also has pretty important societal impacts, because climate change damages hit low-income Americans and the South hardest, and minorities and people of color bear a disproportionate share of the climate change burden. The time to act is running out. So what do we need to do? The carbon law teaches us that emissions must be cut in half every decade to reach net zero by 2050, and by 2030 the information and communication technology (ICT) sector can have a massive impact towards that goal. In 2020, global greenhouse gas emissions totaled around 54 gigatons, and a gigaton is a billion tons. So, following the carbon law, to avoid catastrophe, emissions needed to have peaked last year.
And between 2020 and 2030, we need a further 50% reduction in greenhouse gas emissions, and the same for every decade following that until 2050. At the same time, we also have to invest in carbon sinks, like forests, to help capture some of the carbon we've already emitted. Action is required right now; the longer we delay, the bigger and faster the reduction required. Digitalization, though, is an exponential technology that will help us address this target even more quickly. Ericsson research indicates that the ICT sector can enable reductions in global greenhouse gas emissions of 15%, and this is based on existing ICT technology. More opportunities to exceed that 15% will likely be enabled by technologies like 5G and machine learning and AI that Ericsson is investing in heavily. We see a particularly big impact on the energy, industry, and transportation sectors, and I'll walk you through some examples, as well as speak to my own research on AR and VR and how that will help address greenhouse gas emissions. But the main point is that decarbonization solutions exist today; we don't need to wait for a silver bullet. And the estimated financial benefit of low carbon is $26 trillion by 2030, for reference. So we have an incredible opportunity ahead of us. Ericsson is leading the way: we are reducing the emissions and impact of our company's activities and our products and services, and this also has a dramatic impact on society. Our goal is to be carbon dioxide neutral by 2030, which speaks to our company's impact; this includes fleet vehicles and facilities. And our goal is for 5G to be 10 times more efficient than 4G, which speaks to the impact of our products, because 30% of network opex today comes from energy consumption, and 90% of mobile network operator emissions are from network power. So, for example, we are building a smart factory in Lewisville, Texas. We are pursuing LEED Gold and LEED Zero Carbon certifications, and 90% of the materials for that factory will be diverted from landfill. We've installed 1,600 solar modules, and we produce over a million kilowatt hours annually, which is enough to power 93 US homes for a year. We have water-recapturing tanks so we can capture and reuse rainwater, which is enough water for one US home for 133 days. This is an example of how Ericsson is actually investing to ensure that our products are sustainable, and to show how manufacturing can transition towards a low-carbon future. We also want to reduce the impact of digital networks. The ICT sector's carbon footprint is estimated to be 1.4% of the global total. One thing I really want to point out, because I think it's remarkable and it shows how we are using technologies like AI today, is that emissions have remained constant while data traffic has quadrupled and the number of subscribers has increased by 30%. One of the main reasons for that is that we've seen big energy efficiency gains from the technology shift from desktop and laptop to handheld. But the ICT sector has the decarbonization solutions that can help lead to a 50% emissions reduction by 2030: things like renewable electricity to power networks (the ICT sector today is the largest purchaser of renewable power) and mobile network efficiency, where we can see Ericsson's leadership role in innovation. But we worry that energy consumption will increase dramatically if 5G is deployed the way 3G and 4G were.
So Ericsson's technology leadership is breaking this energy curve. Hardware modernization can drive up to a 30% reduction in power with higher data throughput, and software can drive up to a 50% reduction in power with no impact on consumers. This allows operators to decouple mobile data traffic growth from energy consumption and carbon emissions. We're also transforming transportation. Transportation emissions constitute 16% of the global total, or 8.6 gigatons of CO2 per year. Commercial transport powered by renewable electricity is critical for decarbonization, and a robust 5G innovation platform will be required for the further development of this technology: a fully built-out 5G network will be required to operate autonomous vehicles at massive scale. So the challenge is: how do we provide affordable and safe transportation and reduce greenhouse gas emissions? An example of a solution is that Ericsson, a Swedish startup called Einride, and the Swedish mobile operator Telia created an electric and autonomous transportation system that is safer and more sustainable. And the impact? Einride says electric vehicles powered by renewables reduce the carbon emissions of logistics by up to 90%. Autonomous, driverless commercial vehicles also have less downtime, more reliability, and lower total cost of ownership, and will also lead to better air quality. So how does 5G fit in? 5G enables higher speeds, lower latency, increased reliability, and more network capacity. We also think the digital divide is a critical component of sustainability, because the digital divide is most pronounced in rural and minority communities. Today in the United States, 37% of rural students lack adequate connectivity, and this has a really critical impact as schools are closed during the COVID-19 pandemic: if you lack connectivity, you cannot attend e-learning. And according to Deloitte, the digital divide currently costs the United States economy $130 million a day. As an example of how Ericsson is tackling this problem, the Rutland City Public Schools system partnered with Vermont Telephone and Ericsson, and we installed next-generation 4G and 5G wireless radios and antennas in fewer than 10 days. Vermont Telephone delivered modems and routers, which connected students to e-learning, and Rutland City Public Schools delivered Google Chromebooks with wireless connectivity. This happened not in weeks or months, but in less than 10 days, and homes in Rutland now have wireless speeds well above 100 megabits per second, which enables students to access world-class education and e-learning opportunities. Ericsson is committed to this globally: we are partnering with UNICEF to make this possible for students around the world, to really bridge that digital divide. We also think that 5G will help enable a transition to renewables. The United Nations says that by 2050, 80 percent of all the world's power needs to come from renewables, and this will help us get to the decarbonization that is critical for climate action. The challenge for renewables to scale up is that there is a large number of power generators, multiple solar panels and wind farms, and bi-directional energy distribution, with power sold to and purchased from the grid as needed; and we have fluctuations in power generation, because renewables can sometimes be unpredictable (there may not be wind one day). So the solution to this problem is smart grids.
More renewables means that distribution system operators need total control of power distribution networks, and they need to respond rapidly to balance power production and load to avoid outages. The role of 5G here is that distribution system operators see digitalization and connectivity as key enablers in the transition to renewable power. Distribution system operators recognize that cellular connectivity offers lower capex compared to cabling for grid communications, and real-time power system management requires a low-latency communication connection; we can reduce interruptions by up to 75% with ICT compared to today's levels, according to a Swedish distribution system operator. Digitalization is also critical for the industrial sector. The industrial sector currently accounts for 32% of global greenhouse gas emissions, and the challenge to decarbonizing it is that the industrial sector needs to meet consumer demand while cutting emissions by 50% by 2030. Business as usual is not sustainable, and we have to transition from linear to circular business models, which is what we think of as Industry 4.0. The role of connectivity in industrial process optimization is vast: by 2024, 5G will cover 65% of the global population, and we believe there will be 4.1 billion cellular IoT connections. That ubiquitous connectivity enables real-time measurement and real-time AI for industrial processes on a massive scale. The Exponential Roadmap shows that up to a 20% reduction in annual energy intensity is possible through real-time monitoring of processes and energy use, and the AI itself will help us get to continual optimization of processes. Ericsson is using connectivity in our smart factories today, in Tallinn, Estonia, and in the United States, to implement use cases that increase efficiency and reduce our own carbon emissions, so we're showing how this can be done today. The role of connectivity is really critical in enabling this circular economy, because it increases the lifetime of products and enables reuse. For example, 60 to 75% of energy can be saved by using recycled instead of new steel, and material reuse needs to grow. Digitalization can track materials and products from manufacturing onward, reducing waste, and asset tracking can really help during logistics as well. Now I want to pivot and talk about some of my own research, because I've been speaking a lot about how Ericsson sees tackling this challenge across all the industries we partner with and how connectivity plays a role. The team I work on works on augmented and virtual reality, which are technologies that will help bring full experiences to people, and we are thinking about how this relates to carbon emissions and sustainability. I'll give you an example.
Air travel today contributes 2.5% of global CO2 emissions, and just a single round-trip flight between New York and London produces 0.67 tons of carbon dioxide per passenger. A lot of travel is incredibly important; it's something I personally love, because I love to have the sense of being in a place: the smells, the taste of the food, the sounds of the environment. But a lot of travel today is to take a tour of a factory, or look at a demo of a product, or shake a person's hand to conclude a business meeting. What if I told you that we are working towards a vision, using AI, 5G, and a lot of critical hardware research, to enable people to have that same tactile experience from their own home? I'd like to show you a video about that. I get goosebumps every time I see that video. Our vision at Ericsson Research is that by 2025 we will have advanced technology that allows people to have full five-sense immersive experiences across a mobile network, and our vision for 2030 is for people to be able to share things such as memories or thoughts using brain-computer interfaces. One of the critical challenges that we are trying to solve using AI is spatial computing. For us to have interactive content and experiences, we have to use AI to understand the physical environment around the user and the objects in that environment. That means creating things like spatial maps and environmental understanding, but also enriching those spatial maps with semantic information, so that not only do we know where an object or a building is located, we also know what type of object it is and what relationship the end user has with it. This will really enable us to create that full five-sense content and experience, because once we have that information, we can generate overlays, and these overlays are critical for AR and VR. So here, as an example, is what you might see through your headset when you go to pick up your rental car in the future. In order to place this overlay on top of your rental car, with your return date and the price per day, we have to understand the object, we have to understand the environment, and we have to do this incredibly rapidly, because users can experience what we call virtual reality motion sickness if there is any delay greater than about 40 to 50 milliseconds. This means we have to process data, transmitted across a network or on the device itself, and get a response in less time than it takes you to blink. That's one of the key and critical challenges we are working on in my team, and why we're excited about the latency of 5G: that content placement is extraordinarily computationally complex, and we worry that people will not have the same quality of experience unless we can have that computation at the edge, along with the speed and latency of the network, so that all the overlays, the content, and the entertainment you see through AR and VR headsets are correctly placed and personalized for you. This is a challenge, though, because it requires AI, it requires a mobile network, and it also requires headsets. And XR headsets, AR and VR headsets, today are evolving rapidly: today there aren't any commercially available headsets that have embedded 5G chips inside of them, which means that headsets and these experiences are not fully mobile yet, if you'll forgive the pun. Without 5G chips, AR and VR headsets cannot push connectivity and data
processing over the network unless they're connected to Wi-Fi. So in that example I just showed you, in the car rental pickup garage, the challenge will really be that without 5G network connectivity, we may not be able to calculate that overlay unless you're connected to Wi-Fi. Once we have 5G chips inside the headsets, people will be able to take this level of computation and interactivity with them wherever they go. And we also think that not only will 5G help address the mobility aspect, it addresses a lot of the technical problems that are inherent in spatial computing. For example, one millisecond end-to-end latency is the standard for 5G, and that dramatically reduced latency means that headsets can work with real-time data. So as objects or the environment change in the end user's field of view, we can track objects and correctly track overlays, so that content and overlays in XR move with the environment and move with the end user. And 20 gigabits per second down and 10 gigabits per second up means we may not have to compress content or video as much, so not only will you have content that reacts in real time, it will look real as well, because we may not have to compress it as significantly. This will also really help with spatial computing, because it will improve the accuracy and precision of environmental understanding algorithms like simultaneous localization and mapping. We are also really excited about the possibilities of edge computing for spatial computing: pushing data processing to the edge of the network will enable rich, immersive experiences that are mobile as well. And with edge computing, data travels at the speed of light, so one millisecond means that an edge computing facility can be located upwards of 50 miles from the end user. But we're also working on smaller edge facilities that can be located even closer to the end user, which will really help us address that latency challenge for machine learning and AI. If we can, for example, think about how to distribute where data is processed, that will really help us reach the latency ceiling that is critical for the quality of experience for AR and VR. And 5G really means that the headsets and form factors we will see are evolving rapidly: if we can offload computing onto the edge of the network or across the network, we can see, and we are starting to see, smaller headsets with a physical form factor that is lighter and smaller in size. Once 5G radios are inside these headsets, we'll be able to process and experience AR and VR content outside of the home that updates in real time, with that incredible latency and speed from 5G. Once we push processing onto the edge of the network as well, we'll see longer battery life, or we believe we will, because we will probably need fewer chips on the actual headsets; we won't need ASICs that consume quite a lot of battery. So we will see people able to wear their headsets all day long, like they use their cell phone today. And the piece I think is the most exciting for me is around collaboration, because without connectivity, without 5G, and frankly without AI as well, people have a really difficult time collaborating. If we wanted to have a business meeting in person or look at a product demo together, it will be a challenge to make
sure that we are seeing the same thing at the same time and to interact with it, so we can change things and collaborate together, play games together, watch entertainment together. That's what the latency from 5G and mobile network connectivity will enable: that collaboration. And just to give you a couple of examples, this is the Lenovo A3, so these are headsets that are commercially available today, and we're already starting to see a dramatic change in the physical form factors. And this is Nreal, so we are seeing headsets for AR and VR that are starting to look a lot like the glasses I'm wearing today. And that's our vision: the internet of senses is coming, and our vision, as I said, is to have the technology in place by 2025 to enable a full sensory internet and connectivity. As you can see in this image, we may tackle sustainability by removing the need to travel and meet in person. Here we see a person having a business meeting with someone via a hologram, and because of the placement, and because of the connectivity and latency from 5G, that hologram is able to travel with the person: you can share a secret and whisper, and you can shake that hologram's hand and feel the weight of their hand. So I really want to thank you for your time and for listening to me. The message I really want to impart is that climate change is real, it is critical that we address it, and every day that we wait, the problem gets a little bit harder to solve. But, and Ericsson takes this very seriously, it's not a problem that has no solutions: using existing technology we can already get a 15% reduction in greenhouse gases, and we at Ericsson think we can go even further than that. We are really excited to be on this journey with you. Thanks so much, and I'm looking forward to your questions.
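A quick back-of-the-envelope check of the edge-distance figure from that talk. This sketch assumes signals in optical fiber travel at roughly two-thirds the speed of light; the speaker's exact assumptions aren't stated, so treat the numbers as indicative only:

```python
# Propagation delay only: ignores switching, queuing, and compute time.
C_FIBER_KM_PER_MS = 200.0   # ~200,000 km/s in fiber, i.e. 200 km per ms
KM_PER_MILE = 1.609

def round_trip_ms(distance_miles):
    km = distance_miles * KM_PER_MILE
    return 2 * km / C_FIBER_KM_PER_MS

print(round_trip_ms(50))  # ~0.8 ms round trip for a 50-mile edge site
```

A 50-mile edge site consumes roughly 0.8 ms of a 1 ms end-to-end budget on propagation alone, which is consistent with the talk's point about moving compute even closer to the user.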
Hey, Paul, thank you very much for that. I love that, because you did manage to get all the VR and AR stuff in there too. And there's one question in the chat asking about some of these same things you're talking about here: what are you working on in terms of the healthcare space and 5G and AI? Is there anything you're working on for healthcare, applying some of these things?

I think that's a really great question, and yes, we are, of course. I think a lot of people, when I mention that I work at Ericsson, still think of us as a company that manufactures cell phones, because so many people had Ericsson phones back in the early 2000s, and they haven't necessarily kept up with the company. But Ericsson builds most of the world's mobile networks and infrastructure, so when you're actually using your cell phone or connected device, you're connected to a network that Ericsson has built, designed, installed, and maintained, along with all of the physical hardware and equipment for it. So when you hear people mention things like smart devices or connected healthcare facilities, they're likely thinking about using our Ericsson-built and -designed networks. But that's a bit of a wishy-washy answer. The thing I'm probably most excited about is telesurgery. Particularly with AR and VR, and some of the use cases I talked about in the presentation, you can imagine that if we can have one millisecond end-to-end latency, we could, for example, have a doctor here in San Francisco, where I'm based, perform remote surgery on a patient in a rural area. And that also has a real implication for the digital divide, one of the themes I mentioned as well, because we want to see ultra-reliable, ubiquitous 5G across the world and across the country; for that telesurgery to happen, we would need to have 5G in rural communities as well. So it's all part and parcel. But I think there are a lot of use cases where we could even have remote consultation, because over current or earlier mobile generations and networks, there could be upwards of a second or two of delay between me taking an action here, if I were a surgeon, and a robot responding in a hospital, and that's not really great in a surgery setting. So 5G helps address that.

All right, well, I mentioned at the beginning that we're going to be fluid today, and there are a couple of other questions here. What I'm going to do is read off the one that was asked in the chat, and maybe you can answer that one here and then answer the other one, the one from Rizan in the Q&A, online in the chat. So here's the question: what are the political barriers to implementing some of your ideas? For example, if the electrical grid does not buy back electricity generated by consumers, i.e., all the solar panels I'm putting on my roof right now, there's less incentive to implement these ideas. I know you think a lot about these things, so I figured I'd give you a few minutes.
So I can't speak to the political barriers around the energy grid, because I would be ad-libbing an answer, but I can say that I actually lead a lot of our research related to trustworthiness and security, so I can speak a little bit more about that. In terms of politics, we know that data privacy is extraordinarily critical, and it's something we take seriously at Ericsson and that I personally take extremely seriously. I think we are obliged as a research community to have the strictest and strongest ethics in everything we do. We do start to see questions related to privacy, transparency, and accountability for all AI systems, particularly how algorithms would be audited and, as for the first speaker as well, how we might measure things like bias. And we start to see that intersect with politics, where regulators are starting to take an interest in these topics. This is something that we at Ericsson Research, about 800 people globally, take extremely seriously: we have adopted a trustworthy AI framework that's been accepted by the board of directors at Ericsson. So, in the space and the domain that I know well, we are well ahead of what politicians are actually asking us to do in terms of accountability, transparency, and privacy.

All right, well, thank you very much for your time today. I'm going to queue up the next talk, which is another enterprise application of AI: the Enterprise Neurosystem initiative that's going on at América Móvil. Thank you again, Paul, and if you can answer that other question in the online chat, that would be great. So thanks again, and here we go with our next talk.

Thank you for joining us today. I am very excited and pleased to welcome two guests to this presentation and conversation. I'd like to turn to both Raul and Carlo and have you introduce yourselves, and then we'll go ahead and do a brief presentation.

Sure. Hi, everyone. My name is Carlo Pizeno. I'm a core network planning director for América Móvil. I have been in the company for 15 years now, and my main responsibilities are related to the adoption of new technologies for the network. I have been leading the NFV and SDN processes recently for the group, and now we are very focused on bringing in 5G and all the cloud-native principles around it. So it's a pleasure to be here; thanks for the invitation.

Fantastic. And Raul?

Hello, my name is Raul Reyes. I am in charge of 5G infrastructure and cloud services orchestration. I have been at América Móvil for five years now, and my main focus has been to enable and empower the different distributed teams in Latin America, so we can always be evolving and always bringing more innovation into the operation.

Fantastic. And I want to thank you both. I mean, the partnership we've had with América Móvil has been nothing short of spectacular. We've been able to do some very exciting things, and over the years it's culminated in what we're about to talk about today, so I'm very excited to present this. Let me go ahead and share my screen. Let's see, and there we go. What I'd like to talk about today is the Enterprise Neurosystem framework, and this is something we've all been talking about for quite some time. To set the stage: I was in a conversation with Raul at a beautiful restaurant in Mexico City called Loma Linda, and he turned to me over lunch, and I've told this story before, so I'm sorry to repeat it, but it
was funny he turned to me and he said what is Red Hat doing with mobile networks and artificial intelligence and at the time I said absolutely nothing because it was still very early days we were still kind of in that assessment mode to basically understand what the impact could be and given the rigorous uptime requirements of mobile networks we were just kind of putting our feet in the water a little bit and Raul really pushed us right into the water with that comment because I came back home I reached out to a number of folks including Chris Wright our CTO and some other people and we started a small focus group to take a look at what this could eventually become as a community directive and so we've been working on a long time together and I'm very excited to discuss it today so here we go. One of the core things we've thought about over the years is just that human and IT architecture share a number of strong similarities and we just noticed this more and more especially with the advent of artificial intelligence and which really is kind of the completion of this parallel model you know when you think about it and you've got all these mobile devices they could be considered almost nerve endings they have the capability of hearing and you sound and visual identification and then data centers really equal the brain's functions in a lot of ways like the cerebellum and memory and processing some views and so what's interesting is there really is a kind of a parallel model you know we as a you know species have created something that's very similar you know in many respects and so in terms of the human body the more core operations are fully autonomous like the heartbeat, chemical levels, the way we assimilate energy and it still partitions conscious thought processes as part of that too so it's really almost like two separate sets of functions from that perspective but the higher order or the core decisions are made by the conscious mind which is really kind of firewalled away and coexist with these other systems in a real sense of harmony but also developed and home by evolution over many many years to say the least so we think corporations are kind of similar in many ways and there are many different functions in a corporation and it can span many different countries and so we thought over time that it would be interesting to tie together all these different data points and all these different functions essentially as a single instance and to make it all part of a single framework and that's where we have ended up today which is a new AI and machine learning telco community right now which is called the enterprise neural system and this is about AI infrastructure basically connected to every single business function across the enterprise and we're definitely starting with telco from that perspective but it will be applicable to all verticals because there are every corporation and the Fortune 500 is facing the same challenge and founding partners include America Mobile but also Verizon Media, Equinix, Ericsson Cove, Lambda Perceptilabs, Ernst & Young, Seagate and Watson's also involved and really why it's needed is that AI models are being built and deployed in both kind of a do-it-yourself fashion and through different vendors but without really a comprehensive integration framework or any kind of large-scale federation at the moment there are lots of small kind of point solutions and AI models being scattered around the enterprise and connected to data lakes etc but in terms of taking 
all those elements and all that information and cross-correlating it for a larger scale insight and deeper insight that's something that we saw the need for and why we're starting this community so it basically unifies and optimizes an entire multinational corporation at that scale with a single AI and ML framework and it enables like I said before the overarching cross-correlation of all these different data points but then what's interesting is over time edge and core AI all those instances become part of one system and it just provides any form of management whether it's you know mid-tier management or the C-suite with a real-time view of all operations and we thought about a lot of creative applications for that like a hologram advisor or robotic advisor you know like down the road but of course it would just be you know on screen for the time being but we're looking to the future to do some really cool and kind of fun innovative stuff and so conceptually you know if you take a look we've got all the core open-source components like Linux and self-scorage and Kubernetes etc and then we have the open data hub framework which allows you to use open-source AI platform tooling to create models and get them into production to maintain them and then also that would then lead into the AI neural system and so you would connect the neural system to IT and then it would basically propagate from there and connect to all these different areas like the finance area network operations facilities management legal and regulatory frameworks human resources I mean go down the list all these different areas would then be cross connected and integrated together to feedback all this data into the system and here's kind of a low-level architecture example and again just an example you would have AI and ML instances in all these different areas of operation network operations IT and the knock itself and what would then happen is quite literally they would then be connected so yet another kind of smaller and more I guess it's a streamlined group of AI and ML instances and they could be GANs they could be all sorts of different AI frameworks that would take the lower-level findings and begin to create a tree of logic basically or a tree of perception that would then take all that information begin to filter it and begin to draw out these kind of correlations that can lead to deeper insight and so over time you would have the same framework in every different business instance and it would then go up into let's say a second or third or fourth tier of different GANs or different AI frameworks into transformer frameworks or other AI frameworks because we'll be using and borrowing from a lot of different areas to create this and ultimately into the recommendation engine that would then basically convey the results and the observations and the insights to management and the C-suite and this would involve a federated intelligence model so you'd be taking all the different AI models cross correlating all their data creating a reporting intelligence that would basically then turn to management as I said before and relay all this information and again we would start with perhaps a dashboard on the left just as an example then on screen maybe some form of you know human representation and then eventually a hologram or some other form of intelligence that would convey this to basically their colleagues on the human side and so what's interesting about this too is we have found and actually MIT had discovered this as well 
is that the combination of human and machine is actually three x more powerful than either one alone so machines will have a certain error rate humans will have a certain error rate but together they actually reduce the error rate to almost less than a percentage and so in many use cases that we examined and so really what we're seeing is this kind of merging of the abilities of both sides of that coin into something that's actually greater and more powerful and so in terms of work streams that we're looking at different areas we'll have a series of excuse me open models that we'll offer we'll work on an open data platform and a middleware solution basically to cross connect all of this from an open source perspective we'll be looking at it through the lens of open AI ops or AI operations and this really could be considered kind of the marriage of business intelligence the classic way of taking a look at different data around the enterprise and drawing meaning out of it but also AI ops and the autonomous operation of the enterprise itself and how you can basically take all this together and understand it and that would be under the umbrella of the federated intelligence section which is right there number four so the way we look at this is there are really larger implications for global AI development and this would be kind of where we've seen those tea leaves begin to gather you know together in the middle and what we've noticed is that all these different elements do need to be brought together integrated and correlated and so there's really a lot of benefit from the enterprise and it's all the obvious things but through a real the widest possible frame of insight and being able to take in every single data point and understand what this all looks like leads to cost savings, streamlined operations and really it allows us to build a community sourced solution which is based on real production experience from folks like Raul and Carlo and a tailored list of objectives that we can all adhere to and then the good news is a lot of existing open source offerings and frameworks can be applied today there will be a few things that need to be created but in essence all the ground work has already been laid by open source communities in terms of the tooling we can use and ultimately there are cross vertical applications in financial services or oil and gas and all these different industries can take this kind of a framework and apply it to their own operations so it's actually a very exciting time for us and you know we're just getting it off the ground and we've already had meetings and I've got things moving so I'd like to now turn basically to Carlo and Raul and I'd like to ask you a few questions along these lines as well I think what's interesting is the fact that American Mobile got involved in this so early on is really exciting and the fact that you basically not only kickstarted us in this direction but you're also really embracing the open source I guess you could say methodology and way of doing things we think is wonderful so um you know I think really maybe you can talk a little bit about the value of collaborating in the open with your peers like Verizon Media Equinix and others I'd love to hear about like really what convinced you to do so to move in that direction yes okay well from a telco perspective we started some of the transformation projects in American Mobile some years ago adopting I would say a semi open approach but I believe we reached a point in which we discovered 
that we were not flexible enough so now I believe that the open source world has matured a lot and we're convinced that now with the industry trends around 5G becoming a reality I believe that it's the right moment to show that adopting this logic and contributing back to the open source communities is the right way to unlock innovation or future networks well yeah I totally agree I totally agree sorry for the interruption I and with Carlo mentioned that we think that the open source projects are now the de facto option no in order to solve big challenges no so today we see more and more and more challenges coming our way and it will be impossible for us as a single group to tackle all this constant change at the pace as we are seeing today no so so we are doing this because we think that the future of open source is promising and from the community the open source shapes the technological evolution and the creation of an environment that leads to constant innovation we think that if we do not do this this way it will be impossible for us in the future no it's really exciting you guys have been wonderful partners in that regard and I think it's been wonderful to see the industry support but what about like what about the technical value I mean what are the advantages of creating this kind of a multinational AI instance to manage and study your global operations in real time and to help you manage them what do you find to be the value from that perspective um well usually I think operators like us face very complex maintenance processes so one of the goals we have is around the processes optimization with the ability to take autonomous decisions considering a dynamic condition so in general adopting an artificial intelligence and machine learning logic will give us the advantage to reduce operational costs and at the same time reduce the failures in the network by having this predictive logic and as we have operations across most of the Latin American region we will also have the advantage to learn how to apply this methodology in similar scenarios in all of our outcomes with this multinational instance and common knowledge between all of the countries yeah sorry so we believe we believe that having the technology that focuses on predicting and managing the behavior of our operations will allow us to forecast more effectively no and also hopefully we we will plan the work assigned in in our notes and hopefully before every error or mistake occurs right so so before they happen no so machine learning will help us to learn faster as well no we think that this kind of technology will develop better solutions as well will help us to and leverage and bring better better solutions to our customers so we can maintain the stronger platforms in the times that not only mvps are needed because always the business is pushing to get more solutions as well but we need not only the mvps but also we need to have reliable and and and to have reliability at speed at the same speed of the of the business so totally agreed and i think that's one of the core values of doing something like this and you know i i think what you're really leading into is something we that was mentioned earlier was just the the combination of human and machine elements to basically create something more powerful from a cognition perspective do you do you feel the same way about that and do you think it's going to be that kind of an outcome in terms of your own opinions it just well i think that this is a process that in general 
requires maturity so i expect that initially human knowledge and integration may be needed but the systems learn to recognize situations and correlate them with the solutions and data prepared by experts so in the way that we feed these events into the solution the solution will know what to do and avoid risky situations for upcoming events at the end of course respect that the solution would have enough capabilities and intelligence to take decisions on its own and roble your thoughts well it's interesting how how AI and human cognitions are now collaborating in many ways as you mentioned before no i think that in one side the humans train and explain the machine learning models and also they maintain new and create new models no on the other hand i think that the AI brings more data and better insight so in a way AI boosts our human potential no i think that we can create opportunities for engaging technology in a whole in a whole different way well definitely so and to do something like this do you see advantages to do this kind of development and really an open source community manner as opposed to more of a proprietary or in-house approach like what would be the benefits as well in your opinions and yes well as mentioned earlier we started the following proprietary approaches with some of these transformation activities within the telcon environment however we have seen that these proprietary solutions won't solve at all what has been promised in the industry so the first thing that we expect is to have technologies and processes solving that that promise then we expect to have cost reduction in our processes and finally we expect or we believe that contributing back to the open source communities give us the opportunity to enhance the solutions and make them better all the time yeah yeah totally we think that using a proprietary approach will never give us the openness and the flexibility we need in order to build the effective solution so and open source communities from our perspective create more competition and when there is more competition the price is reduced as well no so the massive acceptance of a successful open source project is very powerful so so we need to provide a neutral home for it we need to protect it we need to develop on top of it without any any other risk of for this openness of flexibility that we are looking for that's awesome thank you and because this is a red hat event i'd be remiss if i didn't mention the fact that enterprise-grade container platforms could be very useful in that regard and how do you feel platforms like OpenShift and others can contribute effectively to this kind of environment well i think that OpenShift is one of the most mature solutions to enable a cloud native environment and we have very high expectations around it to provide all the flexibility necessary for 5G and other future network environments also OpenShift provides very good DevOps tools with smart life cycle management of containers through Kubernetes orchestration which give us the advantage to accelerate the development around the new trends for instance like network slicing and enabling for instance solutions at the edge of the network so definitely OpenShift is very valuable for us for sure for sure today OpenShift allows us to take the containers and put them in the right place it allows us to manage them shoot them down if we see any problem we are building today microservices and moving workloads from different clouds no but i think that there is so much to be 
done that we have not harness the full potential of these type of environments so so container platforms provide an easy and repeatable environment for that thank you for your partnership and your leadership in this area with us i mean it's been really exciting and worth it both of you and America Mobile on this initiative and with our partners and uh wow i'm just very grateful and just want to thank you both for your time today and i think we'll leave it at that but again thank you very much thank you very much thank you very much hi folks um and thank you Ro for joining us today and there's i just wanted to put it in the chat um but if you're interested in joining in this conversation around um around the neuro system and and other whatever the the name that you guys come up with for your community i know you're you're in flux right now i put uh lucky lucky for bill i put bill's email address in the chat um so you can ping him and he can add you to the mailing lists and everything else and get you into the next upcoming meetings and um there is one question in the chat and we're going to probably have to be pretty quick because we're running a little bit behind um and it volker is asking what are good first steps for other operators to get started with AI and eventually deploy AI based solution so um i think i'd like to try and answer that that would be great and um and then i will queue up the next talk i think the the the first steps uh are related to knowing what kind of operations would you like to automate right because first the first problem is that you need a better insight right so what information do you have it's observable to you is what is really actual visible to you so you can start getting better insights once you have a better visibility or or observability then it's going to be easier to start developing models and start making and taking actions on that no but i think the first step is to bring everything visible to observable and to have an holistic view from end to end per service not per platform it will be more like a end to end on on a service on a service that you will be i don't know if i explain myself correct i think i think you i think you covered it um i'm going to let you answer stay in the chat for a little while online and answer any other questions that come up and i'm going to now um queue up the next talk with paul mclough um not um david canter and peter matteson and diane fedema which is another panel um and another set of conversations and so um give me a second and i'll queue up that video and thank you for our room for um for making this happen and we will um we'll get this all working so hi peter hi david um gonna run the just one one thing i would like to invite everyone to to join us to the to this community and these things that we are building together so please feel free to contact us and we will be very happy to have to have you in this framework and this open initiative that we are building together sorry thank you no definitely what i what i will do is i will send out once you guys have a landing page and your name figured out i will definitely send a note out to all the participants um and let them know about it so um i'm really grateful this is this is the entanglement bit that i talked about earlier um there's everything is fluid and communities are like i like to think of them as jellyfish that all interconnect so um we are now um now really pleased to bring on um the ml commons team and i'm just going to queue up their talk here and 
let it rip we're pleased to be here today at the open ship commons gathering um and the topic today is data science and uh i'm diane fenema from red hat i work in the ai services team i'm here with david canter and peter mattson david is the excuse me the executive director of ml commons and peter is the president of ml commons general chair of ml perf and staff engineer at google so peter and david recently launched ml commons and we invited them to provide some background history on ml commons and ml perf and i want to say that red hat is really excited to be one of the founding members of ml so uh to get us started tell us a little bit about your backgrounds and some of the work that you do so um i'm peter mattson um i run ml metrics uh for google i'm interested in majoring all things about ml and uh before before that i i studied compilers at stanford i worked with a startup called stream processors and we did video for a while um lots lots of different opportunities to try and make complex things go fast as it turns out that uh used to be an eternal need so i'm excited to be trying to do that for for ml and also try and make it better as we we push forward things with ml perf and ml commons david uh yeah david canter uh and so pre ml commons uh i spent a lot of time in computer architecture i actually uh started a microprocessor company that was sort of doing a fusion of uh compilers and hardware design to exploit more single threaded performance and then after that i ended up consulting with a number of companies one of which was cerebra systems which is now uh like red hat a founding member of ml commons and that's sort of how i got involved in this and i actually have a little bit of uh background in benchmarking uh which kind of came in handy and is part of the reason why i got involved and it's just you know it's it's very exciting to be able to build this kind of an open community and we really do appreciate that the the role that red hat is playing so i don't know how many users are aware that ml commons originated ml perf what led you to start ml perf peter i and um uh what were its goals and how did it involve into ml commons sure so um about um about three years ago um we were looking around at uh ml um and in particular ml hardware in google and trying to understand um you know how fast were different options um and we decided that we really needed to have a um a good ml performance benchmark and there did not seem to be an industry standard solution for those um so we uh rounded up um set of usual suspects anyone we could we could find that we thought had done uh the wrong work um so folks like uh great dms from bydo who did deep bench stanford don bench folks uh matai zaharia and and peter valis um and um the found them folks uh from harvard um and uh got everybody in a room and uh uh put forth the the the challenge like should we should we try and come up with one benchmark for kudis to make sure training for and everyone thought that was a great idea so we came up with a set of rules um brought in a bunch more folks from industry um broad players um like nvidia intel startups like uh supramorous uh which is like how uh david got sucked in um and the benchmark uh really took off um we had our our first set of rules out in the middle of uh 2018 um and then uh results by the end of that year um we've had some around since then 2019 was a big year of growth we got into inference um we got into uh hpc 2020 we continued to expand and we also um started emerald comments sort 
of the um the the driving function behind that was we were looking around for a home for emerald wanted to put in a non-profit organ but we wanted something that was engineering focused and ml for the open engineering and ml and we couldn't find that particular combination we could find large organizations with uh like linux so we were very um focused on open engineering in general we could find some that were were focused on ml in particular um like uh nurse but they were more event oriented and so we decided to start one um you know we wanted an organization that that was their their reason for being was to try and come along and make ml better and we we put uh ml perf into ml comments ml perf is still very much going strong and growing but it it now has uh oh we also looked at the field of ml and we feel like it's a it's a very young industry right it really has a tremendous amount of needs to mature as a field it needs it needs uh you know the same things that drove sort of the industrial revolution great ways of measuring things you need good raw materials aid in the case of ml and and it needs good ways of making things standard ways of making things you know a shift from doing things in your basement to uh you know assembly line production at high quality and we wanted to see whether we could form an organization that would uh answer that call and try and provide those things and really move the field forward yeah that's great yeah so so that's sort of the you know the driving motivation and i think we kind of ended up with three key pillars that we like to talk about you know and that would be the the benchmarks and metrics which you know we've talked about ml perf as well as uh building large open data sets which we think are another key ingredient towards really democratizing technology right and you know the same way that open source really has enabled and fundamentally transformed like the art of software whether software as an art or or as an engineering it's just you know utterly unrecognizable compared to 30 years ago and uh you know sort of the the analogy is that that data is sort of that same raw ingredient that you need to to start building up machine learning and uh you know the the more large and open data sets we have the more folks are able to extend ml capabilities and use them in products and extend those benefits to the whole world right um and and the third pillar is uh best practices and i like to think of this as removing friction right and and uh or or perhaps you know the transition from sewing your own clothes to you know having a uh an abstracted assembly line where there's a real flow and uh you know today with ml there's a lot of things whether it's model portability or just you know even deploying a model is tremendously high friction but if we want ml to become pervasive we need to drive those sources of friction down so that you know maybe in the future doing things with ml is almost as easy as you know grabbing a library off of github and then you know looking at the comments and maybe asking some questions i'll stack overflow about willing it together like that's a future we would love to go towards and we are very fortunate that uh you know when we went out and started talking about this vision you know it really resonated with a lot of company you know redhead is a founder we've got about 39 uh companies that are founders and a total of over 60 members so some of those are individuals like myself or or academics associated with universities and so 
we've really built this just tremendously vibrant community to focus on advancing innovation in machine learning and kind of extending those benefits through all of society and it's you know very much organized in in the principles of open source right we're very open we like to move fast and iterate okay great so um are most of those members then hardware companies can you just like give me a little bit of a breakdown there sure yeah so we absolutely have a lot of hardware companies uh you know peter named named a few like uh intel and nvidia as well as you know startups like sentient and so forth but we have a number of cloud services companies and software companies as well we really see this is a big tent where there's a lot of folks who can play uh you know to name an example of you know sort of a more purist software company in some sense vmware is involved there are a number of ml software companies and then a lot of uh uh cloud providers who provide computing services in in one fashion or or another um as well as you know sort of very ml focused uh uh companies there's a couple startups that focus on replicating replicating experiments and things like that that are engaged so it's a really lovely uh and diverse community and also across all geos um this is both a blessing and a curse um for those of you in distributed organizations you know the challenge of finding a time that works for folks in asia folks in europe and folks in america which is there is no such time but you know it's great to have such a diversity of participation can you give me some examples of projects that are going on in ml commons absolutely um so i'll probably start off with uh you know one or two uh so the ml perf benchmarks are pretty well known but one of the things that we are doing is trying to sort of grow the footprint and move into you know some some new areas that that need attention in terms of ml we started as as peter mentioned with training uh i got involved and helped to lead doing inference benchmarks and then one of the things that we branched off to do was to start focusing on mobile phones uh and ml in that context and then there's some efforts that we have around you know sort of the internet of things and tiny devices that's one way that we've been expanding uh with different projects in the metric side and then one of the things that i think you know actually you know brought us together you and me most literally was ml cube which is one of our best practices right and that is a uh that is a set of conventions around containerization that help you sort of abstract the machine learning away from all the other pieces of the infrastructure and you know i like to talk about this in terms of both portability and reproducibility and one of the examples i give of how this can help is when i think about a game changing innovation like bird it was first published as a paper by google and there's probably some code in tensorflow but if you wanted to wrangle that and try that with your customers you might spend a month or two doing it and you know the vision is that maybe one day we can get that down to a day or so or less or maybe even hours so that you know if you want to use an innovation at amazon or facebook and and try it out on premise or in a different cloud all together that becomes frictionless and i remember you know one of the first things that that brought you together with us was you were working with some of our benchmarks and trying to get them to work on on red hat and you know it 
was it was a bit of a struggle and so in some sense it was born out of that need and desire um and you know we also have some data set projects and i'll let peter talk about those as uh as david said there's three big pillars for us which are benchmarks best practices and data set i think in many ways uh data sets are the new code um they are uh the way you express what you want your machine learning uh product to do the models are in some sense a lossy compiler for them and one of the key kinds of data sets that really drives innovation in the field is public data sets you think about what image net has done for the field right that that cost something on the order of three hundred thousand dollars to build and uh arguably it's created modern machine learning we can't build performance benchmarks without good data sets um you can't do good academic research on anything without a good data set and a lot of the data sets we have now that are really best for their task and they were kind of created haphazardly um you know an academic group needed something specific they created the data set and then moved on and there's there's a data set out there usually you know a very modest size compared to what's actually industry um often under restrictive uh licensing terms um and it's it's not growing and evolving with the field and so what we would really like to do with ML Commons is create a a center of excellence for public data sets a group of people who are really excited about making sure they're a good public data sets out there that are growing and evolving to suit the needs of the field but actual data sets for instance we just announced the people's speech the largest publicly or soon to be the largest publicly available speech data set by order of magnitude that includes a diverse range of languages i think it's over 60 languages um more diverse range of speakers than what's available now we really want to push that forward because uh you know that makes uh speech detected text technology accessible globally if we can get this right um we're also looking at uh potentially data sets for recommendation systems which are incredibly important industry um and potentially uh a framework for doing very privacy protecting um medical uh data sets for accuracy validation people looking to say will this model really work in clinical practice we've got a wide range of projects we're looking into all around this sort of central theme of make good public hey well that is great so if someone in the audience right now is really interested in getting involved and you know in one of these areas that you've discussed you know i'm just wondering where do you need contributors right now and and how could they go about getting on board and helping out yeah so uh first of all you know like like most open source communities you know we we really uh love folks who show up and in fact you know i uh just to give you an example of that uh i originally showed up through a meeting at i think the stanford faculty club one of our early ones that was posted through a call on the comp.arc usenak uh right and and i showed up and you know eventually i did so much good work that i got punished and they made me executive director right take that we are an extremely uh open organization um so if you go to our website to ml commons.org there's a page about getting involved it lists out all of our working groups uh you know i we talked about like three or four projects but there's there's over 10 different working groups you 
know everything from focusing on low power embedded benchmarks to logging to algorithms and so each of those working groups we have chairs diane you are you know one of the chairs for ml cube and so if you go to the page on ml cube you'll get to see uh you know a bit about diane and what what the project focuses on so you can look through those and uh you know we are uh uh opened individual members and and many of our projects are open source in nature so you know you can stop by github sign the cla and uh you know if you see some bugs we always love those getting fixed and uh you know i i think again like a lot of open source communities it's something that uh you know you get as much as you get right it's it's the potluck model and and so i think there are a number of folks who have kind of wandered in randomly and found that it is uh you know uh really fits their interests uh some of the folks on the data set side are just phenomenally passionate about speech and this is just you know a really wonderful thing that just aligns with what they want to do so we'd love to see more folks getting engaged but both from industry and academia we have uh quite a few faculty already involved and and we'd like more we'd really like to maintain that balance and and just a a community that's that's really open and and wants to push innovation though before okay fantastic so and then if you want to get like the links and things go to ml commons.org is that right yep okay yeah i think it's a great group of people very friendly group so glad i joined it and uh so thank you so much for being here today and talking to thanks for having us it's been great yeah thanks for the invitation and and also you know thanks for all of your uh contributions to the community as well you know it's it's uh it's been uh uh a great partnership yeah been a lot of fun thank you i'm really sad i i saw your message i'm just i'm really sad that you're leaving us soon oops there we go we have a couple of glitches in the matrix but um welcome everybody thank you very much for for being here today and this is again another one of those entanglement moments where we're just seeing a lot of cross community collaboration so there are a couple of questions one quickie was how do i download the people's speech data set if i want to try it out um so the people's speech data set is currently in review um we don't want to put out a data set until we um uh thoroughly square the content and make sure that it really does make a difference in terms of uh of quality um so right now um some of the ml comments members have it and they're they're reviewing it but we will be uh making a presumably slashy uh public announcement we have it a bit all right awesome good so if you want to really ask us uh join as a member and we'll uh or that's always the great enticement so um one other question that's coming is what are some exciting new applications of ai and ml that you see when working with researchers to consider new models to include in ml perf you know diane that's one of your things um one of you can take that on i think you're muted diane let me just see if i can unmute you i think peter would be great for answering this because he's a community yeah so diane is making it easier for us to take on new models um which i think is that a fantastic contribution that's the set of whole push at ml or ml q site um the interesting new models that we're really pushing forward with are things around uh automotive and medical and uh trying to push uh 
sort of cutting edge uh nlp for it i think those those those three are pretty exciting for us right now dip it under a theme i want to ask that um no i mean i think that's a uh a pretty good characterization i think you know some of the other things that that are interesting to explore our uh you know good good benchmarks for reinforcement learning uh but then i think one of the other things that there's sort of an art to is you know you have to pick benchmarks that are forward-looking but not too out there and so i i think one of the things that it's incumbent on us to do is sort of be scanning and almost be the advanced path finding for some parts of the industry right not not everyone really moves at the same speed you know google has google brain which is fantastic you know a huge research organization but i i think that part of the goal of ml commons is to be in the vanguard of doing things like this you know the people's speech data set for example when that comes out uh publicly may unlock a lot of new applications in areas that that we weren't thinking about the obvious one is speech to text but there's many other things that can be done all right then now um i know we're fluid and we're always behind that stuff and so if other people ask questions ask them in the chat using the q and a window is there any last minute things diane peter or david that you'd like to add in or people too that you might have missed in the answers i think just to re-emphasize that we're a rapidly growing community um with i think a really mission um and uh we'd love to have more members who are excited about uh datasets benchmarks or sort of best practices that's the title together ml commons.org we'd love to have you all right well thank you very much for taking the time today and joining us i'm going to queue up the next talk on open data hub with audio res resdick and we'll just talk a little bit about the uh this reference architecture and this joining together of all the pieces and parts into one um wonderful offering um at at open shift and at red hat so we're really thrilled about it and we're glad that you all could join and you'll listen into this and then um slowly we'll get back on track for the time so here we go thanks guys yeah thanks yeah thank you so hello everyone my name is audio resdick i'm part of the red hat open shift data sciences team as a data scientist i've had the pleasure or maybe the torment of delivering ai ml models to production so in a world of jupiter notebooks terminal servers get lab runners s2i containers and open shift you don't know how glad i am to have discovered open data hub in this presentation i'll give you a brief kind of background on how open data hub started what open data hub is and hopefully i'll be able to have enough time to conclude my presentation with a quick demo it's not going to be a live one it'll be with slides but the demo will show you how to go ahead and deliver on ml model which dwells kind of into the world of fraud detection history on how open data have started it started internally within red hat as a platform on open shift for data scientists to basically go ahead and store their data and run their data analysis workloads hence kind of the phrase data hub and fairly early on it was realized that data scientists and data engineers requirements for tools and really anything to do with ai ml components were pretty different from dev ops requirements so data scientists i can test to this as a data scientist are mostly ui driven we really 
avoid using terminal commands and we expect the tools any of the tools that we use to include our favorite ai ml libraries that we're accustomed to using now collaboration and sharing is also a very important requirement for our workflows to successfully be able to be delivered to production so the main points of kind of sharing machine learning workflows done in notebooks and moving a model to production and managing the mode while in production monitoring it making sure that your predictions are accurate watching for any data drafts resource usage dpu memory and whatnot those are all very important to us and these are things that were combined together as multiple tools and components to kind of obtain an end to end ai ml platform hence we have this open data hub being not a single application but really a platform with multiple tools running on open shift okay so open data hub is really how red hat does artificial intelligence and machine learning internally on open shift and we've learned quite a lot from running machine learning workflows on open shift we faced kind of still face a lot of challenges and issues that we try kind of resolve and provide solutions in the open data hub issues and challenges there maybe three or four uh first of all is the people in the ai ml projects there's always a team of data scientists data engineer devops product owners business developers that need to collaborate and work together secondly there's sharing and collaborating around the ai ml development is difficult sometimes most of the time it can be manual and really can be error prone thirdly another important challenge are just the computer resources themselves ai ml workloads are compute heavy and cpu memory storage are not unlimited resources I think we all know that um and definitely they're not unlimited resources in any uh development or production environments that we're working with and fourthly which is the final challenge and one that is very critical is delivering to production and the production development lifecycle sometimes that's not as easy as it sounds so today open data hub internally runs uh ai ml work workloads uh such as application logs so in our uh internal open data hub clusters we run anomaly detection on multiple red hat application logs we have cluster metrics we gather and analyze the cluster metrics or sorry the cluster logs from open shift clusters and we have an ai ops team dedicated to finding or predicting any issues that may occur there and finally we have customer support data so on our customer service side we store and analyze uh any of the so uh reports customer feedback and many other different types of customer data so we've kind of gone through the history let's go and take a look at really what is specifically open data hub so open data hub uh first and foremost is an open source project driven by an open source community it's a collection of tools and components that make up the end to end ai ml platform specifically on open shift the ai ml workflow starts with prepping and basically transferring the data into a data lake or storage and making it accessible accessible for data scientists when we look at what the data scientists do um we're really looking at the next phase which is model development and what we're doing is we're looking at the data analysis of our data picking certain features uh going ahead and creating a model going ahead and then training and then doing some model um validation the very last phase goes into the dev ops realm the dev ops 
realm and that's really moving and serving the model into production this phase is not kind of a static one stop uh model serving delivery phase but it's a constant optimization phase so the cycle of monitoring optimizing and serving is a constant cycle that happens really for the lifetime of your model and again at the end of the day it's that collaboration between your data engineers your data scientists your dev ops and any of your business developers that you have so next what I wanted to basically show is show you is just a diagram and show you where you can actually find open data hub so first and foremost open data hub is an operator that's installed from the open shift operator hub so you see that I have an open shift screen here and we can go ahead and then choose the open data hub and when you look at the open data hub you're able to see that there are various tools that you might be able to use so if you're a data scientist you'd be very interested in using jupiter hub maybe for some of the business analysts you might be interested in using grafana to take a look at some of your results from the model that you've deployed now open data hub integrates open source projects into as I mentioned an intimate and AI ML platform on open shift so we go ahead and we take all these different open source projects such as kubeflow and we adapt them to run on open shift and we package them basically within an operator and then we go ahead and offer it on operator hub so of course kubeflow is pretty big and the and the central component in open data hub and we add other components and you can see them on the screen there we add things such as grafana spark prometheus jupiter hub kafka etc so this slide here really shows all the different tools and components that are provided by the open data hub platform and it just addresses basically a specific functionality in the end-to-end AI ML workflow and again this is like very similar to the slide that we saw just two slides ago where first of all we focus on data analysis we have storage integration which could be our self-storage working with postgres sequel or my sequel we have to have some way of doing data exploration so we might use supersetter hub if we're interested in our metadata we might have something as hive meta store then for big data processing we may use something as spark those are things that the data engineer and the business analyst are very interested in then we move on to the artificial intelligence and machine learning so the data scientist domain but a data scientist they may jump into an interactive notebook such as jupiter go and go and do some of their work in there if they want to go ahead and train fine tune their model or work with a distributed model they may use something as pytorch they may use something like spark for machine learning applications themselves there might be various libraries that they're interested in in that case they can use the open data hub AI library and then finally they're going to go ahead and look at how they can deliver some of their their services for their model or deliver their model through kubful pipelines or maybe airflow that brings us to the production side where we're going to go ahead and deliver what we've created to the dev ops engineer so again when they're looking at the model serving they may use something as seldom way to deliver some of the services again might be using something pipelines such as kubful pipelines or maybe argo and then finally if we want to actually take 
a look at what's going on with our model we'll use some sort of monitoring tools such as grafana or prometheus so the open data hub comes with an ecosystem and again this is provided by red hat and certified partners and basically to help enable our customers we built this ecosystem around this open data hub and we feel that it provides our customers with a faster go-to-market strategy so if we take a look at the product integration this ecosystem provides tools for tighter integration with red hat products such as red hat open shift um self-storage open shift service mask mesh we can go all the way to red hat three scale api management to actually get help with some of those items we do have red hat consulting engagements so as part of the ecosystem we have that dedicated ai ml consulting services team to help our customers succeed in their digital transformation efforts or plans and really accelerate their their time to market with what they're trying to do very important part of this is our red hat certified partners we work with their party vendors to get them certified to use ubi images and certified operators then these partners become certified partners that will provide support for their tools integrated with open data hub and we could look at some things such as cell done or anaconda anything that we might use for for model serving etc and finally we have industry use cases so basically to go and showcase these integrations we've built multiple industry use cases showcasing how we're using open data hub integrated with the red hat products again such as um fraud detection with open data hub and the red hat decision manager so what i'd like to do is just give you kind of a slide demo to show how i would go ahead and do some fraud detection within a bank to give you an idea or flavor of how you can work with open shift and open data hub to actually deliver your solution so the first thing that we're going to do is just basically log into your open shift account from there we are going to go and proceed to the open data hub dashboard we're logged in as a developer and to do any of the navigation we would use the left panel navigation bar so right now we're looking at the topology so i would just proceed to the open hub dashboard by clicking on the odh dashboard operator and then click the open url button what'll happen is we'll be presented with some sort of open data hub screen and we'll have a large choice of options to choose from as i mentioned odh contains a number of tools that you can build and manage and deploy your models we're going to take on the role of a data scientist and work on a fraud detection model so what we're going to do is click on the jupiter hub card to open jupiter hub and go ahead and begin programming so when we open jupiter hub we're first going to have an option to determine the type of notebook that we're going to use we're just going to use a basic machine learning workflow notebook that we can use to deploy a fraud detection model and again just a reminder we're looking at legitimate and fraud fraudulent transactions that are in a bank so we would go ahead and just accept the other defaults and choose spawn to continue i've actually gone ahead and pulled in the notebooks through a get repository so in this case when you go into your jupiter notebook and you pull in your notebooks you'd be able to see them and in this case we have some of our feature engineering and model or logistic regression and services notebooks that we use to deploy our fraud 
detection model when we put the model into production we actually go back to the openshift side and we use pipelines so we're deploying the machine learning pipelines into production with openshift pipelines and we'll see how we can use the services to make predictions when we go back to the main openshift console and select pipelines you'll see in this case there's a pipeline that we've already created so what we do is we could click on the pipeline and see the pipeline details and remember this pipeline is going to help deliver our models or our model so once the pipeline is finished we have a model or a rest service that's built with source to image or s2i and at this point what we'll want to do is take that pipeline service more specifically the url because we're going to be using that url and you'll see at the bottom i have a service url such as pipeline operator data hub user one etc etc and we'll be using a request library in python to interact with our rest service that we've just managed to deploy so if we go jump into the jupiter notebook to interact with our model services we'll go ahead and just replace our default host with that generated url from those pipeline services that we have running then if we go ahead and run our services make that request and then run our model we'll have the model making its predictions in this case we have a lot of legitimate predictions on the right hand side of the screen you can see under predictions that could mean that we're very good uh if we as we'd run this model a little bit longer we'd probably see some fraudulent predictions coming up um all in all that looks very good so then what we want to do is we want to actually go and take a look at graphically uh what our legitimate and fraudulent detection transactions look like over time so we can go back to odh and we would launch grafana we would log into grafana and then we would get in touch with the pipeline service that we had running and then we'd be able to visually monitor our service for fraudulent and legitimate transactions i apologize for the screen being sort of or the screen capture being sort of fuzzy but what that's doing is um what it's showing you is over the course of a day the number of legitimate transactions which can um should be a lot larger than the fraudulent detections which we are detecting so through that very very short slide demo you have the ability to visually what about that now we have the ability to visually monitor our service for fraudulent and legitimate transactions and that's all going through using the open data hub services where we were able to deploy a jupiter notebook and go ahead within that jupiter notebook and at the end of the day get our model running and then go back into open data hub for another tool that will allow us to actually see some of the services that we have running from our model in the back end and that concludes my demo and my kind of recap on open data hub i hope that you found this useful and i look forward to answering any questions that you may have well um thank you but adi's not going to be answering any questions that you may have as there's been a glitch in the matrix and though she was wonderful in giving that presentation she's not available to answer your questions so instead i have charade griffin with me um who is the head of ai services at red hat and um if you have questions if there's one i can see in the q and a here now i'm just going to go down and check it out um that just came in and um i think it's more 
of a technical question there while he's so i'm going to hang that one up for a minute um and maybe ask a bit of a softball question um to charade and while i look for an answer to that one um could you tell us charade a little bit about why we created um open data hub at red hat um and you know what did we see the customers you know what did the customers what were the customers asking us that made us see that there was a need for something like odh ups and there we have another glitch in the matrix we have now that's um my sorry i have too many mutes have my headset and my laptop muted uh that's actually a very interesting question diane and um when we look at when we first started engaging with customers and talking to them about their about ai it was very it was it was a very clear message that they wanted to continue using their investment in open source and red hat technology uh you know there's a big movement behind open source open source is the driver for a lot of the ai innovation and instead of customers having to have specialized hardware or specialized infrastructure to do their ai they wanted to see ways in which they could leverage what they've already invested in um but in order to do that red hat is very much a plat it's a it's an infrastructure company first and foremost uh we knew we couldn't just start to get into the ai space with our own vision on and and not just not look at what's going on with the broader ecosystem uh it's very important with red hat that we work with our partners that we work with other open source communities to be a part of something bigger than just a single technology and so open data hub allowed us to do exactly that uh we knew it it had to be what's the bleeding edge technology in ai right now how do we bring that to open shift into kubernetes to allow for more scalable workloads for our customers and then do it in a way that didn't isolate our partners that helped us get to where red hat is today you know how do we continue to work with uh partners whether that's uh on the infrastructure side you know we work with some partners that are in the ai space like a profit store for being able to optimize uh how your infrastructure is run all the way through new partners that we've identified that help us tell better stories about managing your data or even just the ai ml in general you know partners like claudera and sass and h2o and anaconda so taking that same mindset of how red hat has operated its business for the past you know 15 20 years and applying that to ai was really how we ended up resolving on open open data hub and and why we felt the need to be able to highlight that you can continue to use open source not only open source applications but also open source infrastructure to help solve data science all right well thank you for that and thank you for stepping up to the play here if you have other questions while lead i will get back to you on those two issues that you posted in the chapter and actually respond to that because yeah so while we can certainly follow up offline as well we are really working hard on the disconnected install this is we identified it is two different things and for those not familiar when we when we talk about looking at technologies like the edge or even for some some industries like financial services where it's really critical to be able to wall off your your infrastructure or certain parts of your infrastructure so to decrease the vulnerabilities you know we're looking at ways in which we can install 
So right now we've been meeting over the past couple of weeks to come up with a stronger story of how we're going to answer that from the engineering side. We already have someone fully dedicated for the next few weeks specifically to help bring disconnected installs to Open Data Hub. But it's also a broader topic, because we're starting to look at ways in which the platform and the infrastructure need to enable that out of the box, without special work at the application layer; we don't want all of our ISVs and partners to have to do something special for disconnected installs. So there are multiple things going on there. A roundabout way to say it: for the short term, our engineering efforts have refocused on solving that, and we've put a full-time person on it, but we're also having deeper conversations at the platform level to support the use case in a broader sense.

All right, well, thank you very much for stepping up and for having an answer at hand on the disconnected install question; that was great. So I'm going to queue up our next talk, which is from Verizon Media. Ganesh is here with us today, and we've been really pleased with the collaborations that have been going on with Verizon Media. I'm looking forward to this; it's a nice way to segue into using all the pieces and parts of the different AI and data science initiatives. So I'm going to start that one up and let it rip.

Fantastic to be here at the OpenShift Commons Gathering on Data Science. It's a very, very interesting era, where we are starting to take a closer look at how data and AI are going to transform a lot of our experiences. I'm Ganesh Harinath, and I'm with Verizon Media. I've been doing data and AI for a very long time, over a decade, and an interesting paradigm shift I've started to see is this: we were building platforms that were very heavily AI-driven on the cloud, and now we're seeing application demand where we have to move these capabilities onto the edge. Throughout the presentation I'll be citing our experiences in terms of how we look at these applications and how we solve them using frameworks, platforms, and so on.

Most importantly, I feel very blessed to be part of this ecosystem, where I am experiencing how the world will be transformed through AI for better experiences, performance, efficiencies, healthcare, and so on. When you take a closer look, moving forward five or ten years, robotic-arm surgery is going to be very normal, and what that means is that a doctor in New York can perform surgery on a patient in Los Angeles. To me this is fascinating. When you look closely at what's required for all of this to happen, robotics is important, virtual reality is very important, and artificial intelligence is the foundation of the capability. And most importantly, we, being part of a telco, can use 5G to converge these technologies and make this capability a reality in the years to come.

But when we ground ourselves and take a closer look at where we are today with ML and AI, a lot of applications really require massive data on the cloud. Applying AI to understand various aspects of the network was one of the areas I was very focused on.
But looking forward, industrial automation is a space where we are starting to see big capabilities and solutions. And on the autonomous-car side: I'm fascinated. There's a long way to go; the autonomous car today can look at the car in front of it, but what needs to happen is for it to connect to 5G capabilities and apply AI to plan the entire route, and that's in play as well. These are the fascinating changes we are all living through, and interestingly, the shift has been accelerated. The way I summarize my experience: any application that we touch, feel, or see will be powered by AI. But it's equally important that aspects like AI bias be taken into account when designing these applications.

Now, to summarize how the application shift has happened. When you take a closer look at any machine learning application, I'm sure we all know there is an aspect of model training, which is very compute-intensive, and an aspect of inferencing. In today's world we very easily deploy both training and inferencing on the cloud and have this ML/AI experience directly from the cloud. But if there is one shift we are actually starting to see, it is the demand for near-real-time inferencing. We are now talking about inferencing in milliseconds, at massive scale: hundreds and thousands of inferences that need to happen within a very short duration. To accommodate this, we are starting to see a paradigm shift, and that is moving the inference capability, intelligently and seamlessly, from the cloud to the location closest to the need.

For some applications, if the required inference latency is on the order of 10 to 25 milliseconds (that's just an estimate), then ideally you deploy that inferencing onto the CDN edge. We at VMG have CDN edges in 160 locations, and we are already in the process of enabling those CDN edges with intelligence through a platform called Leo, which I'll describe in a few minutes. And there are a lot of applications that need near-real-time inferencing at massive scale and, most importantly, need it to be highly reliable. To accommodate that factor of high reliability along with millisecond inferencing, we have to start moving inferencing into what I call a 2U box.
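As a toy illustration of that placement rule, the sketch below maps an application's latency budget to a deployment tier. The 10-25 ms band is the rough estimate from the talk; the tier names and the sub-10 ms cutoff are my own assumptions, not anything Leo prescribes.

```python
def placement_tier(latency_budget_ms: float) -> str:
    """Map an inference latency budget to a deployment tier.

    The 10-25 ms band follows the rough estimate in the talk (CDN edge);
    anything tighter is assumed to need an on-premises "2U box", anything
    looser can stay on the central cloud. Purely illustrative.
    """
    if latency_budget_ms < 10:
        return "on-prem 2U box"   # factory floor, highest reliability
    if latency_budget_ms <= 25:
        return "CDN edge"         # the ~160 metro locations in the talk
    return "central cloud"        # batch or latency-tolerant inference

for budget in (5, 15, 100):
    print(budget, "ms ->", placement_tier(budget))
```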
Now, an important paradigm shift. When we go back and look at the evolution of the internet: in the very beginning, it used to take fairly long for pages to download when we accessed yahoo.com from Sydney. Then, almost magically, capabilities like the CDN arrived to cache content geographically in different locations, and this technology worked behind the scenes to produce a sudden change in the human experience of the internet. Everybody started to have a consistent experience; the CDN is magic. Today, when we take a closer look at how we want to deploy applications, enabling the CDN edge to deploy ML applications is very critical, and a transformation is happening in this area as well.

Now, what are the applications really being discussed right now, why do we need inferencing to happen so near real time, and what exactly is the big problem? There is another very important paradigm shift that I'm sure we've all started to notice. Until now, a lot of ML applications were driven primarily by signals from sensors. Those are very two-dimensional; they're records, and there are billions of them. In fact, on the platforms that our team builds and operates, we ingest 100 billion records every day (on the order of a million records per second). But it's comparatively easy to operationalize platforms that can ingest and process 100 billion records, because you have the luxury of being deployed on the cloud, and the inferencing is on two-dimensional records. The shift is to video content, from which we have to pick up intelligence, apply machine learning, surface insights, and solve the problem. That's another huge paradigm shift, and it's no exaggeration: when I look at the applications coming our way, the majority are camera-driven, in the space of factory automation.

What we are seeing right now is an example of factory automation, where video cameras observe the assembly line, and these feeds are fed to a platform like Leo, where applications can understand the video signals, run inference, and alert if there are issues, alongside other sensory signals like temperature and current. So factory automation is an area where we are continuing to invest a lot in building applications. We have to deploy a 2U box, we need a platform like Leo, and we need the applications staying close to the edge, so that we have reliability both in terms of high-volume inferencing and in ensuring it works seamlessly in a factory environment. Private 5G is definitely going to play a big role in connecting all these different sensors and cameras, and in routing signals and video streams to a centralized platform that can ingest them, apply artificial intelligence, and surface insights to improve efficiency and avoid errors in near real time, without any material loss. This is an area we at Verizon are starting to invest in heavily.

I'm sure many of you know Verizon has a company called Skyward, acquired a few years ago; they help fly drones. Knowing that Verizon has tens of thousands of cell towers, and that we have technologies like drones and computer vision, it's very timely that we build applications that fly drones to find issues around those cell towers, instead of people climbing the towers to inspect them and their connections. One, it addresses a lot of safety issues; two, there are a lot of cost efficiencies attributed as well; and most importantly, with computer vision you surface a lot of insights on which you can take corrective action in near real time. We're continuing to invest here, and while this is a very vertical application today, solved for cell towers, you can retrain it to monitor oil pipelines, buildings, bridges, and so on. I personally am very fascinated by the mission we've embarked on. We are very early, and there's a lot of learning here, but I'm sure in the months to come we'll be able to operationalize products like the ones we're discussing right now. It really requires edge capability: the video streams coming in near real time, inferencing on the edge, and then surfacing insights to the person who is actually conducting the survey of the cell tower or antenna.
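Here is a rough sketch of the edge video-inference loop just described, whether the source is a factory camera feed or drone footage. The model object, its predict method, the alerting hook, and the threshold are all hypothetical stand-ins for whatever a platform like Leo would actually deploy; only the OpenCV capture calls are real API.

```python
import cv2  # OpenCV, for reading camera/RTSP streams


def alert(source: str, score: float) -> None:
    # Stand-in for the platform's alerting path (message bus, dashboard, ...)
    print(f"anomaly on {source}: score={score:.2f}")


def run_edge_inference(stream_url: str, model, threshold: float = 0.9) -> None:
    """Read frames from a video source, run inference locally on the edge,
    and alert when the anomaly score crosses a threshold. `model` is a
    placeholder for a deployed vision model with a predict(frame) method."""
    capture = cv2.VideoCapture(stream_url)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # stream ended or dropped
            score = model.predict(frame)  # hypothetical model API
            if score >= threshold:
                alert(stream_url, score)
    finally:
        capture.release()
```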
Now, how can we solve all these things? Efficiently is the term I would like to use. When we take a closer look at the next generation of applications, pretty much every application will have an aspect of machine learning attached to it. But the very interesting difference between applications powered by machine learning and traditional applications is that machine learning applications are not static. I can't say the release is complete, this is an awesome application, you all go ahead and use it. We really have to monitor the model and have a process in place to retrain it, to keep it meaningful, relevant, and accurate on the ground. That's a non-trivial problem, and it's why we need an ecosystem that supports building and deploying this next generation of applications. ML-based applications can't be purely transactional; I can't deploy the application and walk away. I need to provide tools and capabilities that ensure these applications stay meaningful over a period of time. That's very important; it's one side of the problem.
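As a minimal illustration of that monitoring loop, the sketch below watches the live fraud rate over a sliding window and flags when it drifts away from the rate seen at training time. Real drift detection uses richer statistics; the class, window size, and tolerance here are my own toy choices, not anything from the talk.

```python
from collections import deque


class DriftMonitor:
    """Toy illustration of the 'models are not static' point: compare the
    live positive-prediction rate over a sliding window against the rate
    observed at training time, and flag when they diverge."""

    def __init__(self, baseline_rate: float, window: int = 10_000,
                 tolerance: float = 0.05):
        self.baseline = baseline_rate
        self.recent = deque(maxlen=window)
        self.tolerance = tolerance

    def observe(self, is_positive: bool) -> bool:
        """Record one prediction; return True when retraining looks needed."""
        self.recent.append(is_positive)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough live data yet
        live_rate = sum(self.recent) / len(self.recent)
        return abs(live_rate - self.baseline) > self.tolerance


monitor = DriftMonitor(baseline_rate=0.02)
# In serving code: if monitor.observe(prediction == "fraud"), kick off
# a retraining pipeline rather than letting the model go stale.
```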
The other side is being able to distribute the workloads: the training workloads on the cloud and the inferencing workloads on the edge. In simple terms, the pink boxes and the blue boxes used to be deployed together on the cloud; now we have to elegantly separate the pink boxes out to the closest edge, which could be a CDN edge or a 2U box, and that empowers you to build applications like drone-based tower inspection, factory automation, and so on. So we are very heavily invested in operationalizing a platform that empowers us to build edge applications seamlessly.

What you're seeing is a very high-level blueprint of the platform Leo, where the pink boxes are taken care of as part of model inferencing and application deployment. And this application deployment has to be end to end: we should be able to run a UI, and it has to be secured. To me this is a paradigm shift. We all talk about distributed infrastructure; now we're talking about distributed applications, where the same drone inspection or factory automation application has to be deployed in multiple locations, and in many cases integrated with the cloud to make it work very seamlessly. It's a fascinating time: the demand on infrastructure is changing, and the security posture is changing. We can't just say we have an awesome cloud infrastructure in multiple locations; these are micro clouds, and these micro clouds have to be connected to the parent cloud, because the application workloads are distributed across the edge and the cloud with a seamless interconnect. What you're seeing was a reflection of our view about a year and a half ago, and today it is real.

So Leo is the glue between various technologies, infrastructures, and platforms, and the integration between data, sensors, and so on, which enables and empowers us to build different applications, like drone inspection, factory automation, and digital twins, that have been operationalized for Verizon's own use within Verizon. I'm sure we all have our own strategies, but I'm very excited and encouraged to share the success we're starting to see in understanding the needs of an edge platform and ironing out the capabilities that are actually needed on the edge. In a nutshell, on Leo you can build an end-to-end application that can ingest data, apply inferencing at massive scale on the edge, and deploy any machine learning model. Most importantly, it's container-based, which translates to being deployable on any edge platform.

But as I was mentioning, it's very important to have a seamless interconnect to the cloud, because the edge is only a portion of your application: a lot of the training needs to happen on the cloud, and there can be compliance policies under which you have to persist data on the cloud, so data has to be shipped there for various reasons. Most importantly, there is a fascinating approach to building models, called distributed model training, where models trained at the edge are consolidated on the cloud, and that can be approached through platforms like Leo.
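The talk doesn't spell out the mechanism, but "distributed model training consolidated on the cloud" resembles federated averaging, so here is a minimal sketch of that idea under that assumption: each edge site trains locally, and the cloud merges the parameters weighted by each site's sample count. All names and shapes are illustrative.

```python
import numpy as np


def federated_average(site_weights, site_sample_counts):
    """Consolidate models trained at edge sites into one cloud model by
    weighted averaging of parameters (the FedAvg idea). Each entry in
    `site_weights` is a list of numpy arrays, one per model layer."""
    total = sum(site_sample_counts)
    merged = []
    for layer_idx in range(len(site_weights[0])):
        layer = sum(
            w[layer_idx] * (n / total)
            for w, n in zip(site_weights, site_sample_counts)
        )
        merged.append(layer)
    return merged


# Two hypothetical edge sites with a one-layer "model":
site_a = [np.array([1.0, 2.0])]
site_b = [np.array([3.0, 4.0])]
print(federated_average([site_a, site_b], [100, 300]))  # -> [array([2.5, 3.5])]
```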
Now, at a very high level, when you look at the capabilities we need on the edge, data management is super important: being able to ingest all forms and kinds of data, at high throughput, and so on. The platform should empower us to build end-to-end applications, with a UI, very securely. And most importantly, the security posture has changed: you have a 2U box sitting somewhere, so physical security becomes important and application security becomes important, and these things have to be factored in. Some of this is beyond Leo, but we need a strategy that addresses all aspects of security. Leo does address application security, and we would also depend on edge enablement capabilities like OpenShift to ensure it is seamless, that we can control and manage the containers on the edge, and that we have a very secure environment in which to deploy edge applications. Most importantly, we have a strategy in place with components that let us deploy models seamlessly, manage them, monitor them, and perform near-real-time analytics too. Everything I've described is part of Leo; it's operationalized, and we have been using it very successfully within Verizon. Interestingly, though it's very early, Leo has become the north-star edge architecture for Verizon Media Group as we speak.

Now, to conclude: we are starting to see a new influx of applications, which I call next-generation applications, and each one of them will be powered by AI. There's no doubt they're poised to enhance human experience, efficiency, health, and safety. But the paradigm shift from the infrastructure perspective is that we have to understand and identify the components that have to move closer and closer to the edge, whether that's a CDN edge or a 2U box. The way I'd summarize the stories and experiences I've shared: it's going to be very interesting as we move forward. As you take a closer look at building ML- and AI-based applications, it's complex, and we have to find ways to simplify it through a platform strategy. We need strategies and partnerships in place that give us control on the edge, and technologies like OpenShift will definitely put us in a very good position to have a controlled and manageable environment, taking into account that it's very distributed too. And most importantly: how are we going to build, test, and deploy while keeping the environment agile, so that it stays adaptive?

Taking all these things into account: we're very early on, we have our own experiences, and we're very happy to learn from your experiences too, connecting offline. I'm also starting to look to consortiums like the Enterprise Neurosystem; I'm really excited and happy to be part of it, and I feel very blessed to be part of an ecosystem like this. While we bring in what we know, primarily from the experience of solving problems on the edge and building ML and AI applications for Verizon, Verizon Media, and the other enterprise customers we're starting to work with, we're here to learn as part of the ecosystem and become more and more efficient as we continue to build our next-generation applications, which we envision will change the human experience and improve efficiencies, and, most importantly to me, improve the security posture and health and safety too. With that, I sincerely thank you all very much for this opportunity, and I look forward to syncing up with you offline as part of these consortiums; we can take it from there. Most importantly, stay safe. I'm sure you're all going to have a fantastic and terrific 2021. Thank you.