So we're going to have a different kind of talk today, a very data-driven talk. We'll introduce ourselves. So my name's Diane Mueller. For those of you who don't know me, I'm the director of community development for OpenShift, and I work in the cloud platform BU of Red Hat. And I've had the wonderful pleasure of working with Bitergia over the past three years; I think it was around the Paris OpenStack Summit that I met Daniel, from Bitergia. And Bitergia has been my tool of choice for doing data mining on the OpenShift community. Over the past couple of months... I don't know what's going on with our slides, but go ahead and throw them up there. No worries, it's the PDF. Okay, so go to the next slide if you don't mind. So we're gonna talk today about this theory I have, and how I've been trying to use the data to prove it. I've been playing in this data mine of GitHub contributors to the OpenShift project for the past three years. And for the past three months or so, Daniel and I have luckily been working together on a new data set, which includes Kubernetes, a number of the CNCF projects, and OpenStack, so my data set just expanded and my mind was blown. So I'm gonna talk a little bit today about why I'm looking at all these disparate data sets, and do a bit of a reality check on the OpenShift project itself. Then I'll talk about what I call dynamic community personas, that is, the different types of people in the communities we're working in. And then we're gonna talk about how we're changing the model for community management, or community development as I like to call it, and some of the tools we're using to do that. So first, let's talk about the projects; there are so many projects at Red Hat. How many of you work at Red Hat? Okay, so the Red Hatters know this one. Everybody at Red Hat has seen this diagram, and so has anyone who's ever gone to any of our conferences.
There are millions of projects on GitHub, and our little project, OpenShift Origin, sits somewhere in the middle here; it's been around for about five years. If you go to the next slide. We talk a lot at Red Hat about open source and open communities, and all day today I've been sitting in the back of the room. It's been a huge pleasure, because I don't usually get to sit in a conference room without having to give a talk or organize the conference. So it's been wonderful. We've been talking about open organizations, community models, and all kinds of collaboration, and those are the things, at Red Hat and in the other open source projects I'm involved in, that really drive innovation into a project. I'll be hitting on that again. But I wanted to step back a second and talk a little about OpenShift itself as an open source project, to set the stage for why this data is so interesting and for the sort of network analysis we got to do with this data set. About five years ago, when I joined Red Hat, OpenShift was OpenShift Origin. You may remember the panda was our logo back then, and it was a Ruby on Rails project built on MongoDB. It was a PaaS, pronounced "pass" or "pahz" depending on where you're from: a platform as a service, and a standalone, very independent project. Almost three years ago now, we pivoted the whole project to be, basically, a Kubernetes distribution. And last August we finally renamed the project from Origin to OKD, which I jokingly call "OK Diane" because we couldn't come up with any other name. For legal reasons we had to change the name, and we couldn't use Kubernetes in the name or call it a distribution. But it really became a Kubernetes distribution with a whole lot of value-add on top of it, lots of upstream projects being merged into it, and lots of tangential projects that touch upon it as well.
And so we'll talk a little bit about that. To continue that thread, OKD is out there at okd.io, and you can still download it. If you hit the next one... I'm trying to race through this bit of background. We didn't change the name of the repo at all; it's still Origin, OpenShift Origin. Only the name we use when we refer to it changed: we now call it OKD, and we brought the panda back in a big way on the site. Go ahead. When we made that shift from a standalone platform-as-a-service project over to Kubernetes, what happened to us, or to me as a community manager, was that my world exploded again. What we really saw was that more and more of the engineers working directly on OpenShift were shifting their work into Kubernetes. This slide is probably a couple of months, or a couple of release cycles, out of date, but a lot of the Red Hat engineers ended up working in Kubernetes in the different SIGs, and I think there are even more SIGs now; there's no way you could fit them all on there. What really happened was a shift from trying to get people to contribute code to our project, OpenShift Origin, to collaborating with a much bigger user base. And that user base kept growing, because it wasn't only Kubernetes: it was all of the side projects and initiatives coming under the CNCF, plus a number of others, like the OCI work and the different things CoreOS brought to the table. So there were a lot more moving parts to community management and community development, and a lot more relationships that we had to start accounting for and tracking. So the reality check, and I just wanna keep this all above board, is that OpenShift as a project has always been a heavily Red Hat-driven project.
When I came on board about five years ago, it was very much Red Hat, and that circle isn't out of date: it covers the past five years of contributions to the project. Almost 98% of the code in OpenShift Origin has been written by a Red Hat engineer, someone paid by Red Hat at some point in their career. When I first came on board, there were really only five other companies contributing directly to OpenShift Origin. Over the past two or three years, we've seen the rise of OpenShift and the popularity of Kubernetes, and once we take the Red Hatters out of the count, we've also seen a huge rise in the number of contributors to the project. But it's still not great, so I'm not gonna say rah-rah, we're wonderful, we have huge contribution, because what we're trying to show you today is where the contributions are going and why we want them to go elsewhere: the network effect, and how all of these communities are starting to converge on each other. So I'm gonna pause there. The other thing that happened over the past two years is that we started a new community model called OpenShift Commons, which brings users, contributors, and upstream project leads into a peer-to-peer network. We'll talk a bit more about that, and about why it's helping us grow from five to 70. But first, a word from our toolmaker. A bit more background here. It happens that Diane has all of the knowledge about the community. I have only a small amount of knowledge about this community, or about any other community we analyze. So we have two things here: the skills and knowledge from the community, the domain context, let's say, and then the more technical skills for producing and analyzing the data. The first time we started to discuss this, Diane said, I want everything. And I said, well, let's start small, so we produce some value, right?
And then we asked, what do we have here? The first analysis we did was OpenShift and Kubernetes. The question there was mostly: we have OpenShift developers and we have Kubernetes developers; do we have people working in both projects? Yes, there are; you probably all know that. But then, how many of them are from Red Hat? How many are from outside Red Hat? What other companies are working this way, right? Then we can extend this analysis to other projects, so we added more CNCF-related projects, such as these three incubating ones. These projects were selected basically because those were... Those are the ones I had the most domain knowledge in. Exactly. So Jaeger and OpenTracing: I helped shepherd them through some of the work they did to get into the CNCF, so I knew all the players there, and that made cleaning up the data set easy. Yeah, and then it happened that we had already analyzed the whole OpenStack Foundation, so we said, let's put everything together and check what's going on, right? The analysis for today is focused only on git repositories. For OpenStack we have something like 1 million commits, 600,000 if we ignore the merges and so on. Kubernetes is the second biggest, then OpenShift, and then the other three in terms of the size of the activity. And I think the most important thing I would point out about the data set is that the biggest lie about machine learning and AI, in my book, is about data cleansing. Most of the work that went into this project was cleaning up the data set, and nobody ever talks about that when you're doing data analytics. We'll talk a little bit more about that. I had been playing in the OpenStack community since, I don't know, the early days in Boston. I knew OpenShift and I knew Jaeger, and I know a lot of the people in the Kubernetes world, containerd not so much, but it really helps if you have some domain knowledge.
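The first question, "do we have people working in both projects, and how many of them are from Red Hat?", comes down to a set intersection followed by a count per company. A minimal sketch, assuming you already have a cleaned-up set of authors per project and one affiliation per author; the names, companies, and the `cross_project_overlap` helper are hypothetical, not Bitergia's data or code.

```python
from collections import Counter

# Hypothetical sample data, not the talk's data set: authors per project
# (after identity cleanup) and one curated affiliation per author.
openshift_authors = {"alice", "bob", "carol", "dave"}
kubernetes_authors = {"alice", "carol", "erin", "dave"}
affiliation = {"alice": "Red Hat", "bob": "Red Hat", "carol": "ExampleCorp",
               "dave": "Unknown", "erin": "ExampleCloud"}


def cross_project_overlap(authors_a, authors_b, affiliation):
    """Return the authors active in both projects and a per-company count of them."""
    shared = sorted(authors_a & authors_b)
    # Authors missing from the affiliation table are counted as "Unknown".
    by_company = Counter(affiliation.get(author, "Unknown") for author in shared)
    return shared, by_company


shared, by_company = cross_project_overlap(openshift_authors, kubernetes_authors, affiliation)
print(shared)  # ['alice', 'carol', 'dave']
red_hat = by_company["Red Hat"]
print("Red Hat:", red_hat, "| others:", sum(by_company.values()) - red_hat)
```

The answer is only as good as the affiliation table, which is why the cleanup work dominates.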
Going in cold would be difficult, I would say. Yeah, and this is the tooling we are using. It's GrimoireLab, which is part of the CHAOSS community, a Linux Foundation project. It works like this: you have the data sources, then an extraction step, then some processing, and then you go to the browser, where the final dashboard is. The identity magic happens right here, and this is where most of the work is done in terms of affiliations. You have spent hours and hours on it. A lot of time. Yeah, we have spent hours and hours doing that, cleaning the data and so on. From a data perspective, I would say probably 90% of the work is cleaning the data, curating it, preparing it for visualization, and thinking about what you want to visualize, right? The other 10% is the results you put on the table and how you explain them, so you need a storyline, right? Just a bit of introduction to the charts we are gonna see in the presentation. This is a social network chart. Think of the dots: the pink dots are developers. If a dot has a different color, it's because that developer comes from a certain company, so Red Hat will have one color, Google another, et cetera. The size of a dot is the size of that developer's contributions to a specific repository or project. The blue squares are the repositories or projects, right? There is a connection between a repository and a developer if the developer has committed something to it in a certain period of time. That's all, okay? Questions? No? Okay, so let's blow through these. The way we're gonna approach this data, I thought, would be to do it based on personas, because there are a lot of people out there and a lot of different roles in the community. The first one is a persona most of you will recognize right off the bat.
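The chart just described is a bipartite graph: developers on one side, repositories on the other, an edge when a developer committed to a repository within the chosen time window, and a weight for contribution size. A minimal sketch of how such edges might be derived, assuming raw commits carry an author email and a parent count, and that a hand-curated identity table maps every email to one person and one company (the step SortingHat handles in GrimoireLab). The records, emails, and `build_edges` are hypothetical illustrations, not GrimoireLab's API.

```python
from collections import Counter
from datetime import date

# Hypothetical raw commit records: (author_email, repository, commit_date, parent_count).
commits = [
    ("alice@redhat.com", "openshift/origin", date(2018, 3, 1), 1),
    ("alice@users.noreply.github.com", "kubernetes/kubernetes", date(2018, 4, 2), 1),
    ("alice@redhat.com", "openshift/origin", date(2018, 5, 9), 2),  # a merge commit
    ("bob@example.org", "jaegertracing/jaeger", date(2017, 1, 5), 1),
]

# Hypothetical identity table: each known email -> (person, affiliation).
# Building and maintaining this table is the manual "cleaning" work.
identities = {
    "alice@redhat.com": ("Alice", "Red Hat"),
    "alice@users.noreply.github.com": ("Alice", "Red Hat"),
    "bob@example.org": ("Bob", "Unknown"),
}


def build_edges(commits, identities, start, end):
    """Weighted developer-repository edges from non-merge commits in [start, end)."""
    edges = Counter()  # (person, repo) -> commit count, used for node size
    colors = {}        # person -> affiliation, used for node color
    for email, repo, day, parents in commits:
        if parents > 1 or not (start <= day < end):
            continue  # skip merge commits and activity outside the window
        person, company = identities.get(email, (email, "Unknown"))
        colors[person] = company
        edges[(person, repo)] += 1
    return edges, colors


edges, colors = build_edges(commits, identities, date(2018, 1, 1), date(2019, 1, 1))
for (person, repo), n in sorted(edges.items()):
    print(person, colors[person], repo, n)
```

Merging the two emails into one person is what turns two unrelated dots into a single developer bridging OpenShift and Kubernetes.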
And this is where I was joking with Dan, who's in here, that I should have used him, since I knew he'd be here and he's had his fingers in lots of pies. But Clayton is really a project lead persona. He's someone who's been driving the OpenShift project since the early days, and he works on a surprising number of things for the amount of time he gets. In the data sets, he shows up contributing to Kubernetes, of course, and to OKD. He has one contribution to Prometheus. He's been playing in Knative. He's obviously done some stuff in Project Atomic. And if we go way back, which we'll show you in the dashboard now to give a bit of an illustration, he also shows up in Solum, which is a project that's, I think, now dead, or maybe not quite dead yet; I saw there were still some people contributing to it, so somebody out there is still using it. So why don't you show them what it looks like in here? And everybody stay off the Wi-Fi right now. I have to say, if someone calls me, the connection will drop; I'm using a 4G connection. So the first time, Diane said, oh, what if we analyze Clayton Coleman? And I said, all right, I don't know him, so let's look for him and his activity, right? This is the last eight years of Clayton's activity; you can see the time picker at the top right, set to the last eight years. Then, pretty similar to the previous chart, we have the developer and all of the repositories around him. If we dig down here, we get Kubernetes metrics, or some other Kubernetes thing, and if you go to other areas you will see some OpenShift, Docker stuff, and so on, right? So there are a lot of different pieces of information you can see for this developer. Then we said, what if we look at the very beginning of Clayton's activity, right, when he started working on these projects? So let's go back in time a little bit. Yeah.
So this is Clayton in 2011. He didn't exist yet, that's all, right? So let's go a year later. It's running, okay. What we have is that Clayton started working in origin-server, the Django examples, WordPress, and RHC, and mainly in these two projects: this is origin-server, with the number of commits, right? And this is the project, in this case OpenShift. And then this is a split by the repositories he's been working in: origin-server, and the other one here, RHC. Well, let's advance time a little bit. Yeah, because we have a lot of data to go through. This is 2012. So here is the OpenStack contribution: this is Solum. I'm not quite sure exactly what he was contributing there, but I had to go back in my memory, because I remembered that Adrian Otto had started Solum up, and it was sort of OpenStack's answer to platform as a service. If anyone doesn't remember that, that's okay. It's a good thing. So let's step forward in time, and you'll see how he develops. This is 2013. He keeps working in 2014, and he keeps growing in contributions, mainly, in this case, OpenShift Origin and Kubernetes. Then in 2015 there are more and more repositories around, mainly, again, Origin and Kubernetes, but also some others, such as the API or federation. So yeah, he's a busy boy, as we can see. Let's pop into the next persona. Yeah. We all know that Red Hat's doing a lot of contribution, but I wanted to show how we tease out someone else, and I think you mentioned CERN earlier. CERN does a lot of open source work. They use a little bit of OpenShift in-house there, and they use a lot of Kubernetes, OpenStack, a lot of other Red Hat stuff, and a lot of other things. So they show up in our data set in OKD, Kubernetes, and OpenStack, with more contributions in some than in others, obviously.
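Stepping the time picker forward one year at a time amounts to slicing one person's commits by year and counting them per repository. A minimal sketch, assuming commits have already been merged to one identity per person; the sample rows, names, and the `yearly_activity` helper are hypothetical, and the real dashboard may present cumulative rather than per-year views.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical identity-merged commits: (person, repository, commit_date).
commits = [
    ("Alice", "openshift/origin-server", date(2012, 2, 1)),
    ("Alice", "openshift/rhc", date(2012, 7, 3)),
    ("Alice", "openshift/origin", date(2014, 6, 10)),
    ("Alice", "kubernetes/kubernetes", date(2014, 9, 12)),
    ("Bob", "kubernetes/kubernetes", date(2014, 9, 12)),
]


def yearly_activity(commits, person):
    """Map each year to a Counter of that person's commits per repository."""
    by_year = defaultdict(Counter)
    for who, repo, day in commits:
        if who == person:
            by_year[day.year][repo] += 1
    return dict(sorted(by_year.items()))  # chronological order


for year, repos in yearly_activity(commits, "Alice").items():
    print(year, dict(repos))
```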
And I love this picture, because it's almost the same diagram, but it's their upcoming collider, with all the intersections of things going on. If you drill down on the CERN stuff, what's interesting about this view is that you start to see the interrelatedness of all these projects. These are developers working in both projects, and we can watch over time how they're contributing to OpenShift and what they're putting into it; if you zoom back out a little, you can see when some developers are working on both projects. This is where I start to talk about the concept of convergence of communities. If I focused only on this little bit here, as a community manager I'd miss which parts of Kubernetes were important to CERN. So these tools are very helpful for me as a community manager to see where the puck is going. I'm Canadian, so it's a hockey metaphor: where the puck is going means what's important to the organizations working on my project, because my project really isn't a standalone project anymore. And I think some of the work they did was probably really early on. Can you take the CERN filter off and show all of it? If you have a screenshot, I think the next one was the one with everything. Oh yeah, the big one; it's not here on the laptop because it's really heavy. Just so you know, this is OpenShift, this is Kubernetes, and that's OpenStack. Okay, so. We're also gonna talk a little bit about the other persona that I look at a lot: some of the individuals who pop out in the big jellyfish. We should show the jellyfish thing too. Jellyfish. Jellyfish here, so you know what I'm talking about. Doesn't this look a little bit like a jellyfish? These are the important things: you start to see all of the people who are involved in multiple projects.
And the few floaters up at the top are the connectors between different projects. I picked Jaeger because I knew the people there and figured I'd recognize some of their names. What we're trying to do is work through all of the GitHub data for all of the projects under the CNCF umbrella, but we haven't gotten quite there. If you go back to the picture of Greg, let me explain why this one's important. It's not because I'm wearing my silly grin in it, but because he has only one or two contributions to all of these projects, yet he shows up as what I call a connector. If you've read Malcolm Gladwell's book The Tipping Point, he is what I would call a maven. He is always on our Slack channel. He's done a number of Commons briefings. He's very social, one of those guys who likes to answer questions, so for me he's a standout. You can see that in his connectedness in the community, and we gave him an award for it. But until I started looking at this, I didn't even know he was looking at Jaeger. These are the kinds of things that start to pop out of the diagrams with these tools: you see who your mavens are, and they tend to be people you already know. And the point here was that Diane asked me, so where's Greg? And I said, yeah, I don't know. So this is all Kubernetes and OpenShift: these are people working in those two projects, right? These are people working in all three projects, and those are people working in these two. So we realized that Greg is there: he was here, so he's here, right? These are the three developers we have who were working in Jaeger and OpenShift. So he's there; we found Greg. And from a community development point of view, there are other views we can look at. If you go to the next slide, the next persona.
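Finding a "connector" like Greg depends on how many projects someone touches, not how many commits they make. A minimal sketch, assuming identity-cleaned (person, project) activity pairs; the names, the threshold of two projects, and the `connectors` and `in_all` helpers are hypothetical choices for illustration, not the dashboard's logic.

```python
from collections import defaultdict

# Hypothetical (person, project) activity pairs after identity cleanup.
activity = [
    ("alice", "openshift"), ("alice", "kubernetes"), ("alice", "jaeger"),
    ("bob", "kubernetes"), ("bob", "openshift"),
    ("carol", "jaeger"),
    ("dave", "openstack"),
]


def projects_per_person(pairs):
    """Collect the set of projects each person has contributed to."""
    projects = defaultdict(set)
    for person, project in pairs:
        projects[person].add(project)
    return projects


def connectors(pairs, min_projects=2):
    """People active in at least `min_projects` projects, broadest first."""
    projects = projects_per_person(pairs)
    linked = [(p, sorted(s)) for p, s in projects.items() if len(s) >= min_projects]
    return sorted(linked, key=lambda item: (-len(item[1]), item[0]))


def in_all(pairs, *wanted):
    """People who contributed to every one of the `wanted` projects."""
    projects = projects_per_person(pairs)
    return sorted(p for p, s in projects.items() if set(wanted) <= s)


for person, projs in connectors(activity):
    print(person, projs)
print(in_all(activity, "jaeger", "openshift"))  # ['alice']
```

Because breadth is counted rather than volume, someone with one or two commits in each of several projects ranks above a heavy contributor who stays in one.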
We start to see other things pop out in these network diagrams. One of them is Yuri Shkuro, from Uber, who is the lead on the Jaeger project and did a lot of work with Zipkin and OpenTracing. He shows up with OpenTracing, and he doesn't show up as contributing to OpenShift in any way, but Jaeger is a project that is very near and dear to the heart of every OpenShift deployment, and OpenTracing is something we really need, so it's very important to us. From a community manager point of view, I need to stay aware of these things and keep watching where the puck, as I keep saying, is going. One more thing about that picture, before you go away from it: how many of you were at KubeCon? A couple of you. If anyone saw the keynote the two folks from Uber did, they are also working on an operator for M3DB, their open source distributed database, using the Operator Framework, a project that was brought into OpenShift with the acquisition of CoreOS. So there's all of this interconnectedness, and hopefully all of the operators will run nicely on OpenShift 4.0. With these data diagrams you start to see things you wouldn't normally think of as the community manager's domain. Historically, a community manager would just focus on getting people to contribute to OKD; I don't care so much about that anymore, because OKD is really a distribution. We discovered part of this because we kept following the data and kept working on it. So here, in this case, OpenShift is at the top and Kubernetes is here, right? And we are filtering by Uber, Rackspace, or Red Hat. The Uber folks are around here in green, Red Hat is this light blue, and the purple ones are the Rackspace developers. Then we said, what if we only focus on Uber? So we can do things like this. Let me remove these two people here. Then we save.
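Filtering the graph down to a few organizations, and then hiding specific people, is a simple predicate over the weighted edges. A minimal sketch, assuming each edge already carries the author's affiliation; the people, weights, and the `filter_edges` helper are hypothetical, and the dashboard itself does this through its filter controls rather than code like this.

```python
# Hypothetical weighted edges: (person, affiliation, repository) -> commit count.
edges = {
    ("alice", "Red Hat", "openshift/origin"): 40,
    ("erin", "Uber", "jaegertracing/jaeger"): 25,
    ("frank", "Uber", "opentracing/opentracing-go"): 3,
    ("gina", "Rackspace", "kubernetes/kubernetes"): 7,
    ("hank", "Google", "kubernetes/kubernetes"): 12,
}


def filter_edges(edges, organizations, exclude_people=()):
    """Keep edges whose author belongs to `organizations`, minus excluded people."""
    return {
        key: weight
        for key, weight in edges.items()
        if key[1] in organizations and key[0] not in exclude_people
    }


subset = filter_edges(edges, {"Uber", "Rackspace", "Red Hat"})
uber_only = filter_edges(edges, {"Uber"}, exclude_people={"frank"})
print(sorted(key[0] for key in subset))  # ['alice', 'erin', 'frank', 'gina']
print(uber_only)
```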
So this is Uber, and here we have our friend Yuri. And we see how important he's been, because the size is basically telling us he's committed a lot, in this case to OpenTracing; Jaeger was the other project, right? Yeah. And if you go back to the slides again, go back to the jellyfish, because I'm gonna keep calling it the jellyfish diagram. This one? One of the reasons this is really important, too, is that one of the next phases of research I'm gonna be doing is pulling in data on all the other things that call themselves Kubernetes distributions, and there are lots of them. It would be really interesting for me, and probably from a CNCF point of view as well, to see where those other Kubernetes distributions are giving back to Kubernetes, you know? And where that fits into this model. So let's keep going here. Are we cool on time? Okay, I'm just watching. Yeah, we have something like... We might go a little over, because I've got too much data and I talk too much. Amadeus: if you know OpenShift, or you've gone to any OpenShift event, read anything, or googled OpenShift, Amadeus has been a huge contributor. They've done talks for us about Kafka at one of the gatherings, and they do a lot around security practices; they've been really active there. But in doing this analysis, we started to see another interesting phenomenon: their contributions to OpenStack went into RDO, because they're a commercial offering. And some of their OpenShift OCP contributions were actually to the templates in Azure, so they showed up in Microsoft Azure's data sets, for building the OCP templates for running on there. So they show up in different places. We're using these examples because I know they're true, but now I can tease this out for other companies that are working on and contributing to the project. So that's a list of 70 people who are contributing to OpenShift.
Being able to see that becomes really important, and it's an interesting phenomenon. Can we go to the next persona? Well, this is the same example with the Amadeus data: Red Hat is in pink and Amadeus is in green, so you can see certain collaborations around. And the next one. Then I was gonna tell a real quick story about a problem child I found while we were going through the data, since I talked about data cleansing and cleanup. This gentleman's GitHub handle was yue9944822. I didn't know who it was; he just showed up. One of the things about this data set that's really cool is that it's built from the GitHub logs. It's not just the GitHub name and contribution; it goes really deep. So I can drill right down from that graph into someone's actual pull request or commit, follow the link back, and actually see it. When this first popped up, this gentleman was listed as Tencent, and I had this moment of, oh cool, is Tencent using OpenShift? Are they using it under the hood somewhere, this big Chinese conglomerate? So I'm cleaning, cleansing, and going through the data logs. And it turned out that one of the Kubernetes leads, David Eads, knew him... oh no, it was Stefan Schimanski who knew this person and introduced me to him. I got to have this conversation on Slack with him; I think he's in mainland China. And it turned out he's not at Tencent; it was a mistagged data point. He's at Alipay. And he didn't actually make a direct contribution to OpenShift or OKD; it was an upstream one that was mislabeled. So that's data cleansing, but the interesting thing for me is the whole conversation I got to have with him because of these tools. Because I could now see someone making a contribution from an outlier, it popped up so that I could then have an interaction.
And out of that conversation, he said, oh yeah, we were thinking about using Origin, we just haven't had the time yet. Because I got to have that interaction with him, hopefully that'll spur him on to using it somewhere inside Alipay, which is even bigger than Tencent. So that was really cool. He also shows up doing work in SaltStack and Spinnaker and a bunch of other things. So it wasn't a bug, it was a feature; it was part of doing this. I'll be really quick about OpenShift Commons, though I can talk at length about what it is. If you're coming to London next week, Dan, myself, and a bunch of other people are hosting a peer-to-peer, face-to-face conference. We do a bunch of them periodically: the day before KubeCons, the day before Red Hat Summit, and a bunch of regional ones. We're gonna do one in Milan, so if you wanna go to Milan, let me know, and we'll make you talk. But really, in order to expand our connectivity with these folks, we've had to change our model of how we connect with people. We do a ton of face-to-face events. Think of gatherings as bigger than meetups: all day long, with everyone in one room meeting and talking to each other to get that face time. We do OpenShift Commons briefings, where we give the podium away to other people all the time; I'd love to have you talk about that. We are really active on Slack. We have our own form of SIGs, which are more about best practices and lessons learned, as opposed to the Kubernetes technical SIGs, so we get a lot of talking going on there. So we're really trying to change the model to adapt to this ever-expanding group of people who are part of the network. There are a couple more things we were gonna cover, but we're really at the end of our time here. The main takeaway for us was that no company is working on just one thing.
I really can't find anybody who doesn't have a finger in multiple pies and isn't working on different things. Coordination with the upstream projects is essential: as anyone working in the Kubernetes world knows, keeping up with every release, every three months, plus all the ancillary, tangential projects, has become a phenomenally difficult, multi-person task. The relationships matter. That's why we do these peer-to-peer events and Slack channels, and give the podium away to other people to speak. Working on this project, it became very clear that domain knowledge is essential, for example to understand which of these projects were integral to OpenShift and which weren't. We're gonna keep adding more, but it was very helpful. We're going from an era of what I call community management to community development. What we're trying to do is keep building these relationships with people and keep extending our network of relationships, from a Red Hat perspective and from an open source leadership perspective, with all of these other projects. And it's not a one-person job anymore; there are many, many people involved. Inclusivity over exclusivity. That was sort of my mantra: initially the project was mostly Red Hat-driven, and what we're trying to do now is extend it and embrace everybody else in all these other communities, including OpenStack and the other ancillary communities. Data: I can't say this enough, the data matters. Cleaning it up works best as an ongoing job; about once a month, I go through all the outliers and figure out where they belong. And the one thing everyone will probably be scared about is this last statement: anonymity is dead. If you think you're anonymous on GitHub, forget about it. Everybody is identifiable. Every one of you has a mobile phone. We can find you somewhere.
You're on LinkedIn, or there's something about the project you're working on that identifies which organization you're with. That said, when we went all the way back to the dashboard, there were still a tremendous number of people working on OpenShift and Kubernetes who are unknown, or who list themselves as independent. I think a healthy balance needs to be maintained between people who are paid in their day jobs to work on these projects and people who are independent. Though as we identify more of them, using SortingHat and brute force, a lot of them turn out to be coming from places like this, educational institutions: people doing work on research projects. So what's next, and I'm not joking about this, is predictive analysis, because what this one effort really showed me is that if we start looking at these network analyses, one, they give us competitive insight into what the other Kubernetes distributions are doing, and two, they show us where the puck is going. If we start looking at these data sets as "these are our core mavens, these are our leads, these are our customers," we can ask what other projects they are looking at. As we get all of GitHub in there, this becomes really amazing work. I hope I'm expressing how excited I am about having these tools to get a little ahead of the game for once, as opposed to waiting to be told about the next big thing by some very excited, passionate developer. Instead, I can now see in advance what the top 20 folks from Amadeus are working on. And it turned out that six months ago it was Kafka: they were doing a huge amount of work on Kafka, and then we got them on stage to talk about it at a gathering event.
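The monthly pass over the outliers, and the question of how many contributors are still unknown or independent, can be framed as two small queries over a per-person summary. A minimal sketch, assuming each person has one affiliation label and a commit count after the automatic matching; the sample rows, the "Unknown"/"Independent" labels, and both helpers are hypothetical, not SortingHat's data model.

```python
from collections import Counter

# Hypothetical per-person summary after automatic affiliation matching:
# person -> (affiliation, commit count).
summary = {
    "alice": ("Red Hat", 120),
    "bob": ("Unknown", 45),
    "carol": ("Independent", 8),
    "dave": ("Unknown", 2),
    "erin": ("ExampleCorp", 30),
}


def review_queue(summary, limit=10):
    """Unresolved contributors, most active first: the outliers to check by hand."""
    pending = [(person, n) for person, (aff, n) in summary.items() if aff == "Unknown"]
    return sorted(pending, key=lambda row: -row[1])[:limit]


def affiliation_mix(summary):
    """Share of contributors who are unknown, self-declared independent, or affiliated."""
    buckets = Counter(
        aff if aff in ("Unknown", "Independent") else "Affiliated"
        for aff, _ in summary.values()
    )
    return {bucket: count / len(summary) for bucket, count in buckets.items()}


print(review_queue(summary))  # [('bob', 45), ('dave', 2)]
for bucket, share in sorted(affiliation_mix(summary).items()):
    print(f"{bucket}: {share:.0%}")
```

Keeping "Independent" separate from "Unknown" matters: the first is a legitimate answer, while only the second is cleanup work.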
So really, I think the stuff we're doing with Bitergia is awesome, and if you're a community manager, you ought to be using these tools, because it's almost impossible to do this work anymore by gut or by personal touch alone. You really need to get into the data, clean it up, and know who your relationships are with and which projects your project is touching. I'm gonna add two cents here. Yeah, the one thing I'd add is that for our customers this is a really important question: what's the next hot project? There are two points of view. First, there are some people who don't know where their developers are working in the open source ecosystem, so they don't really know where the effort is being spent or what they're being paid for. The second case is the one you mentioned: what's the next hot project? After Kubernetes, what's next? If we follow the core developers or the early adopters, we may be able to see where in the GitHub ecosystem they have started to work, because they are committing to some repository or leaving traces of activity in issues or pull requests. That's doable; it's a matter of time and machines. And who knows, maybe in three or four months' time we'll get access to IBM Watson, run all this data through it, and Watson will tell us. So I know we went five minutes over, and I ignored anybody flashing cards, but if you have questions or want to talk about any of this, we're around this afternoon and tomorrow. Are there any questions today about this data set? Is this... sorry. Go ahead. Is this project you two are doing online? Can other people contribute and also look at the IRs? I would happily work with you and... Just repeat the question, please. Yes, so the question is whether this network data is available. The network data dashboard is not yet online and open.
It's just been something Daniel and I are doing, but we're gonna make it accessible, and since you have lots of domain knowledge about OpenStack, I'd love to get your two cents on it. Any other questions? Who wants access to it? Yeah, all right. Well, thank you very much, everybody, for staying and sticking it out with us.