Well, everybody, welcome to yet another OpenShift Commons briefing. And this is going to be a fun one for me, because I have a colleague of mine that I've been doing some research with, Daniel Izquierdo from Bitergia. And we presented a very, very abbreviated version of this a week ago Saturday at ICGSE. So we thought we would do a deep dive today, because there were a lot of questions about how to use the analytic tools and why we're doing it. And so we're going to take this opportunity, when most of you are probably off on vacation, to steal an hour from you and talk about how, in the OpenShift Commons and the OpenShift, Kubernetes, and CNCF ecosystems, we've taken a data-driven approach to doing community development, and how that has helped me be effective and nurture a healthy, diverse, hopefully very engaged community around OpenShift, OKD, Kubernetes, all the CNCF projects we're incubating, operators, all those kinds of good things. So I'm going to motor through this so that we can get to the deep dive a little bit. But I'm going to set the stage first, and then we're going to have Daniel do a bit of a demo of how this all works. So this is the paper that Daniel and I wrote together. This diagram is out of date, though, because it is based on data from GitHub and other sources, which we'll talk about. But it's what I generally refer to as my jellyfish diagram. It basically maps out all of the relationships and the networks between the projects and the people who are contributing to and participating in them across the CNCF, Kubernetes, and OpenShift ecosystems. And we'll dive into that a little bit more. And if you know Red Hat, you've probably seen this screen before, and you know that we're really all about open source and believe in it very deeply.
And it's in our DNA that open source is the source of all the technology innovation that's happening today in the world. And GitHub is where we live and breathe. These numbers, again, are a little out of date, and they've grown exponentially. I think it's 125 million repositories at this point. It's huge. And there are just a few of them on the screen. And OKD, which was formerly known as OpenShift Origin, is the one that we're going to focus on a little bit today. So if you don't know OKD, it is the OpenShift distribution of Kubernetes. Basically, we like to say it's Kubernetes plus-plus: all the other things we add into it at Red Hat. OpenShift is easy to find. OKD 4 is going to be GA hopefully next week with the 4.5 release of OpenShift; we'll have a distro of OKD 4 for you, and you can try it out at OKD.io. But it basically is a community distribution of Kubernetes. And one of the things that happened over the course of time, maybe four years ago, is we switched from Origin being a standalone open source project to rebasing and re-architecting OpenShift on top of Kubernetes and heavily using containers. So if you're an oldster from OpenShift like me, you still remember gears and cartridges. But when we switched over, we really had to refocus how we looked at what community was. And the reality check, the honest thing, was that for the most part, the contributions to Origin were Red Hat based. It was Red Hat dominated. Once we took Red Hat out, there weren't a lot of external folks contributing to the project, and still today, the external contributions are primarily to the value-added parts of OpenShift, the things that are integrated and added to it. So I'm not going to change my tune on that. That is really where most of it is. But the big change has been, for us, where the complexity comes in: we have this ecosystem-based model that we switched to.
If you go to Commons, you'll see there are right now over 585 member organizations that are part of that. These are end users, integrators, cloud providers, upstream project leads, tons of people having conversations that we have to interact with and understand where they're coming from. And then we've seen what I call the rise of the interrelated cloud-native ecosystems. And everybody shows this picture. It's crazy, I know. But it actually is very helpful when you filter it down to some of the open source projects that are being incubated. And that's really what I tend to focus on: the ones that are either incubating or graduated. Trust me, I look at all the sandbox ones too. But for this analysis, we're just going with graduated and incubated projects. And then we're adding in the wonderful world of Operator Framework. So the vote just took place, and it's just been accepted as an incubated project. I think it's going to officially be announced probably next week; I think the 9th is when the press release goes out. So we have to add in all the operators we're building, the things that are on OperatorHub, and the Operator Framework itself. So this landscape just keeps growing. And it's impossible to really understand all of the relationships or to know all the people in your community. What I like to say is, in the past, community managers usually focused on one single project and tried to get people to work on just that one. We don't have that luxury anymore. There are so many interdependencies among the different projects we're working on: pieces that are layered on top of OpenShift, are integrated into OpenShift, or run underneath OpenShift. And all of those release cycles, product roadmaps, feature requests, issues, bugs, everything you can possibly imagine, you name it, all have an impact on each other.
And then the human side of it as well is really, I think, the thing that, from a community development point of view, is unknowable without using a data-driven approach. So I can create all the spreadsheets I want from mailing lists and analyze them up the wazoo by myself by hand. But, did we decide when we first met, Daniel? When you first showed me the Bitergia GrimoireLab, was that 2016? I forget. It was 2014, during the OpenStack Summit. So I've been looking at this magic for a long time and have been applying it, trying first the dashboard, which gives you the pie charts and breakdown of the contributors, and then this network analysis stuff. And so we've ingrained this into my day-to-day approach to working with the many communities. So I've been able to scale myself in a way that I couldn't formerly do without having a data-driven approach. And these data-driven approaches, the sales teams use them. CRMs, customer relationship management tools, are one of them. The way we look at this, these should be community relationship management tools. And so basically what we're doing is applying some data science and analytics to the problem space of understanding who's in your community, how to nurture them, how to support them, and how to reach out and connect with them and connect them to each other. I'm going to stop, and now that you know how complex the problem is, I'm going to let Daniel talk a little bit about the tools that we have and the data sets we're working with. So Daniel, go right ahead there. Yeah, sure. So the analysis is based on Git repositories. If we think about the usual data sources in any open source community, we have a bunch of them. By data source, I mean pieces of infrastructure that we may be using. You have already mentioned some of them: the mailing lists, Slack channels, Git repositories. Some of them are using GitHub.
Some of them are using GitLab or the Atlassian stack. So there are several of them, and typically there are from five to ten of those data sources if we think about development activities, communication channels, and outreach to the general public. So in this case, and for today, we are just focusing the analysis on Git, which is kind of the big chunk of the data. And we are focusing on CNCF, OpenShift, and operators. Can you go to the next slide, please? So for the tooling, this is how we are moving from art to science, which is what we call applying this data-driven approach to community development. We are using GrimoireLab. GrimoireLab is part of the CHAOSS project, and this is under the umbrella of the Linux Foundation. CHAOSS is the acronym for Community Health Analytics for Open Source Software. And we are working there in two main areas. One of them is defining metrics from a technology-agnostic point of view: discussing metrics, bringing some specific definitions of those, and looking for use cases. There are several working groups there, such as the diversity and inclusion working group, the risk working group, or the value working group, from an open source perspective. And then the second bunch of people are focused on software. There are several tools there. One of them is GrimoireLab, which is what we are presenting today, and I'm one of the participants and original developers here. Then we have Augur, which is another tool, pretty focused on GitHub, as far as I remember. And then there are a couple of extra tools around. So GrimoireLab: this is the architecture that you can see. This is not only about retrieving information. There is preprocessing and post-processing of existing data. There are specific problems that we have to deal with, such as identity and affiliation management, how to automate all of this, how to have this in production, and then, at the very end, how to produce value for the end user.
So starting from the left side of the chart, we have a bunch of data sources. Some of them we have mentioned: Git repositories, Docker Hub, Jira, Bugzilla, and some others. Then, right after this, we have Perceval, which is the tool to retrieve all of this, and this is producing some data transformation. So this is your front end to transform any kind of log or API into a JSON document. This is temporarily stored in some database, but then, at the very end, this is creating a new index in Elasticsearch. Elasticsearch is the database we are using here, the persistent database, and we are creating raw indexes there. At the same time, the tool that you can see right in the middle, GrimoireELK, is the data processor. So this is kind of saying, OK, I have a new JSON document, so I'm storing this in Elasticsearch. And at the same time, I'm asking SortingHat: hey, we have a new identity here, what do I do with this? So SortingHat is the tool that takes care of all of the identities and affiliations. SortingHat uses another database. Why do we have this? In this case, it is to be GDPR compliant. So we have kind of an external or third-party database where we can store everything, and then everyone can opt in or opt out of the rest of the visualizations and so on. So you can anonymize the information, let's say. Once we have SortingHat doing its job, we have the raw indexes. Then the next step is to enrich those indexes. By enrich, I mean basically creating specific indexes focused on your business model. The business model we are talking about today is community development. So we are processing those data sets in the raw indexes into something more meaningful for the final user. An example here: if we think about the activity, we have a bunch of commits, right? So in a commit, we see the author of the commit. We have the committer there. We have the date. We have the time zone.
We know the files that were modified or moved or copied or created from scratch. Then we have the lines for each of them that were added, removed, or modified as well. So all of this information can be parsed and can be transformed. For instance, by default, GrimoireLab is producing, as far as I remember, three or four indexes based on Git information. One of them is working at the granularity of commits, so we can go there and check who is working with whom, in what commits or file paths, et cetera. The next granularity, a finer granularity that we have here, goes to the level of the file path. We know where specifically certain organizations or people have been participating. So if we have some critical area in our open source project, and those developers leave the community, because there's turnover, right? Turnover happens. Then we can look for the right expertise to try to fill that knowledge gap. But we need that data in advance to try to understand what's going on. Then there is another index that we can create, for instance, which is what we call the onion analysis. If we think of open source communities as an onion, it's a bunch of layers, right? At the very center, we have the core set of developers. By definition, we name them as those producing 80% of the activity, of the commits in this case. Then we have regular developers, those producing the next 15%. And then we have this long tail of developers that we see in any open source project, those producing one commit, two, three, four, five. Those are the casual developers, and they typically fill up to 100%, so they produce this last 5%. So from just, let's say, one data source, which is Git, we can start producing specific indexes. This is what we mean by enriching indexes.
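The 80/15/5 onion split Daniel describes can be sketched in a few lines of Python. This is an illustrative re-implementation of the idea, not GrimoireLab's actual code; the function name and data shape are invented for the example:

```python
# Sketch of the "onion" classification: core authors produce the first 80%
# of commits, regular authors the next 15%, and casual authors the rest.
# (Illustrative only; not GrimoireLab's real implementation.)
def onion_analysis(commits_per_author):
    total = sum(commits_per_author.values())
    # Rank authors by activity, most prolific first.
    ranked = sorted(commits_per_author.items(), key=lambda kv: kv[1], reverse=True)
    layers = {"core": [], "regular": [], "casual": []}
    cumulative = 0
    for author, count in ranked:
        if cumulative < 0.80 * total:
            layers["core"].append(author)
        elif cumulative < 0.95 * total:
            layers["regular"].append(author)
        else:
            layers["casual"].append(author)
        cumulative += count
    return layers

layers = onion_analysis({"alice": 60, "bob": 25, "carol": 10, "dave": 3, "eve": 2})
```

With these made-up numbers, alice and bob land in the core (their cumulative share crosses 80%), carol is regular, and dave and eve are the casual long tail.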
And then at the very end, at the bottom right part of the chart, you see Kibiter, which is a downstream version of Kibana with, let's say, certain extra vitamins and plugins and so on. And it's open source as well, by the way. And then we have the end user, who can visualize all of this information, navigate through the data, and check and create new visualizations, et cetera, et cetera. So this is the tooling we are using. A couple of points about this. I think you went a little fast over SortingHat and the identity merger, and I just want to harp a little bit on this. If you look at all of those different data sources, and you think about, if you're listening to this later, how many different email addresses you use in all of those different data sources, and the idea that we, as community managers, would know: oh, this is my Stack Overflow persona; oh, this is my Twitter persona; oh, this is my GitHub persona. When you try to untie the knot that is community relationships, having this facility to do the identity merger across all of these different data sources is really huge. It also leads to the other conversation that we have about anonymity and ensuring that we respect people's privacy, and that if they want to be anonymous, they can be. So a lot of what you'll see here today is really focused on public identity data that is in GitHub. But if people think that they're still anonymous in the world, we need to let them know that this is a very simple open source tool and engine; you're no longer anonymous, I guess, is the point that I'm trying to get to here. And that brings in another level of conversation about moving from art to science as well: are we GDPR compliant? Are we following the legal requirements?
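The identity merging being described here can be sketched as collapsing raw usernames through a curated alias map, which is roughly the job SortingHat does across data sources. All names and the data shape below are invented for illustration; this is not SortingHat's actual schema or API:

```python
def unify(identities, alias_map):
    """Collapse raw identities into unified profiles.

    identities: (source, username) pairs seen in the data sources.
    alias_map: the curated mapping (the part a community maintains over
    time) saying which raw usernames belong to the same person.
    """
    profiles = {}
    for source, username in identities:
        person = alias_map.get(username, username)  # unmatched ids stand alone
        profiles.setdefault(person, []).append((source, username))
    return profiles

profiles = unify(
    [("github", "jdoe"), ("git", "jane@corp.example"), ("slack", "jane.d"),
     ("github", "samk")],
    {"jdoe": "Jane Doe", "jane@corp.example": "Jane Doe", "jane.d": "Jane Doe"},
)
```

Three different personas collapse into one "Jane Doe" profile, while the unmatched "samk" stays a standalone identity; opting out would amount to replacing a profile key with an anonymized token.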
So a lot of that conversation, I think, is another whole day's worth of conversation too, but just to let people know, we are working within the legal framework of how we are allowed to use this data. So just to set that stage. And really, I spend most of my time in that browser box at the end, and a little bit of my time doing some corrections in the identity merger space. As a community person working with this data set, you really need to have some domain experience with it. So if you look at someone's GitHub profile, it may have contributions to Kubernetes, Prometheus, and then some gaming platform over in left field. You need to know enough about the ecosystem to know that that gaming platform isn't really, or hopefully isn't really, something that has a repercussion for your ecosystem. So having domain expertise about whatever you're analyzing is really important. So shall I move to the next slide and let you explain? There you go. So perhaps just another addition: we are not the only ones doing this, right? There are already open source communities providing such information about identities and affiliations, with the specific purpose of attribution, which is what we are doing here: to help advance the development of the community and get everyone on board earlier or faster with the proper tools. So communities such as OpenStack or the CNCF really have certain public data sets with specific identities and affiliations for all of the developers. And this is even community-curated. That means that you, as a member of the community, can go there and say, I am this person, and I've been working at companies A, B, and C during these years. So then your contributions will be correctly attributed. And this is, in the end, important for organizations, so they can see specifically what's going on. And then we can have some other discussions about what influence means, for instance, in an open source community.
So we can talk about specific roles, such as maintainers or core developers: who's playing that role, and what company that person is specifically coming from. And if we go for a more accurate perspective, we can ask specific questions such as: what are my competitors doing in the technologies that are key for my technology stack? You need to have certain knowledge, and all of these data-driven approaches are quite useful to understand what's going on there, because you can get specific answers to those questions beyond your perception, right? Yeah, I think I started out using the network analysis stuff to understand who was in my community. And I always say, when I'm talking to people who do community development, that the most important first step is knowing who's in your community, how to connect with them, and how they're connected to each other. You can do all the content development and write all the documentation you want, but if you really don't even know who your audience is or who the participants are in your community, you're going to end up rewriting that or reframing it in some way. But there's also, and we talk about it quite often, the idea that this is one way to see where the community is going. In some of the earlier analysis we've done, as things like Jaeger took off, along with OpenTracing and Zipkin, you could see people moving from one project to the next. And that historical analysis, and hopefully predictive analysis, is the next layer that we might want to layer into this too: to see, as a Canadian is wont to do, where the hockey puck is going, which is really what you want to be watching for. So the baseline stuff that you need to do, in my humble opinion, is really know who's in your community and how they're all connected to each other.
And then, once you have that grasp of your community, you move to applying that to paying attention to new projects, serverless or a bazillion other projects as they pop up, because then you can start watching the key folks there and what they're contributing to. And it's really amazing what you can learn from this, and you can get lost. It's sort of like social media: you can go down a wormhole too, but you always come back up and see how things are interrelated. It's been hugely helpful for developing the OpenShift Commons and making sure everybody is properly connected and supported. Indeed, from that perspective, I think it's worth mentioning that before entering into the metrics discovery process, it's really useful to have a certain strategy on the table and a certain methodology. People tend to have metrics for the pleasure of having metrics, and the problem sometimes is that you may lose track of where you were going. Whereas if you have a proper method and strategy and action plan, then you can play with the data, but you know that you have a path, right? That is the right way to proceed. The other thing, and we'll get to the demo in a second here, but the other thing that's really important for people to understand too, is that pretty much every large project has a dashboard that shows you the static stuff: who's the biggest contributor to this project, who's doing the most in this project. And it's a bragging right for corporate contributors or individual ones, and it's a great way to know how to reward people, but it's really almost useless for doing community engagement, those static pie charts and things. You really need to understand the relationships, not the numbers. And I think that's what this demo hopefully will show you a little bit of. So I'm going to stop sharing my screen and let you share yours.
And then we'll see how we're doing here for time, and we're doing okay, because Daniel and I could talk about this for days. And yeah, maybe it's worth introducing the concept of personas and how we were playing with this, do you think? Well, first show what you had there; open the second tab there, because that's the one, I think, that is the basis of the jellyfish, for me, the jellyfish diagram we use in the article. And the thing that you can't see in screenshots, but that you can dive into here, are the connectors. So the large jellyfish there is Kubernetes, and the smaller one is OpenShift. And so we can look at the relationships between who's contributing to OpenShift and who's contributing to Kubernetes. And if you keep diving, as the complexity gets bigger, you can start to see: Hans is in there, Luca is in there, Seth is in there. Because I've been working in the OpenShift community, I know almost everybody here, but if a new person pops in, then I become aware of it. And you can also get list views of this and all kinds of cool stuff. But it also starts to show you, if you zoom back out. I think you've added in Jaeger here. Oh, Jaeger, at the bottom. You can see who's working on OpenShift, who's working in Kubernetes, and who's also working in Jaeger. This became important for me when the Jaeger team from Uber and Red Hat said, okay, we'd like some help from you, Diane, to get us into incubating status over at the CNCF. And I did not know everybody in the community. So I was able to pull in this data, look at who from Red Hat was contributing, who from Uber and other places, and these were my key people to connect with to help move that project through to the next level. And the team did an awesome job, and you can see Yuri's there and a bunch of other folks.
And so they may not have been contributing to my project, OKD, Origin, OpenShift, but they were contributing to a key thing in the ecosystem, Jaeger and OpenTracing, that was integral to people successfully using OpenShift and us deploying it in over 2,000 enterprises. So this was a great way to use the network analysis in this space. And maybe you wanted to add a few more words in there about how this actually works. Yeah, so just to explain a bit more how this works, which we didn't do in the previous slide, although it was already explained: each of the dots that we see is a developer. Those are displayed if they have committed something during the last year, as we can see here, to either Kubernetes in this case, OpenShift, or Jaeger. Jaeger, I think, is already a graduated project, right? But we have this properly assigned to incubating, and in any case, we specified this filter here, so we are sure that we were analyzing only Kubernetes, OpenShift, and Jaeger. That's why we know this is Jaeger and not any other project in the incubating or graduated ecosystem. The bigger you are, the more commits you have made to that specific project. So we have some dots around that are bigger than the others; those are developers that have contributed more commits than the average. We can see some of them here. And then we see this number of developers here that have an edge into Kubernetes and an edge into OpenShift. This means that during the last year, those developers, all of these here, have contributed to both worlds, in this case Kubernetes and OpenShift. And the same thing happens here: we have these three developers that, during the last year, we see have contributed to OpenShift and Jaeger, and then we can see some others that have contributed to Kubernetes and Jaeger as well. So all of these people.
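The cross-project edges being pointed at here boil down to set intersections over each project's contributor list. A rough sketch, with invented names and an illustrative data shape rather than GrimoireLab's actual output:

```python
# Sketch of finding "connectors": developers who committed to more than one
# project over the analysis window. Data shape is illustrative.
def find_connectors(contributors_by_project):
    sets = {p: set(devs) for p, devs in contributors_by_project.items()}
    connectors = {}
    projects = list(sets)
    for i, a in enumerate(projects):
        for b in projects[i + 1:]:
            shared = sets[a] & sets[b]   # developers with edges into both
            if shared:
                connectors[(a, b)] = sorted(shared)
    return connectors

edges = find_connectors({
    "kubernetes": ["clayton", "maria", "wei"],
    "openshift":  ["clayton", "maria", "priya"],
    "jaeger":     ["yuri", "wei"],
})
```

With this toy data, clayton and maria are the Kubernetes/OpenShift connectors and wei bridges Kubernetes and Jaeger; project pairs with no shared contributors get no edge at all.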
So these are the basics of the network diagram. In addition to this, or on top of this, we can specify certain filters, like the ones we already provided. We can use the time picker here, so we can go for the last month if we are interested. And then we can produce other kinds of data sets or widgets. For instance, you were specifically commenting on the newcomers. We can list the very last people that joined the community, so that from a community perspective we can say, hello, welcome, and help them or facilitate the onboarding process and so on. So maybe you can detail your specific work there a bit more, Diane. Yeah, so I think one of the things that is hard to tease out is retention of newcomers, engagement with newcomers, and when new organizations arrive. From my perspective, I'm very organizationally based. So when a new organization starts contributing to OpenShift or starts using OpenShift, I want to know about it. And this data also includes that they've logged an issue, they've made a comment on Stack Overflow, all kinds of different places. So this really helps me, as a community development person, understand new entrants and who they are, and then the onboarding process begins: the outreach, making sure that they have what they need. And that doesn't always mean stalking them or throwing information at them. Just being aware is huge. Because then, when they show up at maybe your event or they ask a question, you're already aware that they're in the community, and that gives you a step ahead. So there's a number of personas that we tease out from this data that really help, and we'll talk about that a little bit later. And maybe if you dive into the Clayton Coleman historical analysis, that'll help a little bit too. So once you explain what you're showing here. And if people don't know Clayton, then they don't know Kubernetes.
I think that's a bumper sticker slogan. He is one of the lead contributors and architects for OpenShift and for Kubernetes itself. So watching someone like him evolve over time is really a good example of how someone onboards and gets deeply, deeply involved in a project. Yeah, so this dashboard contains a couple of widgets, as you can see, and this is so far for the year 2012. So this is eight years ago. On the left, we have the number of commits for each of the projects, and each of the bars is split into the different repositories this developer has been participating in; you'll see more bars there in the next years for Clayton. Then, at the same time, we have on the right a diagram where we can see Clayton in the middle, and then all of the repositories Clayton has been participating in, in each of the years. So we will see snapshots of Clayton for 2012, 2013, 2014, 2015. This is at the very beginning of OpenShift, and we have Origin, the WordPress example, and the Django example. Those are the main projects Clayton was contributing to in this case. We move on to 2013, and we can see how there are some more projects: a Python interface, the website for OpenShift on GitHub, a REST client for Java, and some others. Then we see how the network is kind of growing. In 2014, we can see OpenShift still has most of the activity for Clayton, but then we go to certain other projects. And here, instead of having the projects in the CNCF ecosystem split out by Kubernetes or Jaeger and so on, as before, we have them grouped as graduated and incubated. So you will see how this keeps growing. But if we go to the specific repositories, we can see that this is Kubernetes, this is the API, and these are examples of how to use Kubernetes.
And then we see how this is the whole activity of Clayton in 2014, in this case. We move on to 2015, and most of the work is in OpenShift Origin, but then more and more commits are done in the CNCF ecosystem. 2016: even more repositories, and then we have incubating projects. So this is probably some new projects in the CNCF ecosystem, plus all of the graduated ones. Most of them, as you can see, are Kubernetes: examples, community, cluster registry, the API, and Kubernetes itself. Then we can go to 2017, and Clayton keeps growing. In 2018, we have some activity in the Operator Framework; Clayton has started to participate there. And then in 2019, we have the Operator Framework, some incubating projects, graduated ones, and OpenShift. And then, kind of nowadays, this shows the last six months approximately. So this is most of the activity we have for Clayton. Okay. This is interesting, because had we known nothing about Clayton or the oncoming of Kubernetes, and we'd been watching Clayton back in 2012 evolve, theoretically we could have started to see the rise and importance of Kubernetes from this. I don't know if anybody outside of Red Hat could have seen it; I think we saw it inside because Clayton was vociferously endorsing the work that was going on in Kubernetes. But I think you can see from this example, there are also ways to start seeing people move to other technologies, whether it's edge or IoT, or they start using Open Data Hub or different networking solutions or load balancers or whatever it is. When they start contributing to other projects or posting questions about them, you can start to see where things break down, where things are picking up speed, and where projects are maturing. So it's a really useful set of tooling for people who are ecosystem watchers like myself. Do you want to add any more to that, Daniel? I thought we could move to the organization persona specifically, what do you think? Yeah, absolutely. And then we can hit the slides after that.
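The year-by-year view walked through above amounts to grouping one developer's commits by year, then by repository. A minimal sketch with made-up data (the tuple shape is illustrative, not an actual GrimoireLab index format):

```python
from collections import defaultdict

# Sketch of the historical view: count one developer's commits per year,
# per repository, to see where their attention moves over time.
def activity_by_year(commits):
    """commits: iterable of (author, year, repo) tuples (illustrative shape)."""
    history = defaultdict(lambda: defaultdict(int))
    for author, year, repo in commits:
        history[year][repo] += 1
    return {year: dict(repos) for year, repos in history.items()}

history = activity_by_year([
    ("clayton", 2012, "openshift/origin"),
    ("clayton", 2012, "openshift/origin"),
    ("clayton", 2014, "kubernetes/kubernetes"),
    ("clayton", 2014, "openshift/origin"),
])
```

Each yearly snapshot then becomes one bar (or one ring of the network diagram): 2012 is all Origin, while by 2014 Kubernetes repositories start to appear alongside it.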
Yeah, so in this example, in the same way that we can look for specific people or newcomers, in this case the newcomer persona, we were discussing that what's important for you, Diane, is newcomers in the sense of new organizations coming to the community, and then the relations with other communities or organizations. So the example we see right here in this chart is Uber's activity in the whole CNCF plus operators. The dots, again, are developers, and then we can see certain specific repositories: we have OpenTracing, then some more OpenTracing, Jaeger in this case, and then we have the developers working there. etcd, more OpenTracing, gRPC, Prometheus, okay? And then perhaps if we move to the next one, we can see how this is related to Red Hat, right? So maybe you can elaborate a bit more on the importance of connectors. So I think there are a couple of things that this is showcasing. One: I look at OpenShift through an organization-based set of glasses. So I like to look at how Uber, who is not an OpenShift customer, touches down in our ecosystem, and how people who are end users touch our different spheres of influence and how we're connected to them. But this also really shows me, if I need to find someone to talk about not just OpenTracing, but maybe Prometheus or chaos engineering or whatever it is, the people who are the influencers or the connectors between projects. So say I'm looking for someone who's done something with Grafana, OpenTracing, Kubernetes, and OpenShift to speak internally at Uber, right? Or at a conference like the CNCF's. These diagrams allow me to figure out and trace, not to be using a pun, the relationship back to someone who might either be that person to speak, or know the person, or help another person speak with a little bit more insight into the project.
So it's really been a huge tool to help build peer-to-peer relationships, to help foster collaboration across projects, and to see where organic cross-pollination between projects is happening. Yeah, so in this case, what we can see in this example are Uber and Grafana contributions to those projects that we mentioned — so CNCF with graduated and incubating projects, operators, and OpenShift — and then the legend of colors: Red Hat is this purple and red, and Uber is this brown-orange color. Then we can see that there are some Uber developers, and there are relations, because we can see the different developers. If you go back up a little bit, that really big dot there is Travis Nielsen, who is — I don't know if you added Rook in here — the gentleman behind, or one of the leads on, Rook. It's interesting to see where people pop up in other diagrams as well. So there's a whole slew of work there. So which repository is that one connecting to, that Travis is in the center of? Oh, so these are all of the incubating projects. So these are all of the projects that we have under this label. I would bet that he's there because of Rook, yeah. And that's again why you kind of need some domain knowledge as well. It's not perfect — it's not gonna make you AI-level intelligent about who's in your community — but it does give you a big jumpstart. Yeah, and that's a really good point, the domain expertise, because that's what usually happens, right? Like, I point to certain data sets and then you say, oh, that makes absolute sense because of this and this reason. So I can point to specific oddities in the data set, or highlight certain areas, and then you say, oh, that makes sense because of this. And then you can go there and dig into the data and so on. So it's really, really important to have this tandem between domain knowledge and expertise on one side and the tool on the other. 
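The "connector" idea described here falls out of a bipartite developer-repository network: a connector is a developer whose contributions span multiple projects. A minimal sketch, with invented names and project labels standing in for the real graph data:

```python
from collections import defaultdict

def find_connectors(contributions, min_projects=2):
    """Rank developers by how many distinct projects they touch.

    `contributions` is an iterable of (developer, project) pairs —
    a simplified stand-in for the bipartite developer/repo network
    in the diagrams; real data would come from commits, issues, PRs.
    """
    projects = defaultdict(set)
    for dev, project in contributions:
        projects[dev].add(project)
    connectors = {d: p for d, p in projects.items() if len(p) >= min_projects}
    # Most-connected developers first.
    return sorted(connectors.items(), key=lambda kv: -len(kv[1]))

# Hypothetical contribution pairs, for illustration only.
contributions = [
    ("yuri", "jaeger"), ("yuri", "opentracing"), ("yuri", "m3db"),
    ("ana", "prometheus"),
    ("li", "grpc"), ("li", "prometheus"),
]
for dev, projs in find_connectors(contributions):
    print(dev, sorted(projs))
```

Developers who clear the `min_projects` bar are the candidates worth a closer look — the big dots sitting between project clusters in the diagrams.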
So should I pop back into the slides? Just to mention them, yeah. You'll share your screen? I will share my screen now. And we'll pop back into the slides and talk a little bit about the personas here. So we talked a little bit about this, and let's see if we can get this to go forward. There we go. So this is the Uber diagram — I keep using that word — and again, this is a screenshot from 2019. It is much bigger now and I need to take a new screenshot. But we dove in a little bit to see how things worked with Jaeger and OpenShift and Kubernetes. And it was really helpful for me, especially when I was first learning a little bit about Jaeger and OpenTracing, to be knowledgeable about who was in the community. That was key for me to be able to be helpful in nurturing their relationship with the CNCF to get to incubating, and now graduated, status. And I was not a participant in that community, so I had no foreknowledge other than that. The other thing that it lets you tease out is the connections of other people who you know in the community, like Greg Swift — I had no idea he had any connection to Jaeger. So it was really pretty cool to be able to do this. And that first pass at really leveraging the data led us to start talking about OKD personas, because OKD is really the project that I try and foster, along with a few others like Quay and operators. Assigning personas to these folks has helped me sort of untangle the community relationships. At the moment I have about five personas that I look at and categorize people as. The tangential personas: people who are working in one community but not in others. So they're kind of tangential to your project — they may not be working on OpenShift, but they're still important to OpenShift, like Yuri from Uber. Or the connector personas, who are working in multiple ones. 
Those are really good. And then, as we mentioned earlier, newcomer personas. A very important part of community development is flagging new entrants, fostering them, understanding how long they stay and how long it takes them to get deeply involved. Then identifying project leads as personas. So Clayton of course was a known entity to anyone inside of Red Hat and pretty much anyone inside of Kubernetes. But we're starting to figure out how to identify other folks as we wanna create more diverse and healthy ecosystems. And someday Clayton might wanna retire. So who are we going to level up and put in maintainer and contributor roles? Who is doing that work? We want to make sure we have a diverse and healthy group of project leads. And then again, for me, organizational personas — that's when you aggregate everybody from, whether it's Uber or Amadeus or any one of the end users that are using your project, to really understand how they're using it and what other projects they're using. So as we saw, we didn't actually have the data for OpenStack, but if we could have gone back further — because Clayton had done some work on a little tiny aspect of OpenStack with me ages ago — when we bring in the OpenStack data, you can even tease out how people migrate from OpenStack to Kubernetes, or in other directions. So it's really a very interesting way to see how people show up in communities and where things are going. And that small part of OpenStack was a project called Solum, which was supposed to be OpenStack's platform as a service back in the day — if anyone remembers that, shout out to Adrian Otto. And yeah, so we could really dive into that. The other one we talked about was organizational personas: being able to see where they're working, in what space, and where they overlap when they contribute to your project and to other projects. 
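Flagging newcomers and measuring how long they stay can be reduced to first-contribution and last-contribution dates. A minimal sketch, assuming contribution timestamps per developer are already available (the data here is invented for illustration):

```python
from datetime import date

def newcomers(events, since):
    """Flag developers whose first recorded contribution is on or
    after `since`, and report how long each has stayed active.

    `events` maps developer -> list of contribution dates — a
    simplified stand-in for commit/issue/PR timestamps.
    """
    report = {}
    for dev, dates in events.items():
        first, last = min(dates), max(dates)
        if first >= since:
            report[dev] = (first, (last - first).days)
    return report

# Hypothetical contribution histories.
events = {
    "veteran":  [date(2015, 3, 1), date(2020, 5, 1)],
    "newcomer": [date(2020, 4, 2), date(2020, 6, 20)],
}
print(newcomers(events, since=date(2020, 1, 1)))
```

The retention figure (days between first and last contribution) is the raw material for the "how long do they stay, how long until they're deeply involved" questions mentioned above.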
And really, I use that on a regular basis to understand what our end users are doing, and to make sure that if there's a new feature or a new set of technologies out there — like Edge or IoT or networking or storage, you name it — that they're looking at or starting to contribute to, we know that from a product perspective and from a community perspective. This is a good one here, I'll just walk through it quickly. This was all of the projects that CERN was contributing to. And then the other person we started to look at, when we dove down into an individual, was Greg Swift, who is now at LogDNA but at the time was at Rackspace, so he had some OpenStack connections. He had that tiny little — or not tiny, I'm sure it was a real contribution — involvement in the Jaeger conversations, and we could start to play out and see where he was active in that. So it's really kind of interesting, and plus I had all the data from Commons; hence in 2018 he also received the contributor-to-the-conversations-in-the-community award that we gave him. So there are really lots of great ways to use this. And then, as I mentioned earlier, Yuri, who was tangential but very important to this, as well as other work being done at Uber on operators and the operator pattern, using it for M3DB — they showcased that at a Commons event at the CNCF event a while back. So getting these advance signals — even if they're weak signals, they're still really important signals — keeps us aware of what people are doing. And hence some influence around the conversations around operators, the Operator Framework, and the operator pattern kind of emerged, and that was pretty important. Again, being able to really look deeply into one of your corporate customers' personas: where they show up, what they're working on, using OCP on Azure, OCP on OpenStack. Amadeus has been a huge OpenShift Commons community member. 
They've been on stage at Red Hat Summit, they've been in CNCF talks, but being able to really see where they're going and what new technologies they might be working on matters. We had them on stage talking about Kafka not too long ago, because they were some of the leading lights using Kafka in an enterprise situation and willing to talk about it. So that was a great opportunity. I also mentioned going down wormholes. This might have been one of the wormholes, but it turned out to be not quite as bad as I thought it was. We kind of laugh a little bit about this one: the data is not always perfect, and every once in a while, when we run SortingHat — and this is where we go back to having some domain expertise — we have to tease things out. Why Kim Min showed up as a contributor to OpenShift turned out to be a misinterpretation of the data, in terms of one of the issues or something that was logged against something. However, it did give me a very weak signal that at Alibaba and Alipay, they were looking at OpenShift and OKD and Origin, which then turned into an eventual deployment of OpenShift and OKD there. And I ran into them at one of the CNCF or Linux Foundation events, and they came up to me afterwards and said, hey, yeah, we are — this is who I am. But correctly identifying people is pretty important too. And then, as everybody's well aware, we have another problem space too: now that IBM and Red Hat are conjoined twins and all under one umbrella, learning who in the IBM world is also contributing to the different projects, so that we can take advantage of where we have other representation and other network connections in projects. So that's another thing that we've been looking at closely with all of this data. So yeah, those are pretty important relationships, obviously. 
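The Kim Min mix-up is an identity-resolution problem. GrimoireLab's SortingHat handles this with unified profiles plus manual curation; the toy sketch below just groups identity records that share an email address — all names and addresses are made up — which shows both why merging matters and how an over-eager rule can misattribute someone's work.

```python
from collections import defaultdict

def merge_identities(identities):
    """Group (name, email) identity records that share an email.

    A drastically simplified version of SortingHat-style identity
    merging: real tooling also matches on usernames and full names,
    and relies on human review to catch bad merges.
    """
    groups = defaultdict(set)
    for name, email in identities:
        groups[email.lower()].add(name)
    return dict(groups)

# Hypothetical identity records: one person using two names.
identities = [
    ("D. Developer", "dev@example.com"),
    ("ddev",         "dev@example.com"),
    ("Ana Ruiz",     "ana@example.org"),
]
print(merge_identities(identities))
```

The Saturday-morning cleanup routine described later in this briefing is essentially human review of exactly these groupings: spotting duplicates the rule missed and merges it got wrong.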
It really has helped us a lot with the Commons model, which is the ecosystem-based open source community development that we're working with here at OpenShift and at Red Hat. And really, our goal is not to stalk people. We're really trying to promote peer-to-peer interactions. So it allows us to understand where those interactions are happening across projects and nurture them — to, as I always like to say, give away the podium. Because it's often not about the code contribution at all. It's more about sharing the information and the knowledge, making the connections so that when someone's working on a feature in one project that impacts another one, we can get them to connect, or facilitate your feature getting into their roadmap. Making those connections is really what community development is now all about, rather than just trying to get everybody to contribute code so your metrics on whatever analytics dashboard look great. We all know it's a wonderful thing to be the number one contributor to a project, and our powers-that-be love us to be there. However, the more important thing is that all of the communication and the network of peers is nurtured and healthy and, again, diverse and well engaged, and people know how to engage with each other. So that has really been the model that we've been going for with OpenShift Commons: giving away the podium, pulling in people to speak at things like OpenShift Commons Briefings on topics that you might not have thought were relevant — but once you look at the model, you can see, oh, there's this project out there that's about to hit you all like a ton of bricks, so you'd better know something about it. So we'll pull someone in there and give them the podium. 
So that's really what we've been teasing out over the past couple of years, and whenever anyone hears me talk about jellyfish, they probably shut their ears now. But these are the kinds of tools that we really think help build healthy communities, because it's no longer possible, with the complexity in these communities and these relationships, to do it on gut instinct or personal relationships. There are just way too many repos to watch, way too many people in those repos, way too many relationships. And so much of our companies and our customers and our end users depend on these things being well-oiled machines that we can't risk it on gut instinct, or on Diane putting a mailing list into a spreadsheet and doing analytics on it anymore. We haven't done that for a long time. This has really been the thing that's helped us do this. And so these are some of the conclusions that we have reached — there are many more we could tease out here. I think it's pretty obvious that no company, whether it's an end user company, a technology company, or a hosting provider, is really working on just one thing. That's been pretty key, and this data-driven approach has been really helpful for upstream coordination. That is essential, and these relationships really, really matter to everybody. Having domain knowledge has really been key. And this is not really an attack on old-school, individual community management — that kind of nurturing still needs to happen for your project, you can't abandon that — but it does behoove you to take a more ecosystem approach, and some data-driven tools help you do that. And then, as Daniel always tells me, data matters. 
You've got to clean your data and curate your data, and you need good tools like we've gotten from Bitergia. In the beginning I had a routine: every Saturday morning I'd sit down with a cup of coffee, run the report, and see who the outliers were, where there was duplication, where SortingHat didn't work, and then go back in and do that cleanup work. So I think that's been, for me, one of the habits that I would like to see more community people develop and incorporate: really start understanding who's in your community. I can't say that more vociferously — that is the key to all of this. If you don't understand the domain and you don't know who's in your community, it's really difficult to do any aspect of community development, be it marketing, content delivery, code contribution, onboarding, any aspect of community. And then I always put in that anonymity is dead, because it is. And then, maybe, what's next? Now that we're part of IBM, maybe someday they'll give me access to IBM Watson and we can tie that in and do some real predictive analysis — or even better, take Open Data Hub and apply it to this, just like we do for telemetry and other things. So that's kind of where we're at right now in terms of the deep dive here. And Daniel, now that I've talked forever, are there other things you wanna add? I'll look and see if we have any questions or if anyone's asked anything. Not many more things to add here. Just to say that this has been a really interesting and fun thing to work on. So I'm pretty happy to have participated and to keep evolving this concept. So thanks for your time. Yeah, the one question that's come in, which I think is a good one, is what the correlation is between code collaboration between personas and company team membership. That's an interesting one. I've used the tooling so far to identify the team from, say, Amadeus or Uber, who is working on the open source side. 
It doesn't give me insights into who's behind the firewall. I don't always know everybody at Amadeus, but it does give me a way to do that. We could easily, with this tool, watch the development like we did with Clayton's analysis: instead of just doing an individual, watch the growth of open source participation in different repos for an entire organization. And that would show us, I think, a bit of the correlation between code collaboration and the personas. What we haven't done is automate the tagging or grouping of people into those personas — that's still a hand-wavy Diane thing. Like, when I see someone, I recognize them now as tangential, or I recognize them as a connector. The tooling does not recognize people automatically yet as that. And that's where I think maybe the predictive stuff might help us too: to recognize, historically, what the path is for going from tangential to being connected to multiple things. And there's only so much time in the day, but these are things that are very much of interest to me to continue doing with this work. As we try to nurture healthy, engaged, and diverse communities, these kinds of tooling and the metadata that we add to, say, SortingHat and the identity management will hopefully help us tease out different issues around marginalized communities and make sure that we give the podium and support to people across lots of communities, whether they're technology communities or communities of interest in other aspects of their lives. I think that's all we had time for today. And I know, Daniel, it's late where you are — you're off in Spain and I'm up in Canada, and we're probably the only people who don't care about the holiday coming up this weekend. But if you have questions, please do reach out to us; we're very happy to make that happen for you. And let me just throw up the last page here, my favorite page, the Canadian one, because yesterday was July 1st, Canada Day. 
And there is a wonderful Wayne Gretzky quote here: the goal is to skate to where the puck is going, not to where it's been. But where it's been always informs us, hopefully, of how to grow new people in your community and keep them engaged. So with that — and we'll see if we can get to the next slide someday — we just say thank you. And if you're interested in this topic, please reach out to us and we'll be happy to continue the conversation.