Great, it's 11:05 Pacific time, Tuesday, June 25th, and welcome to the CNCF CI working group call. On the agenda today, we'll take a look at some in-progress and upcoming events on our radar. We'll take a look at some updates to the CNCF CI status dashboard, as well as some work in progress, some tickets in progress, and possibly discuss moving this call one hour later, from 11 a.m. Pacific time to 12 noon Pacific time. If you are on the CNCF Slack, please join the CNCF CI channel to continue the conversation after this call. You're also welcome to join the CNCF CI public mailing list, also listed in the meeting notes. Are there any other items folks would like to add to the agenda before we get started? Is my audio okay? Great. Well, no worries. If anything comes to mind, please feel free to add it to the meeting notes in the agenda, or in the Zoom chat, or just verbalize it if you have anything you'd like to add, including upcoming events. So as we speak, KubeCon + CloudNativeCon + Open Source Summit China 2019 is happening in Shanghai. Thanks for attending. If you are in Shanghai, it is 2 in the morning there, so I appreciate your time and attention for the CI working group call. Earlier today, there was the intro and deep dive birds-of-a-feather for the Telecom User Group and the Cloud Native Network Functions (CNF) Testbed. And I believe, Taylor, you're on the call. How did your session go today? It was great. We had a lot of participation from both telecom providers as well as vendors, and some interest from, I would say, more like Kubernetes developers. There were a few people there that are interested in, I guess, the CI side of things in general, besides all of the networking aspects, so it was a good session, good conference so far. Sounds good. Well, congratulations on a good session, and I hope you enjoy the rest of your time there for the duration of the conference.
Also happening now is ContainerDays 2019. Is anyone on the call currently attending that conference? All right. Well, I hope it's going well. Coming soon at the end of September is the Open Networking Summit Europe. It is in Antwerp, Belgium, just a train ride from Brussels. The CFP has closed, so those who have submitted CFPs to ONS will receive notifications on Friday, July 5th, with the schedule announcement on July 10th. At the end of October is Open Source Summit Europe in France, and it looks like the CFP window is still open. You have until Monday, July 1st to get your CFP in for OSS Europe. In November will be the next KubeCon + CloudNativeCon North America in San Diego. The CFP window is open until July 12th, and there are several co-located events. One with the CFP window open that I'm aware of is EnvoyCon, and the CFP window closes on July 12th for that co-located event as well. Are there any other conferences or co-located events at KubeCon that anyone would like to mention? I'm sure more will be released and publicized as the time gets closer. So, I'll share a little bit about what the team has been working on for the cncf.ci status dashboard since our last call in April. The meeting notes and the slide deck are shared with anyone with the link. If you're interested in following along, the link is in the CI working group document. This call will be recorded and published to the CNCF YouTube channel. So, the cncf.ci v2.4.0 and v2.4.1 releases. We released v2.4.0 on May 14th, where we added ARM support to three of the CNCF-hosted graduated projects: CoreDNS, Prometheus, and Fluentd. We also updated the test environment dropdown to include a checkmark next to the selected label in the gray header, and we adjusted the height of the footer. I'll go to cncf.ci production now and show you those changes. So the test environment here at the top of the page is your Kubernetes test environment.
You can select whether you would like to look at the stable Kubernetes or the head commit of Kubernetes, on x86 or ARM. And so when I select Kubernetes stable on ARM, the provisioning was a success to bare metal Packet, and the build and deploy stages for CoreDNS, Envoy, Fluentd, and Prometheus are active. We've not yet added ARM support to Linkerd or ONAP at this time. Likewise, if I go to the head commit, I can see the status of the builds and deploys for those projects that support ARM. We received a fun bit of press on adding ARM to CNCF CI: "Cloud Native Computing Foundation arms itself with graduated project support." That was fun. And in v2.4.1, we added ARM support to Envoy, and we also updated the hardware provisioning code to use the reserved ARM machines. Thank you so much, Packet, for providing reserved machines for both x86 and ARM. This really helps with the, I guess, percentage of times we see success. Sometimes it would fail because at that moment in time, at 3 a.m. Eastern time when we were provisioning these machines, maybe another customer was also provisioning machines, and so there wasn't one available for CNCF CI. Now the machines are reserved just for CNCF CI, and we see successful provisioning much more often. So thank you very much. Hey Lucina, it's Philippe here. I see that for the Envoy build on ARM there was a failure. Is the Bazel dependency issue fixed now, or is that still a problem? Hi Philippe, let's take a look. It looks like it is still mentioning Bazel. Denver or Taylor, would you like to speak to this issue? Yeah, this is only happening on the stable release for Envoy at the moment, for some reason, with ARM, but the same build works on head. So I haven't had a chance to dive into that, but I'm suspecting maybe the stable release is missing some patches. Okay, no worries. I was just checking. This is Ed Vielmetti from Packet.
I am aware that the Bazel build system for ARM has gone through some iterations, and at various points it has not been completely stable. So I might also look to that as a possible source of opportunities to engage with the community there. Okay, thanks. That's great. Thank you so much. Since our last call in April, we also had some presentations at KubeCon Europe. There was an intro to CNCF CI, and the information is here in the slides. If you go to the sched, you can find the slides as well as the YouTube recordings for both the intro and the deep dive. The deep dive topic was adding ARM to CNCF CI, really digging into the code and talking about some of the challenges and benefits and the lessons learned from our experience adding ARM to those four graduated projects. Currently, we are working on our 3.0 release, where we're adding sub-headers to display the CNCF-hosted projects by CNCF maturity level. This is ticket 120 in the cross-cloud repo. As we're adding more projects to the dashboard, we'll be showing all of the graduated projects together right under the test environment. Then we'll show the incubating CNCF projects, followed by, currently, the Linux Foundation project, ONAP. Once we get the graduated and incubating projects added, then there's definitely room for starting to add sandbox projects. First, though, we'll focus on the graduated and incubating levels. So that's in progress now. I'll ask the team: does anyone on the team want to show the current status of this ticket, adding sub-headers, on the call? Hey guys, this is Josh. I believe I could show the current status. Give me one second. Still a work in progress. We're nearly there. But let me share my screen. Okay. Okay, I believe this is it. So essentially, we have three CNCF relation categories right now to start with: graduated projects, incubating projects, and Linux Foundation projects.
They're all pulled in and organized through a YAML file, with the projects labeled by CNCF relation for the sub-header sections, as well as ordering. And so it's coming along. It's nearly ready to be released. That looks great. And yes, our mock is a little bit out of date. Our mock had, I believe, Linkerd, or no, it had Fluentd in the incubating section, but it has since graduated. So thank you so much for updating that. Looking forward to seeing it in production. Coming soon. I'll jump back onto the screen share. The next item that's in progress is our epic of refactoring the CI system. The goals of refactoring the CI system are twofold. We want to deprecate the cross-cloud Kubernetes provisioner that was custom made, and we want to use a community tool, kubeadm. We also want to make it easier for CNCF-hosted projects to contribute to the CNCF CI dashboard by adding and maintaining their projects directly; the current setup is not optimized for external contributions. So that's one of our main goals for doing the CI refactoring. And so far, we've broken it down into several parts for the main epic of supporting kubeadm for bootstrapping the Kubernetes clusters onto Packet. We have made good progress on updating how hardware is provisioned. So Denver, if you'd like to talk a little bit about it, I've dropped in our visual representation of the CI infrastructure refactor as well. And if you'd like to share your screen for anything, or want me to go to any link, please let me know. Here should be all right. I won't be able to share my screen, but I can just talk to it. So we've finished the first component of this, which was separating out the hardware provisioning from the Kubernetes bootstrapping, because in the past with cross-cloud, they were one and the same. But as the CI system, we want both of these parts to be composable. So if the provisioning fails, we know that it failed, and we have better Terraform logging.
So we can tell why it failed, and then be able to run retries and roll back, so it's not just a failure and the job's over. And that involved adding some additional scripts to be able to check inventory at Packet when we're not using reserved servers. So if we're asking for five nodes, is there capacity for that? I've added things like that, which will allow us to go: okay, check the facility, do we have everything that we need? And if it's not there, switch to another one. And so the provisioning stage is now complete. The next portion that we're working on is getting Kubespray ready to support the CNCF CI project. Most of the work we're doing on that at the moment is adding ARM support. Kubespray already supports ARM at this point, but we needed to support our builds, so we know what version of Kubernetes we're running. And if we're doing a head deploy of Kubernetes, we need to know what commit, so we can show that. So that's in progress. And then the next ticket we have after that is to support containerd, because containerd is a project under CNCF, but at the moment Kubespray only supports dockerd directly. And that covers everything in progress on that side of things, Lucina. Can I ask a question? Maybe, since we have some Packet folks on, regarding the provisioning stage: I dropped a link to a question I had on the Packet Slack. I don't know, Ed, if you saw that thread; it's in the integrations channel, so anybody that has already joined the Packet Slack can go to that. But I can speak to it real fast. So when Denver was working on the capacity checks, there's an API, which actually I can drop in as well. So there's an API call in the Packet API that lets you do capacity checks for non-reserved, so the on-demand instances. It doesn't say that it's on-demand, but that's what it actually is.
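To make the flow Denver described concrete (check a facility's on-demand capacity, fall back to another facility if it can't satisfy the request), here is a small Python sketch. The request shape, a POST to a `/capacity` endpoint with a list of facility/plan/quantity entries whose response marks each entry `available`, follows my recollection of the public Packet API and should be treated as an assumption, as should the fallback logic itself:

```python
import json

# Hypothetical helper: build the body for Packet's on-demand capacity
# check (assumed shape: POST /capacity with a "servers" list).
def capacity_payload(facility, plan, quantity):
    return {"servers": [{"facility": facility, "plan": plan, "quantity": quantity}]}

# Given one capacity-check response per candidate facility, return the
# first facility that can satisfy the whole request, mimicking the
# "check the facility; if it's not there, switch to another one" flow.
def pick_facility(responses):
    # responses: list of (facility_code, response_dict) pairs
    for facility, resp in responses:
        servers = resp.get("servers", [])
        if servers and all(s.get("available") for s in servers):
            return facility
    return None  # no facility has capacity; caller should retry or fail

if __name__ == "__main__":
    payload = capacity_payload("sjc1", "baremetal_2", 5)
    print(json.dumps(payload))
    # A real check would POST this payload to the API with an auth token,
    # e.g. requests.post("https://api.packet.net/capacity", json=payload, ...)
    responses = [
        ("sjc1", {"servers": [{"available": False}]}),
        ("ewr1", {"servers": [{"available": True}]}),
    ]
    print(pick_facility(responses))  # falls back to ewr1
```

As the call notes next, this only covers on-demand instances; there is no equivalent single call for reserved hardware.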
There doesn't seem to be the capability to check reserved. So what we ended up having to do is go through multiple different API calls to do the equivalent of the capacity check. And you have to do it in the library for all the types of machines that you're using, as well as all the facilities, to figure out capacity. Which means when provisioning is happening, specifically for the environments that use reserved instances on CNCF CI, it's going to have to do a lot of calls. And that's without increasing how often instances are provisioned, like if it was more often than the current once a day, which could happen, as well as as more projects are added. So if all of the incubating projects are added, then it's going to be multiplied. Or if we add more options, I should say, spinning up more clusters and other stuff like that when provisioning. So I don't know, Ed, if you saw that and if you could speak to it. Yeah. Yeah, I did see it. And I guess I had some questions to try to clarify what you're looking for. Just to let you know, Ed, Mo over at Packet said he created something internal on Jira to track it, but that's all I know. That was the last I heard. Yeah, just to make sure that I have your intent correctly. So you're trying to figure out, with a simple call, how many, or basically to characterize which of the multiple reserved nodes you currently can provision. You're trying to figure out if there are any reserved facilities and plans, in other words, machines that you already have set aside, and whether they're in use or whether they're free. Essentially: using reserved instances, can Packet fulfill my request of three nodes of a given instance type? Okay, so specifically validate that. Yeah, okay, I see. Yeah, I was trying to distinguish between the case of
facility out of inventory versus reserved inventory already in use. And I think, yeah, so we're looking at it on a facility-by-facility basis. So I first try, say, SJC: do we have the reserved capacity to provision this amount of resources? And the trick with the reserved ones is that the deprovision time is quite long. So we may deprovision, but then try to run again, and it's still 15 minutes until they're available again. Right. And there's not a lot of visibility into the deprovisioned, but still reserved, nodes. Those are hard to find. Right. Okay, I understand now. It's the deprovision cycle you're trying to get at, that you don't currently have very much visibility into, and therefore you can't guarantee that a provision against reserved nodes will work, because the nodes might be in a deprovisioning state. Right. Okay, that makes more sense. I'll talk to Mo about this, add an explanation based on this context, and try to bring it forward. Yeah, ideally it would work at a high level, just like the other capacity check. If you don't specify specific instance IDs when you're using reserved, then the machine creation API acts pretty similarly: it would do next-available from the pool of reserved, similar to next-available from the pool of on-demand. Ideally, the capacity check at a high level would do the same. Do you have enough capacity in your reserved pool? Yes, go forward. And if not, you know, we can get more specific. But ideally, you can at least do the minimum of that. If you can provide information on nodes that are deprovisioning, that would be great, but that's probably the next level of detail. Yeah, okay. Yeah, that provides some context. I hadn't been thinking about the deprovisioning step, but obviously that can take a machine out of your usable pool for a meaningful amount of time.
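The visibility gap being discussed here could be approximated client-side if the API exposed per-node state and a last-transition timestamp. This is a purely illustrative sketch: the node record fields (`state`, `since`) are hypothetical, and the point of the conversation is precisely that the real reservation API does not surface them directly:

```python
from datetime import datetime, timedelta, timezone

# Assumed record shape per reserved node: {"id": ..., "state": ..., "since": datetime}.
# These fields are hypothetical; the real hardware-reservation API may
# require chaining several calls to reconstruct this information.
DEPROVISION_GRACE = timedelta(minutes=15)   # a typical deprovision cycle
STUCK_THRESHOLD = timedelta(hours=2)        # "shouldn't deprovision for two hours"

def usable_reserved(nodes, now=None):
    """Split reserved nodes into provisionable ones and stuck deprovisions."""
    now = now or datetime.now(timezone.utc)
    free, stuck = [], []
    for n in nodes:
        if n["state"] == "free":
            free.append(n["id"])
        elif n["state"] == "deprovisioning" and now - n["since"] > STUCK_THRESHOLD:
            # A node deprovisioning far longer than the normal cycle has
            # probably failed silently; a human needs to look at it.
            stuck.append(n["id"])
    return free, stuck
```

Used against three nodes where one is free, one is mid-deprovision, and one has been deprovisioning for hours, the function would report one usable node and one stuck node, which is exactly the derived signal described on the call.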
Yeah, and if you're not digging through the right API calls, then it's kind of invisible. The why of what's happening is just missing. It's just missing. Yeah, you'd have to go through a lot of other calls. When you look at the hardware reservation call, there are a lot of references to other APIs that you have to call through to start gathering all the information before you have an idea of what's happening. Right. And there are a couple of states that might be invisible to distinguish between, you know, deprovisioning in process, versus what occasionally happens, which is that deprovisioning fails. I don't even know how you would tell that a reserved node had failed to deprovision correctly. So yeah, that's worth looking at and clarifying, and trying to make an API endpoint that's straightforward to use. At a minimum, if there's going to be deprovisioning, we could say: what is the state of the system? And if we know it's in a deprovisioning state and there's some type of timestamp related to it, then ideally we can at least derive the "oh, it shouldn't be in a deprovisioning state for two hours." Right. And that would at least tell us, even if there's something broken on the Packet side, that it failed, you know, it's not updating, but at least as a human we could derive something from that. Right. Okay, I'll sync up with Mo and make sure that that ticket is advanced. Okay, this is Watson, and I had a comment too, if we're talking about checking capacity. I guess Denver mentioned this, what I would call an atomic way. So saying: are three servers available, because I need three, otherwise it's not useful to me. I don't remember if the API says: can you reserve three at a time? But also, you're still going to run into problems if you can't also reserve, in an atomic way, sets of three or whatever. So that's a good topic. Thanks Watson, I'd forgotten all about that. And that's relevant to the on-demand as well.
Atomically creating a set of servers that we need. And this is something, when we were going through talking about the refactor: what would happen if we're provisioning the cluster, and you get to a point where one of the nodes is not available at that time? Because there is a time period between the capacity check and creating each one of the machine nodes. And we could hit that, especially if the cluster size is larger. Right, essentially it's a race condition. You check the capacity, but by the time you actually go to provision, the state has changed. Yeah, so it's almost like it's not a capacity check. It would be an actual request to allocate those machines as an entire set. And when you say allocate them all, whatever you do for the implementation on the back side, you mark those as going to be provisioned, even if they would otherwise be available to whoever asks. Right. Okay, that makes sense. Given your description of it, I'm sure that you're not the only person who has seen this behavior. So for us right now, we treat it as an atomic action on the CNCF provisioning side, and we just say the cluster is unusable and we need to restart the entire machine provisioning. And that will be the approach we'll go down until there's something in the API that can help with that. Right, because two machines are useless if you need three. Yep, and of course these are rather small cluster sets; someone else might say they need 50 for whatever they're doing. Okay, good to know. Thanks for the feedback. That's great, thank you. And we're also working on the epic to add support for external contributions. Currently we've completed one step, which is changing where the project details, the display name, logo, and caption, are retrieved for the dashboard. In progress, we are working on the release details. And that corresponds to the visual columns you can see on cncf.ci.
So anything under the project column has a new repo per project where those details can be updated, and currently we're working on things within the release column. I'll take a look at the link that was dropped in the meeting notes. Watson, would you like to talk a bit about changing the release details? Yeah, so with the release details, what we're trying to do is have it so that the project owners and maintainers can have more fine-grained control when they add the project. The release details are there in the YAML file at the bottom. This file is something that the project maintainers would eventually control. So, stable ref, and I'll give a little bit of history of the stable ref and what a ref is. A ref is really a semantic version, or we hope a semantic version, but really anything that a project maintainer decides to use as a tag for their release name. We talked previously about trying to determine this programmatically on our end, and it just doesn't work. Some people don't use semantic versioning; some people do different things. So rather than us maintaining that string, we'd rather have the project maintainers do it. And then of course the head ref is a branch name. So you've got a tag or a branch; in GitLab parlance, either of those is a ref. So we now allow the project contributors to do a pull request against their respective project's configuration file, cncfci.yaml, that's under the cross-cloud project. And in the future, we're going to have it to where that cncfci.yaml is actually in the project's repo, so that they can maintain it, kind of like a Travis CI or Circle CI style. So that's the direction we're going. And that's on dev now, so we're looking to have that up on prod sometime soon. Thanks, Watson. And yes, we'll also update the contributing guide.
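Based on Watson's description, a per-project configuration file might look roughly like the following sketch. The field names here are illustrative guesses, not the actual schema:

```yaml
# Hypothetical sketch of a project's cncfci.yaml; the real field names
# may differ. The idea is that the project's maintainers own this file
# and a pull request against it updates the dashboard.
project:
  display_name: CoreDNS                       # shown in the Project column
  logo_url: https://example.org/coredns.svg   # illustrative URL
release:
  stable_ref: v1.5.0   # a tag; whatever the maintainers call a release
  head_ref: master     # a branch to track for head builds
```

Moving this file into each project's own repo, as mentioned above, would make it work much like a `.travis.yml` or Circle CI config that maintainers keep alongside their code.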
Under the cross-cloud CI repo, we've got the contributing markdown file, which will be updated incrementally as each one of these steps is completed. So currently you can update the release details and follow the steps here. And once we have the stable ref and the head ref promoted through, we'll update this contributing guide to add those steps as well. After release details, we will be working on the integrations with external systems to retrieve the build details from the CNCF project's CI system. We currently do an external integration with one of the projects on the dashboard, and that is ONAP. ONAP uses Jenkins, so we're connected to that external CI system. We'll take a look at the graduated projects first, see if they're using Travis CI or Circle CI or drone.io, and decide and pick one integration to start with. After that, we'll work on the app deploy phase. So we'll use the container artifacts that were published by the CNCF project to inform the deploy step, and update the documentation. And then we'll be able to start collaborating with the CNCF-hosted projects for more interaction, having them add and maintain their projects. After that, we'll work on the end-to-end tests with the projects. There is one project currently on the dashboard that is in need of an update. We're continuing the conversation with the Linkerd team to update the Linkerd project from Linkerd 1 to Linkerd 2. Currently the dashboard is synced up with their Linkerd 1.x project, and we'll be updating that to point to the Linkerd 2 repo. So that's in planning now. Could I speak to the external integration side? Yes please, thanks Taylor. Philippe, I don't know how familiar you are with the work that Matt Spencer is doing on the Drone side with Fluentd, but maybe you'd be able to talk to that as well. I had a pretty extensive conversation at KubeCon Shanghai yesterday with Matt Spencer from ARM.
And he was talking about builds for Fluentd on Travis CI, and issues that we've been seeing specifically with the ARM builds. And so we got into some different conversations about the different CI systems that can run this. I know there are a lot of other CI systems being used in general, but also some of the problems that people have had on some of the existing ones, and they're looking at moving to maybe GitLab if they have really extensive needs, or if they need something more portable. So drone.io was one of the ones that looks promising for people who want multiple architectures and something really portable. And we were discussing how, number one, we could help the projects directly by saying here's an easier way to run or do builds on different architectures, maybe for every pull request, which was something that Matt was saying ARM is focused on. And then we're interested in being able to support those external systems. So ideally we can have some collaboration with ARM where, if there's potentially a Drone CI running the ARM builds for a project like Fluentd, then we can have the status for those builds available. Specifically, right now that would mean anything on head or the master branch, as well as releases. And then the next piece would be the thing that was mentioned earlier, which is artifacts. So if there's a new release of Fluentd and it builds and works as expected on ARM, then ideally we could figure out where the artifacts are, by which I mean, ideally, a container or containers published somewhere like Docker Hub or wherever remotely. I think it's promising for the effort that we've been working on, because we do want external folks that are actually creating some of those CIs. Right now we've been doing most of the work ourselves, like the ONAP integration to Jenkins.
We did get some input on the status, but if there's someone actually building it for a project, that would be even better. Philippe, did you have any input around this particular effort with ARM? No, I think you've captured it all, mostly. I think Matt has been working more actively at testing that, and Drone seems quite promising. We are engaging more actively with some of the other CI systems and the companies behind them, but Drone currently seems to be quite well set up to actually get this running. So that's the reason why Matt discussed it the way he did with you. Okay. And I guess a related item: I know there was maybe a need for help on ARM builds on GitLab. So, ARM builds of projects using GitLab. We've approached it in multiple ways, including using VMs to build the projects for ARM, as well as using the physical Packet machines, deploying runners, and actually building the various projects' ARM side on Packet machines. And I guess, if there are problems that people are having with GitLab for ARM builds, then Priyanka, maybe that's something that you can get engaged with and try to help drive and make sure that it's well supported. I'm not quite sure what the actual issues are; Philippe, Matt just said that they were having some problems, and drone.io seemed to be working pretty easily for the use cases that Fluentd had. So just to go back, Fluentd is on Travis; they're not on GitLab, but they were looking at GitLab and having some problems. Can I speak to that real quick? Yeah, please. They have their pipelines moved to GitLab, especially their Linux ones, which can use the GitLab shared runners on .com. So to the best of my knowledge, they do have some workloads on GitLab in addition to Travis, and it sounds like Drone as well. Well then, do you know anything about the ARM-specific stuff on GitLab, or is there a ticket or something that folks can look at? Yeah, I can.
Let me check that. Let me search for the right ticket, the issue, for you guys and paste it here. And it's possible that we can help, since we do have ARM builds working, as we've shown, for all the projects, and it may be that the way that we're going about it is not what other folks are using. Got it. So let me kickstart an email between Taylor, yourself, me, Eduardo, and someone from the GitLab CI product side. And Philippe, who would be good to get on that from your side? I think Matt is the best person to get on board from our side for that. Matt and Eduardo; Eduardo is already connected on this side, but Matt is more specifically tracking that. So is that matt.spencer at ARM, or how is that? That's right, matt.spencer at ARM will work. Okay. This is Ed. I am aware of a set of patches that were put together to support GitLab Runner on ARM, and that got pulled into a broader discussion of GitLab Runner on multiple architectures. I know that there are some open issues that might be worth reviewing for that, which I'll identify. And I'd love to help as much as I can, just getting people to respond, from ARM and all. Thanks, Priyanka; thanks, Ed. As I understood the discussion last time, some things were gated behind some Windows runner development. Essentially there was a code path that required one task to be complete before the next task started, specific to Windows. So I think, if I had to characterize things, no one has said no to this request yet, but it's been a challenge to get to yes. Okay, got it. Yeah, if you have the issue, if you can post it, that would be great; otherwise I can also search. Okay, I'll take that offline. Thank you. So we do have runners on ARM, so I definitely want to get Denver and our side involved.
So if you can just get us all on that list, we can at least contribute the knowledge about how we're doing it, which may not be the best way, but at least it can show: here is a working example. Thanks everyone. All right, great. So we welcome feedback. We've got the cross-cloud CI dashboard GitHub tracker. You're welcome to join the CNCF CI channel on the CNCF Slack, and to join the mailing list. And this call is the fourth Tuesday of the month, currently at 11 a.m. Pacific time. The next item on the agenda was maybe having a lazy consensus of plus-ones and thumbs-ups on whether folks on this call would be available if we move this call one hour later, starting at 12 noon Pacific time. I'm good with that. Looks like Taylor is as well. Oh, Christian and Denver as well. Philippe as well. Ed as well. Wonderful, I think that's a quorum. Can I request to resurface one of the things that was discussed before I could join this call? KubeCon San Diego, if possible. Yes, of course. The CFPs are open until July 12th for KubeCon North America. Yes, and I was just wondering if anyone here is interested in collaborating on some proposals. I really want to submit some talks around CI and CI/CD, and not just a GitLab CI story, but more like, I think this group here has a lot of the CI brain trust, and we could put together something that is educational for a lot of folks. Maybe even just the CNCF CI working group's experience working with many projects and what we can distill for other folks. I don't know, there are just so many ideas, but if anyone's interested, I can kick off a thread to come up with a nice proposal for us to submit. And no worries if not. I think it sounds like a good idea. Maybe ping folks offline. Okay, yeah, that sounds great, I will definitely do that. And just FYI, I'm keynoting at the Open Source Summit in Lyon, France, and would be delighted to talk about the work here over there as well. Great. Yeah, I think I'd be interested in doing it.
Awesome, cool. Let me open, where's the doc? I've been following along on Zoom, so I don't have it open. Whoever's writing notes, would you mind writing down Denver and Taylor's names? Yeah, thank you, thank you so much. Just write that, yeah, and I'll reach out to both of you. Cool, thank you so much. That sounds great. All right, so we will see you again on the fourth Tuesday of July; continue the conversation on Slack and the mailing list. And I will ask the CNCF CI team to update the community calendar for this call to start at 12 noon Pacific time next time. These meeting notes are available anytime, so if you have any demos or news or updates or questions that you would like to address in next month's call, please feel free to add them at any time. I'll do my best to send reminders to the mailing list, Slack, and our Twitter page. So we look forward to seeing you next time. Thanks everyone for your time. It was nice chatting with you. Thank you. Bye. Thank you everyone. Enjoy the rest of your day. Thank you.