Hello and welcome to the Jenkins infrastructure meeting of the 15th of February. We'll start with the announcements: there was a core security release last week, and there is a plugin security release in progress today, so we're finishing the publication of the advisory. We'll also delay the weekly release until tomorrow, since there is a System V to systemd change which needs an announcement before we release it. Tim, I assume you're okay if I write a blog post about that System V stuff? Yeah. I just expect lots of shouting, and I think it's healthy for us to have it: before the shouting starts we have a blog post that says, look, we said it would happen, it's intentional, this is not active hate, this is us thinking about it and realizing we need to move to a modern way of doing process management in our services. I was going to suggest a tweet at least; it's probably quicker. Oh yes, and I will certainly tweet it as well; for me the blog post is the after-the-fact piece where we can highlight the instructions we offered and the things we implemented to make this as smooth as we can. Yeah, I'm sure there will be issues for some people, but generally it should just work.

Okay, so we had an update on ci.jenkins.io. There weren't any new agents being allocated, and we found the root cause: a missing IAM permission needed to allow auto-scaling. We saw a Go panic in the autoscaler; it was in CrashLoopBackOff. We tried to upgrade the Terraform module, and the minor version wasn't minor at all, it was a major version change: they removed a lot of functionality around authentication, users, and worker groups, they renamed a lot of variables, they changed the way of declaring for-loops, a lot of changes. So it didn't fix our problem, but we needed it anyway, and as a side effect we get some benefits, like faster auto-scaling and fewer resources, because we don't need a public IP per node anymore. During this upgrade we found the missing permission by comparing what we had with what we needed. We still have to write a post-mortem and open a pull request on the module's repository to add the new permission. And, the next point, which I forgot to write down: we also have to contact AWS support, or see if we can send a pull request on their documentation to add the missing permission, or contact someone at AWS saying: hey folks, since last Friday, if you don't have that permission, it's not working anymore. The AWS documentation lists fewer permissions than the Terraform module, which lists fewer than what is actually required since Friday; they made a breaking change on scaling from zero while changing their network implementation, so we are now closer to what we have on the Azure cloud, network-wise.

Thanks, and sorry everyone for being late. Do you want me to take the lead, or are there any other questions before the Docker Hub topic? Okay, so let's go ahead. One of the consequences of changing the topology of the worker nodes on that cluster is that we went from public IPs to private IPs. The direct impact is that we started to see pods in error status because they were unable to pull their image from Docker Hub: one, we use Docker Hub for the JNLP agents and the like; and two, when you aren't authenticated, Docker Hub uses the public IP to aggregate requests, over six-hour windows or per day, I don't remember exactly. Since the worker nodes now have private IPs, they share a single egress IP as seen from Docker Hub, so all the pulls come from that one public IP. Direct consequence: we are rate limited.
We applied a short-term fix, because we had to unblock the ci.jenkins.io queue: we created a Docker registry pull secret, like we did in the past, with a new Docker Hub account (a hedged sketch follows below). We don't want to reuse the existing account, that's absolutely out of the question: it's a public instance, so the probability of that credential being stolen eventually is close to 100%, which means we don't want someone pushing images that we would then use. This is a free and empty account; any image pushed to it can be removed without even thinking about it.

Can we use that account for the VM images as well? That could be interesting. Yeah, that could be a good idea, because currently we get only like two or three builds an hour on the Docker builds, otherwise we get rate limited. Yes, right. So, yep, sorry, go ahead. Just to echo what Tim was describing: I'm accustomed to having to do a rebuild, or schedule a rebuild two hours from now, so that the rate limit has quieted down and gives us another chance. We normally get like five different PRs all coming in within a minute, and they're not all going to pass. Yeah, that's weird, because the VMs should be recycled after one build on ci.jenkins.io. It may be that the rate limit we're seeing is something different from the rate limit you're seeing in terms of IPs, I don't know. I just know we've seen surprise failures; it might be better now, I don't know. If you have links, that's interesting, because that's nothing recent; I haven't done any work here in a while.

Just a reminder: last time we had this discussion and tried to embed the configuration in the virtual machines, we were rate limited again, because the rate limit moved from the public IP to the account configured on the Docker engines, which is rate limited daily, compared to the six-hour windows for public IPs. Which means that if we have too many pulls with the same account, we hit that limit. However, since we are an open source project, we could ask Docker Hub to increase that limit for that account. But it also means that we might want to step away from Docker Hub. Yeah, just move to AWS or GitHub. Exactly, there are multiple ways of doing that. So that was only a short-term fix; I've put in the references of what we did for that part, and we'll keep an eye on it and see what happens. Yeah, we could fairly easily mirror all the Jenkins images to GitHub Container Registry and then you wouldn't have to worry about it at all. Exactly. I mean, today we have two kinds of containers: the containers we build, and the external containers we use. If we start with the ones we build, we could push them directly to AWS, or Docker Hub, or whatever registry; that would be a first layer. Second thing, a proposal from Hervé: we could add a Docker image proxy on the cluster, and likewise on AWS where we spawn the VM agents, and the same on Azure; a Docker proxy on each. Not sure how it would work technically, but that could be an idea. And finally, switching back to public IPs: we would have to pay, but that's something that was working, so that's a long-term option.
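As a hedged illustration of that short-term fix: a Kubernetes registry secret for a dedicated pull account, referenced from the agent pods so pulls count against the account's limit instead of the shared egress IP. All names and values here are hypothetical placeholders, not our actual configuration:

```yaml
# Hypothetical names; the real values live in our encrypted configuration.
apiVersion: v1
kind: Secret
metadata:
  name: dockerhub-pull          # hypothetical secret name
  namespace: jenkins-agents     # hypothetical namespace
type: kubernetes.io/dockerconfigjson
data:
  # base64-encoded Docker config JSON for the dedicated, empty pull account
  .dockerconfigjson: <base64-encoded Docker config JSON>
---
# Referenced from the agent pod specification:
apiVersion: v1
kind: Pod
metadata:
  name: example-agent
  namespace: jenkins-agents
spec:
  imagePullSecrets:
    - name: dockerhub-pull
  containers:
    - name: jnlp
      image: jenkins/inbound-agent:latest
```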
Another consequence: not only the CI agent workloads, but also Datadog was rate limited, because it turned out we don't use the official Datadog agent image; we use a custom image built from the official one with some extra files copied in. I did not know that part. And since those are built by us and pushed to jenkinsciinfra on Docker Hub, they were rate limited too. Here we saw a nice learning opportunity to help Stéphane get started the hard way on Kubernetes, because it's not easy to pick up. So he's working on a Helm chart that lets you specify a list of namespaces: on each cluster where you install that chart, it creates the Docker registry secret in all of those namespaces. That lets us specify the account once, in one location, and the chart installs it in each namespace, so every namespace can use the registry secret to authenticate its Docker pulls. That should at least be a tool we can reuse: once it is used for both the Jenkins agents and Datadog, we can stop using it and move to a proxy, but we can reuse it in the future if we need to. Short term it will help; long term, the task is finding out, on the Datadog side, why on earth we are using this custom image when Datadog provides everything in their Helm chart. I would prefer relying on Datadog, especially since it saves us things to build, and I bet we could drop some files: we weren't able to mount them through the chart a long time ago, but now the Datadog Helm chart does this for us, so that should be easy to fix. Are there any other questions, people who want to work on that, ideas, feedback? One, two, three. Okay, so that should be fixed and updated.

Now, DigitalOcean. It's still a work in progress: I have my cluster talking to my local Jenkins instance, and I have to reproduce it on ci.jenkins.io and in our production configuration code. Nice. So that will be the surprise: we are going to discover how Jenkins behaves when we provide it two Kubernetes clusters, and how the scheduling acts in that case. Let's put in three Kubernetes clusters and we'll make it one, two, and three. Oh no, I think Scaleway as well, and Oracle; go for four eventually. Yes, right: multi-cloud, multi-cloud everything. The thing is, at some point we might be interested in having only a single control plane with multiple worker pools: I know at least Google and Scaleway both provide that ability, so you would have only one Kubernetes controller connected to Jenkins and then define autoscaling worker pools in different locations. That could be interesting for us. I don't know if DigitalOcean gave us enough credits; I don't mind having only one cluster on their cloud.

Something to be careful of, and we have to remind ourselves of this: we have to set a maximum number of pods that we can allocate per Kubernetes cluster at the ci.jenkins.io level, just to be sure Jenkins doesn't start scheduling hundreds of pods onto the poor DigitalOcean cluster that only has two worker nodes today. Because it will, and we will end up with a bunch of pods in pending state, and since it's hard to anticipate whether Jenkins would then try to schedule those pods on the other Kubernetes cluster, it would be better to tell Jenkins: if you try to schedule more than X pods, the cloud is full, don't try more. Yeah, definitely; it's like four pods or whatever, depending on what size nodes you're running. Exactly, because we have a static capacity for that one.
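To make that cap concrete, here is a hedged sketch of what it could look like in a JCasC definition of the Kubernetes cloud; the cloud name, server URL, and credential id are hypothetical:

```yaml
jenkins:
  clouds:
    - kubernetes:
        name: "digitalocean"                      # hypothetical cloud name
        serverUrl: "https://do-cluster.example"   # hypothetical endpoint
        credentialsId: "do-cluster-token"         # hypothetical credential id
        # Hard limit: with this set, Jenkins stops scheduling agents on this
        # cloud once 4 pods are running, instead of piling up pending pods
        # on a cluster that only has two small worker nodes.
        containerCapStr: "4"
        templates:
          - name: "default"
            label: "do-agent"
```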
Great job everybody, I'm happy that you were able to do this; that's really a good achievement. Are there other questions about the DigitalOcean point, things you want to clarify? Is that all working now? Sorry, the DigitalOcean one, is it working now? I may have missed that. So, it's working with a local instance, yes: I have a cluster running on DigitalOcean, a little one, and I put a service account on it so I could retrieve its token and link it to Jenkins, and it's all Terraformed; that's the secret. Oh okay, so it was created with Terraform on DigitalOcean, great. And I did link it to the repository.

Next step: updatecli for Terraform. Since last week, Stéphane and I added more tracking elements using updatecli to track the moving parts of our infrastructure. Most of them are related to Terraform, directly or indirectly. For instance, both infra.ci and release.ci now have virtual machine capabilities; they aren't tied only to containers like they were before, and these virtual machines have their templates kept up to date at almost the same rate as ci.jenkins.io, which means release.ci and infra.ci can now have exactly the same environments as ci.jenkins.io. If you read between the lines, that means the death of trusted.ci in the foreseeable future, in favor of release.ci. Not to mention the ability to say: if we split CI and CD for our contributors, they get the same environment for releases as they see on the public CI. It sounds obvious, but it's hard to ensure when you have different Jenkins instances. So great job, Stéphane. In the short term that also gives us the ability to run our Packer builds for Docker images, which is already the case; we only have to deploy these images to a remote registry and use them, but the loop is almost closed, so we could have exactly the same content between the Docker images and the VM images for agents. There are other, let's say more minor, updates, like reading the security group names from one repo and updating them in another repo; these are operational day-to-day things. But congrats Stéphane, you are now a master of updatecli. I don't know if there are other questions, or things I could have forgotten on that topic to clarify. So, great job, folks.

ci.jenkins.io not only received the security updates in the past hours; it now also features the latest JDK 17 and JDK 8 versions, which have been deployed. The JDK 8 release is two weeks old, but we had something broken in the build process that delayed it, and thanks Mark for catching yet another name-change pattern for OpenJDK, on the 17 this time; we had to change the whole update process and the provisioning scripts and so on. It's annoying; they always do that: when they switch, they publish one GA version, then they move the project into maintenance mode and rename it. We've had that before. Do they really have to do that? Yeah. And good catch Mark: JDK 11 now features a four-digit version pattern, so we will have to update all our systems to be sure they support it, because it has been released but no pull request was opened automatically yet, which means we don't catch that new version. Oh well, no one uses the java.net client anyway; has anyone ever upgraded that Java instance? It's not used in the Jenkins project anyway. Oh, that's good; I hadn't done that investigation, and I was really terrified when I saw what the regression was, but you're saying, Tim, it's not a commonly used thing in the Jenkins project? You can only use it from Java 11, and we're not allowed to use Java 11 until someone finishes the JEP. Huh, wonder who that is. Okay, thanks. So thanks a lot; that's also a good thing: it means I'm not the only person able to manage the Packer tooling. Mark is now, and it seems Stéphane is too, so I'm not the bus factor here anymore, I assume; for me that's a good assertion.
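To make the version tracking concrete, here is a hedged sketch of an updatecli manifest that would tolerate the new four-digit JDK 11 pattern; the tracked repository and target file are illustrative assumptions, not our actual configuration:

```yaml
# Illustrative updatecli manifest; names and paths are assumptions.
sources:
  jdk11:
    kind: githubrelease
    spec:
      owner: adoptium
      repository: temurin11-binaries
      token: '{{ requiredEnv "GITHUB_TOKEN" }}'
      versionfilter:
        kind: regex
        # Accept both the old three-digit pattern (jdk-11.0.14+9) and the
        # new four-digit one (jdk-11.0.14.1+1):
        pattern: '^jdk-11\.\d+\.\d+(\.\d+)?\+\d+$'
targets:
  jdk11Version:
    name: "Bump the JDK11 version used by the agent templates"
    kind: file
    sourceid: jdk11
    spec:
      file: ./provisioning/jdk11-version.txt   # hypothetical target file
```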
Are you working on your retirement already? No way; there is a pull request with your name on it, Mark. Too late. Thanks a lot for this. Are there any other top-priority topics, or can we move on to the subjects delayed from last week? Let's go ahead.

So, building Docker images on infra.ci and release.ci. As we said earlier, now that we have full virtual machine capability, there is a task to update the existing shared library, the Groovy library used by the pipelines, to support that capability. Should it replace the current img-inside-a-container setup on Kubernetes, or should they be complementary? I don't know, but there is a task to allow it. The reason being, we should be able to build for new architectures and new platforms, such as Windows containers, because right now our Windows Docker images rely on the jenkinsci images, and we want to own the full stack; this would allow that part. Damien, on that: I've been using Red Hat Enterprise Linux 8 for about four weeks now, and they don't even deliver Docker CE, they ship Podman. So this kind of thing, I've been getting first-hand experience with it. Not that it changes anything we do, but it's not just Kubernetes where sometimes you can't do Docker. Yes, thanks. I'm sure you can install it. Oh yes, no doubt; I'm just trying to stay on the defaults, to experience it the way others experience it. Yes, agreed, Tim; I know there's a solution if I wanted one. However, we don't run Red Hat on our platform, which I'm really grateful for. Yeah, and this is not a request that we do so; only a note that it is an interesting, commercially relevant platform out in the world, and being mindful of it isn't harmful. However, since Red Hat advertises that Podman is so perfect and so secure, they should allow Docker-in-Podman, which they don't; so, not my problem. Sorry, that one was cynical. I mean, if you look at the aliases on a RHEL 8 box, you literally see alias docker=podman, so I hope they are good at keeping up with the Docker API changes. But that's a good point; that's the reason why some people need it. So it's on the to-do list; there is no priority on that one, but it should be quite useful. If anyone is interested in contributing to the shared pipeline library, help is welcome on that front.

A word about security on infra.ci. We are starting to have a lot of different jobs for different use cases on infra.ci: that instance was first started only for infrastructure stuff, but now we deploy previews for websites, for jenkins.io, we have Terraform, and we might need to move Puppet there as well. So, to improve step by step and add another security layer, we want to split the credentials and the jobs into different areas. That will let us scope credentials per repository, to apply the principle of least privilege. But we need to handle Job DSL, because right now JCasC only allows us to specify credentials at the top level of the Jenkins instance, so we need to find the correct directive in Job DSL to say: I want to load this credential from that environment variable. For us it will just mean moving the environment variable definition from one part of the YAML to another, except we need the specific syntax.
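For reference, this is roughly the shape of the limitation being described: with JCasC, credentials are declared once at the controller's top level and are therefore visible to every job. A hedged sketch, with hypothetical ids and environment variable names:

```yaml
credentials:
  system:
    domainCredentials:
      - credentials:
          # Every credential declared here is global to the controller, so
          # Terraform jobs can read website credentials and vice versa.
          - usernamePassword:
              scope: GLOBAL
              id: "netlify-deploy"              # hypothetical id
              username: "${NETLIFY_USER}"
              password: "${NETLIFY_TOKEN}"
          - string:
              scope: GLOBAL
              id: "terraform-api-token"         # hypothetical id
              secret: "${TERRAFORM_API_TOKEN}"
```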
Is it better to just move the Netlify stuff to ci.jenkins.io instead? If you have a credential on ci.jenkins.io, consider it leaked. Yeah, but what if we scope it to just the preview deployment sites? I'm not sure about that idea, because it's not only the preview sites that bother me; it's the fact that the Terraform jobs can access credentials they should not have, and vice versa. So whether or not we move that, it won't solve the issue that we need to split the credential scopes, which is a good practice in production with credentials as sensitive as ours. We could split the Jenkins instances; there are longer-term options, like moving all the credentials out of Jenkins and connecting Jenkins to a remote vault system, but right now we could easily update the Job DSL part and use features Jenkins already offers. It's a configuration change; worst case, we break the infrastructure and we roll back. Moving to a remote credential source doesn't really change anything, though: you can still get the credentials just the same through the Jenkins APIs. Yes, but for instance Garrett started work last year on using Kubernetes secrets, so that Jenkins doesn't store credentials: when you request a credential, it's just a placeholder, and Jenkins fetches it from the Kubernetes secret, so Jenkins never encrypts the credential on its own disk; it's only ever in memory. Jenkins is just used as a conduit to retrieve the credential, put it into the pipeline, and then drop it (a hedged sketch follows below). I really don't think it makes much difference: sure, it's not written to disk encrypted, but that's just one very minor area. You can still just request it with the credential id and you'll get it from Kubernetes just the same. Yep, and then on Kubernetes we have to store the credentials encrypted with a cloud vault; we'd need one for each Kubernetes provider, because Kubernetes alone only encodes the secrets, and the goal is to rely on the underlying cloud, like the Azure key management service we have right now. So that's a more complicated topic, and that's why I said long term; right now we have the ability to start scoping credentials. Sure. The reason to separate the jobs is also that we need different properties: Terraform jobs don't have the same properties as Docker jobs, especially in terms of the ability to do GitHub Checks feedback. For some jobs we need feedback for public users, and for some jobs we don't want it. Since we need these different policies per job, separating per kind of policy, even with folders instead of a GitHub organization, avoids the dreaded configuration dance: I go to the GitHub organization configuration, I change one trait, then I have to wait for the watcher to apply it everywhere, and two days later someone else complains because they got that new trait that they don't want, so you have to split, etc., which is a maintenance hell. So that is yet another reason to split.

On the to-do list we also have the Alibaba mirrors; I don't know what the status is. Mark? We've hit our 30 minutes here, but: Alibaba had offered to provide a mirror, and we haven't yet registered it with our mirroring system. Okay, so that's also a task to do, right? Right, it is a to-do, and I had asked them a question about the physical location and didn't get an answer, but actually it doesn't matter. It would be great to have one more physical location in the US, for example, because right now the entire US west coast is being served by a single mirror, and that one says it's in the middle of the continent.
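Returning to the Kubernetes-secrets approach mentioned in the credentials discussion above: with the kubernetes-credentials-provider plugin, for example, a labelled Kubernetes secret is surfaced as a Jenkins credential without Jenkins persisting it. A hedged sketch, with hypothetical names and values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dockerhub-pull-account        # hypothetical name
  namespace: jenkins
  labels:
    # This label tells the credentials provider to expose the secret
    # as a Jenkins usernamePassword credential.
    jenkins.io/credentials-type: usernamePassword
  annotations:
    jenkins.io/credentials-description: "Docker Hub pull account"
type: Opaque
stringData:
  username: "pull-account"            # hypothetical values
  password: "not-a-real-token"
```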
Okay, for the two other top-level tasks before we close: I'm the bus factor for both. For ci.jenkins.io, I still need to finish writing the epic issues and milestones to give you visibility, so I have to extract things from my head and put them into issues; some issues already exist, some don't. So I'm the bus factor here, nothing new. Daniel fixed the matrix errors on ci.jenkins.io and trusted.ci; thanks Daniel. It's not in JCasC, he fixed it directly in the Groovy, but at least it's fixed, so there's no emergency to rush that part. And I still have an email to send. Is that okay for you? We have two topics to delay to next week; for the rest, it's in the to-do list. Everyone is free to go unless there is a missing topic. Last point, anything? One, two, three. Okay, so now we all have to go fix these tasks. Thanks everyone, have a good day. Thank you, bye bye.