All right, well, good. I was hoping for an intimate group, so this is great. Let's get started. This is the Drupal.org infrastructure panel. I'm Rudy, "basic" on Drupal.org; this is Neil Drumm, and Ryan, "mixologic", is down at the end. Tatiana is on her way; I think everyone knows Tatiana, but she'll be the person that comes in late.

Here's our agenda. It's all infrastructure. There's an overview of the infrastructure, and we thought it would be a good idea to touch on production infrastructure history, to see where it all started. The keynote in Los Angeles covered some of that history, like how the project moved to the Open Source Lab, so we'll give an infrastructure-level look at what that was and why it happened, and then continue on to where we are today, with a full-time team of staff and a panel of people doing infrastructure work. Then there's the pre-production infrastructure history, which is much the same story: there was no pre-production infrastructure for a very long time, then we built some, and we have a lot more work to do there.
So I'm hoping to get some feedback on that and on our plans for it, and also on next steps for the production infrastructure. I think that part is more defined; the pre-production side is not as defined. That will be mostly me. Then there's a Drupal CI overview, with Neil and Ryan talking about how the Drupal.org integration works, what the dispatcher is and how it works, and how the test runner actually puts the testing together: "this is MySQL 5.5 and PHP 5.6", launching the containers and managing that testing part, how it all ties together, and what it looks like in the infrastructure. Once that's done there are fewer GIFs, and we talk about our infrastructure plan for production, where we're going next, and what's in the queue.

So, let's get started. I already alluded to this, but we have come a long way; Drupal is a pretty mature project now. It seemed like it started in a basement. I don't quite know where in Europe it was, but it was in Europe, and then it moved to the Open Source Lab at Oregon State University, about an hour and a half south of Portland. Once it got there, there was a lot to do; there wasn't really an architecture.
The site was still not quite stable. We've come a long way: architecture changes are easier to do now, we have configuration management, and we have a budget to buy servers, which helps. The key things are stability, security, and performance; that's what we want to continue building into the infrastructure.

So, a long time ago in Europe, Drupal got popular and the site went offline. Before crowdfunding was a buzzword, the community came together and funded purchasing servers. The Open Source Lab was also a new thing at the time and provided very subsidized hosting for open source projects. The servers got shipped there, the site got transferred over, the data got moved, and we built something. There were a few of us helping, not myself at the very start, actually: Narayan Newton, who wasn't able to make it to the conference, Neil Drumm, and myself to some extent after the initial crazy "we need to fix this fast" period. It was: okay, what do we do? What does the Open Source Lab have that we can utilize? How do we move the site over and keep it running with volunteer community support on infrastructure? Actually, I don't think I existed then.
Oh, maybe you didn't. It was Gerhard helping with the data migration, and Kieran, right, and Eric Searcy was another Open Source Lab student. So there were two students at the Open Source Lab, Eric and Narayan, who put together this architecture for how we could scale Drupal.org and keep it online. The Open Source Lab had a pair of load balancers that could be part of it, and we had some servers that were purchased with that crowdfunding money or donated by Sun Microsystems, and we built this architecture. We had four web nodes; one of them ran NFS and shared the web root with the rest of the web nodes. The load balancers did simple round-robin load balancing. Drupal.org was really big in comparison to the subsites that existed at the time, so there were four database servers. Drupal.org's database was prone to failure, or rather the subsites were running queries that interacted with each other in ways where Drupal.org would go offline if a subsite got popular, or vice versa: Drupal.org was getting hit and a subsite would go offline. So there were two separate database clusters, each in high-availability replication.

That's where we ended up, and we stayed like that for a long time. It wasn't enough. We thought we were building a brick house, and it worked, but it started to fall over. There wasn't enough control over what we were doing, and Drupal was growing; the amount of traffic we were handling was pretty insane. Today, updates traffic alone is about 12 to 14 terabytes a month.
That's just serving XML update data for Drupal sites. So around 2011, when the Drupal.org redesign got under way, the Drupal Association had contract work and contractors came in: myself and Narayan contracting through Tag1, Neil contracting through himself, and a bunch of other people, working on adding Solr to Drupal.org and redesigning the site. We used this as an opportunity to put in place a better, more stable infrastructure that would be more sustainable long term. We got our own load balancers, so we were no longer reusing the OSL's load balancers. The database hardware didn't change; those machines were super over-specced when they were purchased, so they were still working. So some things were still working and some things weren't.

NFS on www1 was a weird one-off: we had to treat www1 differently than the other web nodes, and if it went offline, say we did an upgrade or needed to reboot for a security issue, NFS would go offline and we would have to take the site down. It was a big pain. So we got two media servers serving NFS in high availability, so there was failover for NFS and we could swap one out and move back and forth. We also started using Jenkins, well, Hudson at the time, to run all of the cron jobs and drush commands that run the sites. That gave us an interface for adding them that didn't require Neil to log into a server and edit the crontab. The Git migration ended up happening on util because it was about the only server we had to put it on. Solr got added, so two Solr servers. All of these things got added, and we needed a way to control the software and configuration.
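The Hudson/Jenkins cron jobs mentioned above mostly boil down to shell build steps that wrap a drush invocation. Here's a minimal sketch; the site alias name is a hypothetical placeholder, not the real Drupal.org alias:

```shell
# A Jenkins "cron" job is essentially a shell build step like this.
# The @drupalorg.prod alias is an illustrative placeholder.
SITE_ALIAS="@drupalorg.prod"
# drush core-cron runs all of the site's hook_cron() implementations.
CRON_JOB="drush ${SITE_ALIAS} core-cron --quiet"
echo "$CRON_JOB"
```

Putting jobs like this in Jenkins gives a log history and a web interface for adding new ones, instead of hand-editing crontabs on a server.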
We had CFEngine through the OSL, but we had a very dedicated team of volunteers and contractors doing a lot of work who didn't have access to CFEngine. So we started using Puppet. Puppet got merged in alongside CFEngine, and decoupling that CFEngine configuration management from our Puppet management has been an ongoing project; I'd say it's about 95% complete at this point. So that's 2011: still run by volunteers, and it worked for a long time. It wasn't the best; we mostly just let it run, and when we had time as volunteers to fix or upgrade things, we would.

Today we have a lot more. I'm now infrastructure manager at the Drupal Association, and I was able to hire Ryan, and Archie, who's not here, to also work on infrastructure. So we have three people working full time on bettering the production infrastructure, the pre-production infrastructure, and things like Drupal CI, and we've been slowly, iteratively replacing components to build an even more stable, more secure, more performant Drupal.org. The load balancers now run FreeBSD and HAProxy. It's very simple, and something we can use consistently across different servers and services. The web nodes are all just web nodes. We now have a Jenkins node that is basically a web node without Apache or Varnish or any of the web stack; it's just a node with the ability to run drush commands, locked down and separated from the other web nodes. Jenkins has a lot of power: the user we run Jenkins as can do a lot of things to the site.
We wanted it separate, so the web nodes just serve web traffic and Jenkins can only do things on the Jenkins node. And we have some pretty strict SELinux and grsecurity policy on that box, more enforcing than on the web nodes, so that Jenkins doesn't go crazy; it's kind of a robot, and I'm trying to prevent that. But mostly it's running drush commands, drush core-cron and things like that.

Some other changes: the database servers got replaced last year. Those were the same ones that got purchased way, way back, and a lot of the performance improvement that we saw on Drupal.org came from that replacement. It's just newer hardware, and we needed newer hardware. They're super over-specced, so they're barely doing anything now in comparison to how loaded down the previous ones were, which is good. Some more changes that we wanted to make, and have made: git1 and git2. So we now have highly available Git at Drupal.org.
We didn't have that previously, which was problematic for the same reason that NFS on www1 was problematic: if we needed to upgrade something on util and reboot, Git would go offline.

We're also using private IPs in a lot of places. One of the things the OSL still provides us is a lot of networking support, so we have public and private IPs on most of our production infrastructure, and one of the things we're trying to do is get away from using public IPs on anything beyond the load balancers and some gateway servers, to lock everything down with network-level security for the stack. We've been moving things like Solr, media1, and other pieces of infrastructure onto private IPs, and eventually we'll remove the public IPs completely so we can start locking down the stack.

So that's what we're going for. Stability: we needed it stable, and we can do that now. Performance: we don't want to have to deal with performance issues; more load on the site should mean we don't have to do anything. The Driesnote at past DrupalCons has been crazy, with Narayan and me in a room together trying to fix the database servers or fix problems. During the Driesnote this year I was actually eating breakfast; I didn't have to worry about it. It was fantastic. And then security. We've done a lot of things.
There's still a lot to do with security, but we're running SELinux everywhere, we're enforcing policy everywhere, and production is running grsecurity hardened kernels on pretty much everything. I'm not going to tell you which thing isn't running it right now, but I know. We've got to take those public IPs offline, and that'll help.

So tomorrow, and I already alluded to this, we want these gateway servers down here doing routing and running Snort, so we can analyze what's going out, what's coming in, and who's doing what. For things that need one-to-one NAT, Git right now is the one where it's easier to do a one-to-one, so it'll still have a public IP; we can do that with the gateway servers and rip everything else off the public network, which will be huge for security, just from the network layer. I was going to ask: where did util go? Yeah, that's the next thing. util served a really good purpose when it was just volunteers and we didn't have servers to run things on; virtualization wasn't really a thing yet. We now have vm1.drupal.org, a really beefy machine we have for virtualizing things that are not critical production services. They can go down; we don't need to care as much about their uptime. Mailman for the mailing lists, for example. Does anyone here use those? The security team uses them.
There are a few lists that are still active, so we do maintain Mailman and Postfix and some mailing pieces there. So we can move that off util and throw it on the virtual machine. Puppet, same thing: it's running on an OSL virtual machine right now, and we'd like to move it onto our infrastructure. Jenkins is also running on util right now. So Mailman and Jenkins were both on util, Puppet was not, but now we have a machine we can put them all on, and that gives us a place to do things like, say, run our own LDAP server someday; we can put it on vm1. That's the direction we're trying to go, and where I see us going. Any comments or questions? Is it this happy? Will it be this? I don't know.

The question was whether you can still use util as a shell server. We'll probably scrap it and send it to you, so yes, you can keep using it. How many years old is that machine? Right. So, going back a slide: depending on what the gateway servers run, and we're thinking pfSense there because it's pretty standard, or it could be FreeBSD, one of the two, we would either set up the ability to do shell-server things there, or another VM, like a shell.drupal.org, where it's a VM, not really critical production infrastructure, but up 99% of the time. That is a thing I can add to this little diagram, but yeah, that's a great point, thank you. Any other thoughts or questions? Yeah, so the question was Jenkins security: how do we secure it?
Jenkins is actually listening on localhost only on util, so to use it, and there's a page somewhere on Drupal.org with the details, you make an SSH tunnel to util, port-forwarding the 8080 port it runs on to your local machine. Then you log in; it's linked up with the LDAP we're running now, so if you have an LDAP account, you use that to log into Jenkins. From there we use project-based security in Jenkins, so certain people have access to certain projects and others don't.

Yes, the SSH command there would be a little tricky. So the next question was: how do we get to Jenkins when it's on a private network and you have to log in through gateway1? One suggestion is that on a private network we probably don't need to care about Jenkins listening on a private IP. That would be an easy solution: a single forward from gateway1 to a jenkins.drupal.org on the private network; it would just be a single forward in that case. That's a good idea, but I haven't thought too much about that piece. Do you guys have any thoughts? Yeah, that works; we wouldn't really need it listening only on localhost once you can't reach it anyway. And a VPN could be another option: VPN access where certain people only reach certain internal IPs. We could do it that way too. SSH is definitely easier on my end to set up, but there's nothing wrong with a VPN, so that could be a tomorrow thing. Any other thoughts or questions before we move on to pre-production? Cool.

So, pre-production infrastructure history. Sorry, did you want to talk about the CDN stuff at all?
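To make the Jenkins tunnel described a moment ago concrete, here's a minimal sketch. The hostname is illustrative; the page on Drupal.org has the real connection details:

```shell
# Forward local port 8080 to Jenkins, which listens only on
# localhost:8080 on util; -N means no remote command, just forwarding.
# "util.drupal.org" is an illustrative hostname.
TUNNEL_CMD="ssh -N -L 8080:localhost:8080 util.drupal.org"
echo "$TUNNEL_CMD"
# After running it, browse to http://localhost:8080 and log in via LDAP.
```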
Because of updates and static1 and static2? Yes, well, I was going to touch on that in the what's-next section, because parts of it are covered there, but you're right, I didn't cover static1 and static2 yet. So: updates and FTP traffic. FTP here means Drupal downloads over HTTP, which is a little confusing, but it's ftp.drupal.org. That was hosted by the Open Source Lab on their FTP mirrors, and we had some constraints with their system. Pushing a release to the FTP infrastructure was pretty much instant: a packaging job would run and sync it to their mirror, but then their mirror had to sync out to the other two mirrors, in Chicago and New York, and that could take up to 45 minutes to an hour, and sometimes it broke. That was becoming a problem for security releases that were announced, packaged, and pushed out, and then took an hour to actually get across the mirrors. Some people could see that Drupal 7.37 had been released because they were hitting the Oregon mirror on the West Coast, while someone hitting Chicago or New York couldn't get it yet.

So we moved that to static1 and static2. Those are just like web nodes, except they don't run PHP; they just run nginx.
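That mirror-sync lag is one of the things a CDN-fronted setup avoids: with Fastly in front of the static servers, a single stale URL can be invalidated near-instantly with an HTTP PURGE request. A hedged sketch; the release URL is illustrative, and Fastly's `Fastly-Soft-Purge` header marks the object stale rather than evicting it outright:

```shell
# Build a soft-purge request for one release file on the CDN. Fastly
# accepts the HTTP PURGE method per URL; Fastly-Soft-Purge: 1 serves
# the stale copy while refetching from the origin in the background.
RELEASE_URL="https://ftp.drupal.org/files/projects/drupal-7.37.tar.gz"
PURGE_CMD="curl -X PURGE -H 'Fastly-Soft-Purge: 1' ${RELEASE_URL}"
echo "$PURGE_CMD"
```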
They serve static files, and that's been working very well. Those are fronted by Fastly, so Fastly is actually doing all the CDN work for them. They send a 365-day cache lifetime, so Fastly caches every file for that long, and then we do some smart cache-purging that Neil wrote. For updates traffic and packaging, when a new release comes out we purge the CDN if it had cached a 404 or something for that release, do a soft fetch in the background, and serve it up. Updates is about 12 terabytes a month and FTP is around four to six, depending on the month, and our origins, static1 and static2, now hardly see any traffic: only new releases, and they only get downloaded once from the origin. That has been fantastic; I've been very happy with that setup so far. It helps everyone: faster downloads around the world, and faster security updates, because the purge on Fastly is near instant when there's a security update. That was another big project we had on our roadmap.

Yeah, and in the near future we're also going to start using the static servers for our static content, like the Drupal.org files directory, so we can put that on a different domain and it'll be a bit more secure. There's a dev server up right now that links to drupalcontent.org, a new domain, routed through Fastly the same way the updates and FTP traffic are. So the files directory for Drupal.org, and potentially for the subsites, gets served from that domain, and there are fewer places to inject cross-site scripting. We just need to figure out the usual things: advanced CSS/JS aggregation expects to hit the Drupal.org files directory, and image styles expect to hit Drupal.org when a derivative doesn't exist yet.
Hopefully. Well, it always changes, but those are the priorities right now. So, moving on to pre-production infrastructure history. Oh, that slide got messed up; well, this slide doesn't really matter. The story is basically "from ad hoc to more standardized and more automated."

A very long time ago, right after the OSL took on hosting Drupal.org, there needed to be a way to not just edit files in production, which is what happened in the very early days. There was nowhere to stage and nowhere to develop. So two VMs got created. There was a scratch VM, which came first: let's take Drupal.org, dump the database, put it somewhere, and do our edits there. That worked. Then it was: well, more people want to work on Drupal.org, so let's make it a server and call it the staging VM, and let's do dev there. Not confusing at all. This lasted much, much longer than it should have.

So this is how it started. People were doing work here; there was a place to test things, but there wasn't a ton of change happening on Drupal.org. Were you contributing in this era? Yeah, as a volunteer, mostly working on api.drupal.org. Did we still use this for the redesign? Yeah, for the big projects, the Drupal.org redesign, the Git migration, the Drupal 7 upgrade, we would set up the infrastructure we needed for each project, and it got a bit better each time. Now we have this. This is what came out of the redesign back in 2011: we had a bunch of contractors working on different parts of the site.
They all needed separate dev environments. What do we do? Well, we had just started using Jenkins; maybe it can do stuff. And it does: you give it bash, and it does stuff. So we had a bunch of Jenkins jobs that turned into really long bash scripts that would make a dev environment for Solr work, or a dev environment for performance work, or for whatever we were doing on the redesign with the different teams. So there were multiple dev environments, and it worked. Then the redesign launched, the contractors went away, and Neil remained. The ideas of that system remain in place today, but the bash scripts are all managed in Git, adding new sites is a very clean process, and it's worked very well and scaled up significantly since then.

The pre-production infrastructure itself has moved around quite a bit. It was on the Open Source Lab's Xen server; the staging VM and scratch VM lived there. Then the Open Source Lab, who were still volunteering time for us, moved to Ganeti, a Google project using KVM, and our VMs moved there. Then we had money to buy vm1, so we moved things to vm1 running OpenStack. They were there until May or April; I don't remember exactly when I moved things, but it was a few months ago. We decided EC2 would make this a lot easier and faster, and maybe give us more flexibility with snapshotting disks, not having to manage OpenStack, all those things. So we re-architected slightly how pre-production worked, but mostly moved it off to EC2.
So now we have EC2, with a separate domain to prevent cross-site scripting issues: dev1, dev-db1, dev-solr1. There's a Solr server for dev work now, which didn't exist before. Staging now has everything integrated: staging dev, database, Solr, and Git, which didn't exist before either. There used to be a separate site for Git work; it was one of those one-offs where the contracting work started and then stopped, and it was just sitting there, but we weren't really using it or keeping it up to date. So that got implemented, and after that we needed another part of this workflow, which was integration. Now we have a place where we can pre-stage and integrate some work, test it, make sure it works, and continue working on other things, so we're not blocked on that workflow. Neil, you can probably explain that better than I can. No? Ryan can explain it better than I can. This is a panel, people. Tim, you know about this, right? Tatiana, you wanted me to do it, right? You know about it. Integration.

Is integration another copy of staging? No. Integration is a copy of production where we can integrate different work from different developers and see how it affects each other, because before that we only had dev environments, then staging, then production, and staging is the very last step: if you get something onto staging, it will go to production. There's no way to easily roll it back, essentially.
So for big changes, like the groups work we're doing right now, for example, we need a place to play with it and see that it won't break things before we're actually ready to deploy it. Yeah, and we're doing things like, this is probably what you were going to say, installing XHProf on all the pre-production infrastructure and making sure all the fun developer tools are ready to go. Yeah, because integration is actually on bare metal and not on a VM, we do performance tests there as well, so we can get solid baselines and make sure that a new module stack we're going to deploy isn't going to take down Drupal.org. And it'll probably work with your IDE out of the box, if that's the way you work.

So that's integration, and, this is going to be confusing, it is actually running on vm1 right now. The reason was to have a more stable environment for those XHProf tests, so we know this system is dedicated to it and there isn't some other EC2 instance running on the same physical node affecting performance. You might run an XHProf test, then run it again, and it's drastically different, but only because an RDS instance or something on Amazon was bogged down, or a snapshot was happening somewhere, or some other outside factor was interfering. So that's there now. I don't know if it will stay there forever. We have some other options; it could go on util, which might be better, because vm1 is very bored right now. It's meant to run many virtual machines, and it's really just running one thing. Just to throw out some hardware specs, vm1 has 32 cores, 120 GB of RAM, and RAID 10 with 15k disks, and it's running one site that we run some tests on. So it's underutilized, and that might move. This will change, and we haven't really known how to change it from here; this is still the same architecture we put together back in the redesign days. It works for us at the Drupal Association, but we've been trying to figure out how to make it better for contributors and volunteers. So, Tatiana.

So they invited me because someone needed to complain about the environment, and I can do that. The two biggest historical problems with the dev environments are, first, that they are not a complete copy of production; some things are missing completely. Sorry, we did rehearse that. Some things, like Git, are missing entirely: if you want to develop a new feature for our Git setup, you can't use the dev environments for it, so you do it sort of locally, then deploy and hope it won't break anything, which is not ideal. And then things like Solr: we have them on the dev environments, but it's very tricky and not easy to connect your dev environment so that you can actually work with it. The second big problem is mostly the time it takes to sanitize the database and deploy your dev environment. It's getting a little bit better.
It used to be three hours; now it's one hour, but one hour is still not really ideal. So we're supposed to discuss how to fix it, because we don't have the answer yet. Right, today. Tell me when to change the slide. No?

So, ideas here: if anyone has any suggestions, this is the pie-in-the-sky part. We've had some ideas internally, but we don't really know how important dev environments are to people, or what we could use to make them better. Is it Docker? Is it Amazon, where we have an AMI that gets built with all these things integrated into a single VM? Is it solvable, or is the current architecture just what we need to live with and use? Brandon? Oh, there's a mic; that would be a great time to use the mic.

Yeah, for Solr I would highly recommend running it in cloud mode and using the REST APIs. It makes it so easy to just toss another core up, and to destroy a core you're no longer using. It makes it trivial. Okay, and would we run that in the current architecture on a single SolrCloud VM, or would we run multiple VMs with Solr? I would just run one, for all the dev environments. It makes it a lot easier: you tune the whole thing once and you're done. Yeah, that makes sense, because we're only actively developing Solr on at most two development environments at a time; it's read-only for everything else. Cool, so that would just be one separate core for each dev environment that needed to be isolated from everything else. Quick question for Neil about packaging: if someone wanted to help with how packaging works, is that something they can do on the dev environments, or does that need Jenkins?
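The SolrCloud suggestion above maps onto Solr's standard Collections API, which lets per-developer indexes be created and destroyed over HTTP. A sketch; the host and collection names are hypothetical:

```shell
# With Solr running in cloud mode, a throwaway per-developer collection
# can be managed via the Collections API. Names are illustrative.
SOLR="http://dev-solr.example.org:8983/solr"
DEV="dev1"
CREATE_URL="${SOLR}/admin/collections?action=CREATE&name=${DEV}&numShards=1&replicationFactor=1"
DELETE_URL="${SOLR}/admin/collections?action=DELETE&name=${DEV}"
echo "curl '${CREATE_URL}'"   # spin up a collection for this dev env
echo "curl '${DELETE_URL}'"   # tear it down when the dev env goes away
```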
Packaging is a drush command, and because we took Git away from the web nodes' file system, you don't need Git on dev either; you don't need it for packaging. It does everything: it checks out, it clones the package from the Git servers. You just have to know the drush command and how it runs. Cool, I'm going to switch images here.

Yeah, very distracting. Holly wanted her cat. Well, yeah, we need to explain this. Essentially they gave me a slide and said I could do whatever I want, and our executive director was passing by and really wanted to see her cat in the presentation. So that's how this cat got in, and she didn't even show up to our presentation. That's depressing. She's seen her cat before.

So, you mentioned the time lag, three hours down to one hour. Do you know where the bottleneck is, what's causing it to take that long? It's basically rsyncing the database dump over from where it lives and loading it into MySQL. So yeah, whatever the best parallel database restore tool is nowadays. I was just wondering whether it would save time to have a nightly sanitized copy. Yeah, or something pre-baked, an AMI or something like that. Yeah.
Yeah, so we sanitize nightly. Nightly there's a snapshot that happens from the production database slave, and that gets moved over to dbutil, which is a very isolated, not-used-in-production machine that only does sanitization right now. So it goes from production to dbutil, and then dbutil runs a staging-snapshot sanitization, and also runs a whitelist sanitization for dev environments. That gets sanitized, packaged up, and exported to an rsync module on dbutil that the dev server can grab.

So when we deploy a dev environment, the dev server is really rsyncing that from dbutil and restoring it into a database, and just the amount of data that's there to restore takes — with a single-threaded MySQL dump import — essentially about an hour.

"Okay. Yeah, so it sounds to me like something like an image, where you just generate the image periodically — spinning up from an image is going to be faster than trying to rsync."

Okay, yeah, definitely. "Or what about using stupid ZFS tricks?"
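As a rough sketch of the deploy-time half of the pipeline just described — hypothetical host, module, and database names; `DRY_RUN=1` prints the commands rather than running them:

```shell
# Sketch: a dev deploy pulls the sanitized dump from the rsync module on
# dbutil and restores it. All names here are illustrative assumptions.
DUMP_HOST="dbutil.example.org"
RSYNC_MODULE="sanitized-dumps"
LOCAL_DUMP="/tmp/drupal-sanitized.sql.gz"
DB_NAME="drupal_dev"
DRY_RUN=1

run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# 1. Pull last night's sanitized dump from the rsync module on dbutil.
run rsync -az "rsync://$DUMP_HOST/$RSYNC_MODULE/latest.sql.gz" "$LOCAL_DUMP"

# 2. Restore it. This is the single-threaded mysql import that takes
#    about an hour at drupal.org's data size.
run sh -c "gunzip -c $LOCAL_DUMP | mysql $DB_NAME"
```

The second step is the bottleneck discussed in the session: rsyncing the dump is cheap, but the single-threaded import is not, which is what motivates the image and snapshot ideas below.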
Well, that's come up. So Archie, who's not here — I'll speak for him, though — was looking into doing some btrfs things like that: a btrfs snapshot send/receive type thing. He almost got it working, but it's btrfs, and it would probably work better with ZFS.

In the production infrastructure we're using FreeBSD now in places, and we've upgraded from CentOS 5 to CentOS 6. The next upgrade we're looking at is to go to FreeBSD, and then we'll have ZFS support, and we could maybe do something like that — there's still the manual process of importing the data single-threaded, but then we snapshot it and we can send that snapshot around.

"Or just have those be origin points for clone file systems."

That's a great idea too. Yeah, okay. Thanks — more input, this is great.

Docker: yea or nay? I mean, really, for dev environments — does it make sense for a host of dev environments to use that? Why does it make sense? Mic it.

"I think that's why I'm in dev ops: because I'm lazy. It's basically ephemeral, so all of the things that you want to just touch and go, you can throw them away later. It's not trash that you leave on your shop floor. That's why containers you can just use for development should be a very good option to have."

Okay. And for other testing — not the Drupal core testing, but other kinds of testing — what I mean is, if we had our own Docker registry, is that kind of how it would work? Where we snapshot — like we have a Docker image that's the latest database? I don't know. Are you guys going to have the registry that we talked about somewhere in the past — is that still possible?

Still possible, yeah. We have a couple of static servers now for doing that sort of thing. You actually can, as long as you don't have keys and stuff in the images.
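The ZFS idea above, sketched in shell: import the dump once, snapshot the dataset, ship the snapshot to the dev host, and give each dev environment a cheap writable clone. Pool, dataset, and host names are assumptions, and `DRY_RUN=1` just echoes the commands.

```shell
# Sketch: ZFS snapshot send/receive plus clones for dev environments.
# All names are hypothetical; DRY_RUN=1 prints instead of executing.
SNAP="tank/mysql@nightly"
DEV_HOST="dev.example.org"
DRY_RUN=1

run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# 1. Snapshot the dataset after the one slow single-threaded import.
run zfs snapshot "$SNAP"

# 2. Stream the snapshot to the dev box (incremental sends with
#    `zfs send -i` would make subsequent nights cheaper).
run sh -c "zfs send $SNAP | ssh $DEV_HOST zfs receive tank/mysql"

# 3. On the dev box, each dev environment gets a writable clone that is
#    created in seconds, instead of an hour-long mysql restore.
run ssh "$DEV_HOST" zfs clone "$SNAP" tank/dev-neil
```

This is exactly the "origin points for clone file systems" suggestion: the hour-long import happens once on the snapshot origin, and every environment forks it copy-on-write.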
"That's for the Drupal situation — you can use Docker Hub, right? But when you're starting to have images that are related only to the work that you do internally, yes, probably the registry makes more sense. And Mesos — but that's another conversation. I would move all of that to Mesos. All of those machines would be just one huge grid."

Gotcha. Yeah — you know, the reason we're doing a whitelist database sanitization is that if a new table gets added, we don't put it in the database for dev sites until it's explicitly added. So as far as we know, everything's secure and will stay secure as far as disclosing information through our snapshots — but we're not quite ready to say we absolutely got everything. We want to be able to pull things if we're like, "oh, that person's email address is in there." We're grepping for all the email addresses, but...

"Well, I don't know — the amount of work it takes to create a registry is not so big, but it depends on the time that's available to do it. It all depends on the time that is free for that. As I'm a volunteer, it's exactly the same thing with time. That's why I asked about util — if you guys have things in exactly the same place, I don't have to waste more time to figure out, well, how am I going to do whatever I was doing before? Because it's in the same place, that time is not wasted. And the same thing for the registry: if you guys have time to build the registry, you just do that, and then it's something you have for a long time. But it only makes sense if you're going to use Docker."

So yeah, it's kind of circular.
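For reference, a minimal sketch of what running an internal registry involves. The upstream `registry:2` image is real; the registry host and image names here are made up, and `DRY_RUN=1` echoes the commands instead of running them.

```shell
# Sketch: a private Docker registry for internal-only images, such as a
# "latest sanitized database" image. Host and image names are invented.
REGISTRY="registry.example.org:5000"
DRY_RUN=1

run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Run the registry itself (behind TLS and auth in any real deployment).
run docker run -d -p 5000:5000 --restart=always --name registry registry:2

# Publish an internal-only image to it.
run docker tag drupal-dev-db:latest "$REGISTRY/drupal-dev-db:latest"
run docker push "$REGISTRY/drupal-dev-db:latest"

# A dev environment pulls it instead of doing a slow restore.
run docker pull "$REGISTRY/drupal-dev-db:latest"
```

As the discussion notes, the split is: public, key-free images can live on Docker Hub; anything internal (or containing data) wants a private registry.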
Okay, great. Cool. Well, thanks. Next up: Drupal CI.

Yeah, in the last few months we've gone through a Drupal 8 Accelerate sprint — it was important to pull together all the different disparate pieces of the Drupal CI architecture and move it forward to where it's now deployed in production. Pretty much next week — very soon — we're going to be shutting off PIFT/PIFR and using Drupal CI exclusively, and I just want to give a real quick overview of what that means. As you can see from the slide, there are lots of clouds.

Drupal.org is currently communicating with our Jenkins CI instance, which is our dispatcher, and that spins up EC2 bots that then run all the tests, and those tests are running inside of Docker containers on those bots. So it's really flexible, in that we're able to do things on demand and not have to maintain the bots anymore.

If you go to the next slide, let's talk a little bit more in detail. So, on the Drupal.org UI side of things — oh, is this yours, Neil? Sure, yeah, this was yours. Sorry. Yeah.

So yeah, I reused the Project Issue File Test module so I wouldn't have to redo all the rules, like "the patch has to be named this, but not named this, and has to meet these conditions." But the new stuff is a little more modern: we're using a new entity. Entities are actually too slow for the individual line items — the job results, like this one test out of however many tests we have, five thousand? twelve thousand? That's in my next slide.
Yeah — that's an old-school table that we use for those. And yeah, more drush commands to push things into the Jenkins API and pull them back out. And we added daily tests, which we didn't have before, so that's another drush command that we run once a day.

Yeah, cool. So once it hits the Jenkins dispatcher, we're using the Jenkins EC2 plugin — which is kind of being renamed to the cloud plugin soon — and it auto-scales EC2 AMIs. Every time a job comes into Jenkins, like "we need a new test," it looks for an executor; if an executor doesn't exist, it'll spin up a new EC2 instance, and you tell it which AMI you want it to spin up and which job you want to run on that executor.

That solves a problem we've always had: during a DrupalCon, or during a sprint, or any time there's a high level of activity, we were always on the hook to spin up new bots, get them all configured and ready to go, and then tear them all down when it was over so we weren't paying for them. This has allowed the system to just scale based on demand — we have more bots, then we have fewer bots, and sometimes at three in the morning there are no bots. And because we're able to do this using spot instances, we're not paying nearly as much while executing a lot more tests.

And because we're using Jenkins, it's already got a really robust API built into it for all of its results — it's a JSON API — so it was relatively easy for Neil to consume all that data.
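The Drupal.org-to-Jenkins round trip just described can be sketched with two calls. The Jenkins URL, job name, and parameters are invented for illustration; the `buildWithParameters` and `.../api/json` endpoints themselves are standard Jenkins API. `DRY_RUN=1` prints the calls instead of making them.

```shell
# Sketch: a drush-driven process queues a parameterized test run and
# later reads the result back from Jenkins' JSON API. Names are made up.
JENKINS="https://ci.example.org"
JOB="drupalci-test"
DRY_RUN=1

run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Dispatch: queue a test run for a patch.
run curl -X POST "$JENKINS/job/$JOB/buildWithParameters?ISSUE=123456&PATCH_URL=https://example.org/some.patch"

# Collect: read the JSON result of the last build back out; `tree=`
# limits the response to the fields we want.
run curl -s "$JENKINS/job/$JOB/lastBuild/api/json?tree=result,duration"
```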
Yeah, there have been 17,500 tests as of yesterday that we've run in the last two and a half months. That's 17,500 tests in two and a half months — it's moving pretty quick — and as soon as we flip on everything for contrib, we're probably expecting to see a lot more.

In the future, we'd like to get to where we're using something like Docker Compose, so that developers running their tests can provide a YAML file to define the environment they want to run in. A lot of what we're seeing now is people saying, "everything works great — except if I put it behind Varnish, then the header negotiation doesn't seem to work right." But we don't have a testing infrastructure that can support that right now. There are so many environment dependencies: if you're running search and you're running Solr, we don't have anything testing that right now, so how do we know our Solr integration works with Search API on Drupal core? We want to get to where we can have different front caches; APC or APCu; opcache or no opcache; certain PECL modules installed or not. That would allow developers to define the environment they want to test in, and that's the direction we'd like to move in.

Next slide, yes. And then the test runner itself: when the test gets to the EC2 test runner, it's basically got four main steps. It sets up the containers and pulls down everything that it needs, depending on whether it's a MySQL test or a PHP test.
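As an illustration of the developer-defined environment idea above: a hypothetical Compose file a test author might ship, putting Varnish in front of a PHP web container. The image names and layout are assumptions, not a real Drupal CI format.

```shell
# Sketch only: what a developer-supplied environment definition for the
# "Varnish in front" case could look like. Image names are invented.
cat > docker-compose.yml <<'EOF'
web:
  image: drupalci/php-5.6-apache
  links:
    - db
db:
  image: drupalci/mysql-5.5
varnish:
  image: drupalci/varnish
  links:
    - web
  ports:
    - "80:80"
EOF

# The runner would bring the whole stack up before running tests, e.g.:
#   docker-compose up -d
cat docker-compose.yml
```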
There are two containers — the executor container and the service container — that work in conjunction with each other. It prepares the code base: it gets whatever patches it needs and checks out whatever it needs to clone in. Then it kicks off whatever jobs need to be run — in this case, the SimpleTest jobs via run-tests — but we'd like to add additional jobs in the future, and it's flexible enough now that we can: "oh, we want to run XHProf performance comparisons," or "we want to run code coverage," and we'll be able to do all of that using the test runner infrastructure. Finally, the last step is post-processing whatever output comes out of those tests — right now we're turning it into XML so that Jenkins can consume it properly and feed it back through the API all the way down the line. That's pretty much how they work, at a real high level.

Any questions or comments about Drupal CI stuff? I don't know if anyone here is actively involved, or wants to be, in what's going on there. Yeah — no comments, so I'll move on to the plan review.
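As a sketch, the four test-runner steps described above might look like the following. Container, image, and repository details are illustrative assumptions; `run-tests.sh` with `--xml` is the real core test runner and the flag that produces the XML Jenkins ingests. `DRY_RUN=1` echoes the commands instead of executing them.

```shell
# Sketch of the four test-runner steps; names are hypothetical.
DRY_RUN=1
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# 1. Set up containers: a service container (database) and an executor.
run docker run -d --name db drupalci/mysql-5.5
run docker run -d --name executor --link db:db drupalci/php-5.6

# 2. Prepare the code base: clone core and apply the issue patch.
run git clone --depth 1 --branch 8.0.x https://git.drupal.org/project/drupal.git
run sh -c "curl -s https://example.org/some.patch | git -C drupal apply"

# 3. Kick off the job -- here, the SimpleTest run via core's run-tests.sh.
run docker exec executor php core/scripts/run-tests.sh --sqlite /tmp/test.sqlite --xml /artifacts/xml --all

# 4. Post-process: the XML under /artifacts/xml is what Jenkins consumes
#    and exposes back through its JSON API.
run ls /artifacts/xml
```

Step 3 is the swappable part — the same scaffolding could launch an XHProf comparison or a code-coverage job instead.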
All right. So we put together a plan toward the beginning of the year of things we want to keep working on for the infrastructure, and we've completed quite a few of them so far, since the year is pretty far gone: the FTP downloads migration away from the FTP mirrors; moving Git to a highly available cluster; unblocking the D8-on-Drupal-CI work; and decoupling from managed hosting services.

We're basically done with CF Engine and the centralized logging that OSL was providing. We've got our own Puppet — our own modules that cover the full stack, instead of the mishmash of CF Engine and Puppet — and for centralized logging we now have log hosts, with rsyslog running on all of our servers and a central place to look at our logs, which has been great.

We're continuing the Puppet work. There are three servers left whose services are being migrated off, so we're not going to actually upgrade them; they're being shifted around internally. One is db3 — db5 and db6 have enough capacity to absorb it. The groups site, which is tied to how we're going to manage the Groups migration, is going to be the only thing left there, and qa is going to get shut down — that's the other site running there — so there's no point in upgrading db3; everything will be on db5 and db6. Util is the same thing: util still has some CF Engine stuff and is still running CentOS 5. The services running there now are just mail and Jenkins; those will get migrated off onto virtual machines. So just finishing those projects will get this done.
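The centralized-logging piece mentioned above boils down to a forwarding rule on every server. A sketch with a hypothetical log host name — the `@@host:port` syntax is standard rsyslog (TCP forwarding; a single `@` would be UDP):

```shell
# Sketch: the per-server rsyslog forwarding rule. The hostname is an
# assumption; in practice Puppet would drop this into /etc/rsyslog.d/
# and reload rsyslog. Here we just write and show the file locally.
cat > 50-forward.conf <<'EOF'
# Forward everything to the central log host over TCP.
*.* @@loghost.example.org:514
EOF

cat 50-forward.conf
```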
So it's in progress, but it's nearly complete. Decoupling from managed hosting services: we still have LDAP and DNS in progress. We have Puppet doing some user management in pre-production, but not in production yet — that's probably the direction we'll go, but we haven't finalized that decision. DNS — we don't have a huge urgency to move it, but we have Route 53 in AWS, since we have the pre-production stuff there; it's configured basically as a DNS slave right now, but it's not active for the actual DNS updates. So that's in progress, and we kind of want to know how much it's going to cost before we flip the switch.

And revamping pre-production: we've made progress there, but as we said, there's still more to do. Then improving infrastructure documentation: we have a Bitbucket infrastructure wiki with some documentation, but not everything, and pages on drupal.org that have some documentation but aren't completely up to date. So there's stuff that's been done there, but it's not finished.

In the queue, the private network is the big next thing. That's a lot of work — it will require time in the data center, re-networking things, and coordinating with our hosting provider at the Open Source Lab to figure it out. That's mostly what's next. Then there are some things we haven't really fleshed out yet, like SMTP relays: OSL provides them now — do we continue using them or not? I don't know yet. Backups are another thing: they're working, and they're kind of lower priority, so we haven't gotten to them yet.

The next thing there is the CDN migration. We've mentioned moving the Drupal content domain to Fastly and using that as another place for static files, so that users can't upload exploits on the drupal.org domain, essentially. Yeah — well, they can, but the exploits won't do anything. Yeah, okay.
Yeah, there are still exploits — they just don't do anything. And then: moving the drupal.org subsites to a new CDN, which is Fastly. We have EdgeCast right now fronting drupal.org and all the subsites. We got our feet wet with Fastly doing the static file hosting, and we were very impressed with them, but we still had a contract to fulfill with EdgeCast. Once that contract is up, we can move those sites to Fastly and start doing cool things with it, because of the custom VCL it lets us write. We might be able to start doing more authenticated-user caching, and fast purging, and things like that, which will let us cache more of the site — right now it's basically caching the Drupal content domain and things like CSS and JavaScript, but no actual user pages. Fastly also gives us the option to do things like Edge Side Includes. So that's what's coming up next.

And then we'll be sprinting Friday. I don't know what on, myself, but I'm always there — if you want to work on infrastructure things, that's a great time to see if you want to get involved in something, or just discuss. And there's Drupal CI stuff on all levels of that stack — that's Ryan and Neil, mostly; I also know how it works, but if you want to get involved, they're the ones to talk to. So yeah, Drupal CI sprinting would be fantastic, because there's more work to do there. There's more work to do on infrastructure too, but that's probably more of a discussion than a sprint at this point.

Awesome. So yeah, evaluate the session — I hope everyone enjoyed it. There's time for questions if anyone has them... actually, we're at time right now. We're at time; there's not time for questions.
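For a flavor of the "fast purging" capability mentioned above: the service ID and API token below are placeholders, but the single-URL `PURGE` method and the surrogate-key purge endpoint are Fastly's documented API. `DRY_RUN=1` echoes the calls instead of making them.

```shell
# Sketch: Fastly instant purging. Service ID and token are placeholders.
SERVICE_ID="SERVICEID"
FASTLY_KEY="TOKEN"
DRY_RUN=1

run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Purge a single URL the moment its content changes:
run curl -X PURGE "https://www.drupal.org/some/page"

# Or purge everything tagged with a surrogate key -- e.g. one node and
# all the listing pages it appears on:
run curl -X POST -H "Fastly-Key: $FASTLY_KEY" \
  "https://api.fastly.com/service/$SERVICE_ID/purge/node-123"
```

Fast, targeted purging is what makes caching authenticated or frequently changing pages plausible: you can cache aggressively because invalidation is near-instant.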
Well — oh, yeah, all right, here. Thank you.

"So in general, you're moving away from LDAP just because of the administration overhead?"

We haven't moved off LDAP yet. We still aren't sure about LDAP. We might run our own LDAP, and that would live on that VM infrastructure — we could do that — or we could use the Puppet user management. There's some overhead either way, and we kind of need to pick a system and stick with it, and Puppet user management is not great for actual users. So yeah, it's unknown right now.

The reason we want to move away from the OSL LDAP is that we're in the same tree as all the other OSL LDAP accounts, and that's just confusing. I'm the only one that really has access to enter LDAP accounts right now, which means I have to be the user guy, which isn't good. So if we do choose LDAP, it would be our own LDAP: we'd have to pull our entries out of that LDAP tree, move them, and then update everything. So there'd be some work there, but it's definitely possible and still an option. When we moved our pre-production to AWS, we were like, "well, we could set up LDAP out here — or let's just use the Puppet stuff for now." So that was kind of an interim solution.

Cool. Thanks.