Hey, welcome. This is the introduction to Wikimedia Cloud Services. I'm Madhu, this is Brian and Chase, and they work with me on the Cloud Services team at the Wikimedia Foundation. We work on infrastructure and platforms that serve the Wikimedia movement projects, and it's built by us and Andrew, who's also on our team, with a lot of help from the community. So what is Cloud Services? Basically, we are a hosting platform, which means that if you have software that has anything to do with the Wikimedia movement, we have computing resources for you to use. On top of that, we provide a lot more abstractions to make it easy. Say you have a web service, an API that you want to run somewhere, a little bot, a visualization that you don't know where to put, or you write a query and you want to just run it against our public databases: we have tools and services that help abstract all those things, and that's kind of what the second item there talks about. And then for all of these platforms and services that we offer, we offer technical and community support. There's documentation, talking to people on IRC and helping them, tracking bugs and responding on Phabricator, which is the platform that we use for tracking feature requests and bugs. That's all the technical and community support stuff that we do. One thing that I want to add is that this is super informal, so you can ask questions at any time, that's fine, and you can stop me if I'm using any words that are jargon to you and ask. There are no questions that aren't meant to be asked. Yeah, so these are all the products that we offer, starting with Cloud VPS, which is the foundational infrastructure that we have.
It is an OpenStack-based hosting platform, and OpenStack basically is this big piece of cloud software that helps you cluster a bunch of different servers together and offer virtual machines, which are lightweight computers or servers that you log into and install and run things on. So that's the underlying infrastructure on top of which we run all of the other stuff there. Toolforge is the next thing, on which we run tools, and tools could be the things that I talked about before: a web application that you wrote, a bot, or a job that you want to run every two hours or every minute. Anything like that, where you just don't want to handle the underlying server and don't want to install stuff: you write your tool, you post it on Toolforge, and we manage that stuff. Quarry and PAWS are things that actually run on top of either Cloud VPS or Toolforge, and both of them are GUI, graphical interface, tools. Quarry lets you run SQL queries through a simple web interface, and it runs them against the wiki replicas over there, which are the public replicas of our production database servers; all the MediaWiki databases are there. And PAWS is kind of similar to Quarry in that it's also a graphical interface, but it's Jupyter notebooks. How many of you know about Jupyter notebooks? Basically you can write Python in a graphical interface; maybe we can show it later at the end. You just log on to this website and it gives you a little page, a little terminal, where you just write Python and it runs it for you. And you can also access our databases and stuff through it. The wiki replicas, ToolsDB and the dumps are all data services that we offer. They're kind of one layer beneath everything else, and we have, I think, a more detailed architecture-ish diagram later that covers that in more detail. And ToolsDB allows people to store data in a MySQL data store. So, MySQL database as a service.
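To make the PAWS part concrete, here is a minimal sketch of the kind of thing you might type into a PAWS notebook: building a request against the public MediaWiki web API. The API endpoint and parameters are the standard MediaWiki Action API; the specific page title is just a made-up example, and this is an illustration rather than anything from the talk itself.

```python
# A sketch of notebook-style code you might run in PAWS: build a MediaWiki
# API request asking for the plain-text extract of a page. The endpoint and
# parameters are the standard MediaWiki Action API; the title is an example.
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def extract_url(title):
    """Build an API URL asking for a plain-text extract of `title`."""
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "explaintext": 1,     # plain text instead of HTML
        "titles": title,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

# In a notebook cell you would then fetch and inspect it, e.g.:
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(extract_url("Toolforge")))
print(extract_url("Toolforge"))
```

The point of PAWS is exactly this workflow: no server to set up, just a browser tab where code like the above runs immediately.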
And the dumps are all of the public Wikimedia dumps that you can get from dumps.wikimedia.org, which we allow you to access through any of the services on our platform. There are some cool stats around there, just to show off how much this stuff is used. These are all the OpenStack projects down here on the left: 212 projects and 750 instances, those are the Cloud VPS numbers. And Toolforge hosts about 1,413 tools, and there are 1,700 maintainers. These stats I think are from around last month, so they're from a little earlier. And 24.67% is the share of edits that bots and services running on our infrastructure make to Wikimedia wikis. And 3.8 billion is the number of API requests, similarly made by bots and services that run on our infrastructure, to the wikis. You may not have heard the words Cloud Services before, but maybe you've already used it in some way or the other. So this tool here, Monumental: it is one of the Wiki Loves Monuments projects, along with a bunch of other projects that I'm not naming, and it basically allows you to explore monuments by location. And it's cool that it's hosted on Toolforge. This is Wikisource Export. If you use Wikisource, there's a tool that runs there that lets you export a Wikisource book to EPUB or any other e-reader format, and that runs on Toolforge too. This is the Outreach Dashboard, and I think this is its own little VPS project, and it handles all of the program events for the outreach folks. Here is the offline Wikipedia project: they generate the ZIM files that get used for their offline-reader apps and all the things they have for making offline Wikipedia available, and all the generation from the dumps happens on our end. And this is the service that I covered before, Quarry. And you can see here, you just write SQL in a graphical web page.
And it spits out the results as a table right beneath that, which is not covered in the screenshot. You may have heard of some of these tools. A lot of other things that you may have heard of today, like ORES, which is Aaron's AI project: all of those things were once hosted on Labs, or still are, and our VPS projects provide staging environments for those projects even if they have graduated on to production. Yeah, and Brian is going to cover a bit more about how the team formed and what the history is.

Yeah, I'll ramble for a while. So, you heard, Madhu would have to put a dollar in the swear jar for saying Labs. So, this environment that we're talking about, Cloud Services, was until April of this year called Labs. And the thing that we're calling Toolforge now was until April of this year called Tool Labs. You may start to see that we had a little naming problem. The team was also called Labs. So the team was called Labs, the environment was called Labs, one of the things running in it was called Tool Labs. And nobody knew what the word Labs really meant, other than, depending on what kind of person you are, it either sounds sciencey and cool or like mad experiments. And over time, you know, things started out kind of in the mad-experiment phase. So all the way back in 2005, Wikimedia Germany, Wikimedia Deutschland, set up an environment that they called the Toolserver. And this was a place for hosting tools, which, whenever I say it, see the big air quotes around it. Tools is kind of a generic term in the movement for any piece of software that does anything, written by anybody, running anywhere. But specifically, the Toolserver was set up to help people who were running some of the early bots and some of the early web services that were helping augment what MediaWiki can do. We all know that there are limitations in MediaWiki. There are only so many things that you can get done inside it.
But we have more complex workflows, especially social workflows, that happen on the wikis. And people found they needed to build extra things to help make lists of horrible things that needed to be cleaned up, or make lists of good things that needed to be promoted, or look around for gaps in coverage in some topical areas. So that's kind of where tools started. And the Wikimedia Deutschland folks took care of that environment for a long time, from 2005 all the way up into 2013. That was a volunteer-run project that was funded in various ways. Jump forward a bit. We jump forward a bit to 2011. The Wikimedia Foundation, Ryan Lane, thank you. Too many Ryans. Ryan Lane was an operations engineer at the Wikimedia Foundation, and he was into this software project called OpenStack that allowed you to take a pile of computers and turn them into this virtual cloud thing, kind of like Amazon AWS hosting or Rackspace or a whole lot of other providers that are available in the world today. So he came in and pitched to Mark Bergsma and Erik Möller that we should set up a cloud environment at the Foundation, and that this environment should specifically be used to set up a replica of production that we could allow volunteer system administrators into. Give them full root privileges, let them mess around and help build the Puppet manifests, the software that we use to automate software deployments, and help build that project and system out, and get elevated rights that we would have difficulty giving them on the actual production cluster. But since this was kind of a side testing environment, it would hopefully be easy to recreate multiple projects that did almost the same thing and let people tweak things in different ways. So we should build that out. So in 2011 they started building that; it started with just Ryan in the beginning, and then pretty soon after that Andrew Bogott, who is part of the current Cloud Services team, was hired to assist Ryan.
And then things went on and it kept getting better and better. And in 2013, the Toolserver folks were having some growing pains. Their hardware was getting kind of old, and they were having some issues keeping their maintainer community vibrant, keeping the volunteers who kept everything under the hood running from getting burned out, keeping them active and supported. And so they started talking to the Foundation, probably actually in 2012 they started talking to the Foundation, but in 2013 we finally did something about it, about the next generation of the Toolserver. Could it have more resources? Could it have some dedicated support? Were there some other things that we could do? And so what happened in 2013 was that we realized that we had this Labs environment, and it was up and it was running this OpenStack cloud, and it was being useful. And so we decided to found a project inside our OpenStack environment to become the replacement for the Toolserver. And Coren, Marc-André, was hired around that time to be part of building out the new Tool Labs thing. A whole lot of cool stuff happened around 2014-ish, I think: Yuvi Panda came in from another team. He was working on the Android team at the time building the Android mobile app, and that project kind of got to its conclusion at that point. And he came over and became part of the Tool Labs group inside the larger Labs team, to help build things out. And we kept adding more resources. In 2016, Chase became a dedicated manager of the team. And it kind of broke out as not just a couple of people in the TechOps team, but actually became an official team inside TechOps with Chase as the HR manager.
And also in 2016, I convinced my boss at the time, Toby Negrin, to let me jump ship out of my role as manager of the reading infrastructure team and move over into the Community Tech team, specifically so that I could work with the Tool Labs developer community and start to try to be kind of a developer liaison and a resource to build new things for them. And then in late 2016, Chase and I were talking at a conference about a lot of the challenges that he was seeing from inside the team and that I was seeing as somebody who was trying to help move Tool Labs forward. So we started talking about the idea of taking this team that he was leading inside the Technical Operations group and this team of one that I had formed for myself inside Community Tech, merging them together, and pitching that to solve several problems. And one of the problems that we really hit hard on in the internal pitches that we gave to the Foundation was this naming problem, that Labs and Tool Labs didn't mean anything to anybody. We actually have a page on Wikitech called "Labs labs labs", started years ago, that lists the many ways that the term Labs was overloaded, and how, when somebody just said Labs without qualifying which Labs, confusion ensued. So that was part of what we pitched. Another part of what we pitched was some of those numbers that were back on the slide that Madhu showed you, and they're up on the poster over there behind Chase: what the real impact of these volunteer-developed software projects was for the overall movement, right? When you see that number, what is it, 24.6%? That was over a 90-day window across all projects, all 813 wikis. A quarter of the edits were made by software that volunteers ran inside what at that time was Chase's environment. There are some footnote numbers there and, you know, citation needed. I actually do have the citations, I can dig them up.
But 50% of the edits, a little bit over 50% of the edits, to Wikidata during that same time window were made from Labs and Tool Labs. And in case you think, well, surely a big project with a big ledger like English Wikipedia is totally dominated by human users: the number was somewhere around 15%. I don't have the exact number off the top of my head, but somewhere around 15% of the edits on English Wikipedia were made from Labs as well. So that led to this pitch inside the Foundation, and in April 2017 we got the green light that we were a real team. And our job now is, you know, now we're called Cloud Services instead of the Labs team. We're actually called the Wikimedia Cloud Services team, and we're in charge of the rebranded projects. So the thing that was called Labs-that-meant-OpenStack, as opposed to Labs that meant the team or Labs that meant something else, we're now calling Cloud VPS. And we picked that word, the VPS word, because we hoped that it means something to people who've never heard of us before. It doesn't mean something to everybody, it's not a universal term, right? But it is something that means things in the world of software hosting: virtual private server. And then the rebranding of Tool Labs to Toolforge. Basically, once we started chasing the word Labs, we wanted to get rid of it everywhere, because if we leave it anywhere, then it lingers and it never goes away. So Tool Labs turned into Toolforge. And then we started actually giving names to some other things that hadn't had names before. The wiki replicas had been called LabsDB or the replicas or the slaves or the database or whatever. ToolsDB was called ToolsDB when people thought of it as a thing in and of itself, but it often got lumped in with the wiki replicas as just "our database", without recognizing that it was special and distinct in that it provided different services.
And we've come to the point where we're taking on some new responsibilities. So the dumps have historically been managed by Ariel and the ops team. And they do a really great job of focusing on producing the dumps, but they never really were given many resources to worry about getting the dumps out to the world once they were produced. And so we're taking over that getting-them-out-to-the-world side of it. We're starting to build the servers right now; just a couple of weeks ago we started building new servers with bigger, faster hard drives to hold the dumps, and we're going to be taking over the HTTP interface and the NFS interfaces and the rsync interfaces and all of that. That's enough rambling. We've got architecture over here. Chase would love to talk to you about architecture.

Sure. This is a logical view of things. The Toolforge environment is sort of the friendly environment where you can run something when you don't want to be responsible for a whole virtual machine. Say you want to run a web service: if you get a project and add virtual machines, you have to update and maintain them, you have to pay attention when we email you to say that we're going to be rebooting them or updating some portion of them. And there is an overhead involved. So if you just want to run your thing, hopefully Toolforge is the place where you can do that. We have, I think, 1,200 or 1,300 of those tools. Most of the web services are fairly low-traffic, which makes sense, because a lot of it is data availability or visualization or research output, or just web interfaces that some specific initiative needed some place to park. So within this project we have Open Grid Engine, which is a distributed scheduling environment. Essentially, if you have some number of computers and you want a way to launch a job, to run a web service or go crunch numbers, you need a way to spread it out around a pool.
And Grid Engine is a project to do that: it knows how many things are running where, which nodes are busy and which are not, and it knows how to find the place for your thing that is most likely to be successful based on resources. Kubernetes can perform very similar functions, but it's more modern: it has the ability to constrain the resources that a job can actually use. With Grid Engine, you want to run something, it looks and sees where capacity exists and ships you out there, but if your job goes haywire and uses five gigs and affects other people, there's not a lot of protection against that. Kubernetes has the ability to put a ceiling on resource usage, and then also track how many resources are being used, like how much compute time is being used by someone's web service or job, and those are numbers that we've sort of longed for over the last few years. People really want to know, understandably, how much bandwidth a certain website uses, how many hits it gets, how much CPU, how much memory. And Kubernetes gives us some of that for free, along with the fact that it's just a more modern, container-based platform. Containers is a big word, but essentially it means that you can wrap up your tool in this little package and ship it out, and you don't have to concern yourself with what everyone else is doing. So this all runs as a project, but the project itself isn't special other than that it's really big. I don't know what percentage of the VPS environment is tools, but I think it's 125 virtual machines out of 715 or so, or 18% or something like that. But we have 212 other projects that people are running, and some portion of those, I don't know, 20% or 30%, are internal initiatives, right? People want to write a new Kubernetes control application that we want to use in production. But the majority are volunteer-run, or at least volunteer-and-staff collaborative, projects. There are tons of things happening in the maps realm. I don't know all of them, but they're busy.
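As a toy illustration of the placement decision being described here (a scheduler looking at which nodes have capacity and finding the spot most likely to succeed), consider this sketch. The node names and numbers are made up, and real Grid Engine and Kubernetes schedulers weigh many more factors than free CPU and memory.

```python
# A toy sketch of a scheduler's placement decision: given nodes with free
# CPU and memory, pick the node with the most headroom that fits the job.
# Node names and capacities are invented for illustration.

def pick_node(nodes, cpu_needed, mem_needed):
    """Return the name of the node with the most headroom that can fit
    the job, or None if no node fits."""
    candidates = [
        n for n in nodes
        if n["free_cpu"] >= cpu_needed and n["free_mem"] >= mem_needed
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: (n["free_cpu"], n["free_mem"]))
    return best["name"]

nodes = [
    {"name": "exec-01", "free_cpu": 2, "free_mem": 4096},
    {"name": "exec-02", "free_cpu": 8, "free_mem": 16384},
]
print(pick_node(nodes, 4, 8192))  # exec-02
```

The difference Chase is drawing is that Grid Engine mostly stops at this placement step, while Kubernetes additionally enforces a ceiling on what the job may consume after placement and accounts for actual usage.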
Quarry is the sort of MySQL exploration tool where you can see what other people have run and kind of cobble together queries from examples you've found, so that what's in front of you isn't just a black screen with a cursor. Yuvi, when he wrote it, had a very accessibility-focused mindset, and one of his taglines is that you don't have to pay the command-line tax, which is having all the context it takes to get up to the very last inch of what you actually wanted to do, which was just to run one query to find out one little thing, right? So in theory this jumps you to 90%, because everybody has a web browser. And then tools, which is kind of the meta layer there. We started viewing these as products for ourselves, so we can communicate outward. So we call this the platform as a service, which is a fairly industry-standard term. This is infrastructure as a service, which is fairly industry-standard. So this is VPS, and that's actually an old term that's been used since the '80s. And then data services is sort of the category of things that fuel these platforms, so to speak. A lot of the reason that people come to tools, or people start a project, is to get easy access and free compute resources to look at the data that's available. So the wiki replicas are essentially sanitized copies of metadata from all public wikis, and people do really crazy number crunching on them, looking at editor retention and all kinds of things. The super popular stats page that's been around for a million years, and that we're putting tons of work into recreating, has been run forever by volunteers using open resources. Dumps, everyone knows about. The sidebar there is essentially: we have dumps for the world, and then we've been making copies of that to present to this environment, and we're just collapsing all of it, because it makes sense, it's a combo.
And then shared storage is sort of the underpinning of the entire tool portion of our environment, but we also have things like shared scratch space, if you need some place to park throwaway, sort of temporary, data and stuff like that. One question: what's the difference between the replicas here and the dumps here? Okay, so if you go to dumps.wikimedia.org there's a web interface where you can download a bunch of static files, essentially, and in the archives there's all kinds of stuff like that. So those are files that you could download and unpack and kind of dig through. We offer those in our environment as well, via NFS, so that, since they're pretty large storage-wise, not everyone has to have their own copy of these static flat files, right? The wiki replicas are real-time, live; they are constantly receiving updates from production as things are happening. So like, I don't want to use a bad use case, but I can't think of a better one: basically, let's just say you want to watch every edit as it happens, you know, for some patrol reason. This is not something that you should do that way now, but many moons ago that seems to have been very common, because it's easy to understand: as changes are made to the database and they are replicated, you can consume them and then take action, right? Now we want you to use RCStream or, what is it we're using? EventStreams. EventStreams, please use EventStreams.
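For that "watch every edit as it happens" use case, here is a minimal sketch of what consuming EventStreams looks like instead of tailing the replicas. The stream URL is the real public endpoint; the parsing and filtering helpers are illustrative only, and a real consumer would typically use a server-sent-events client library.

```python
# A sketch of consuming the public EventStreams feed (server-sent events)
# instead of polling database replicas. The URL is the public endpoint;
# the helper functions are illustrative only.
import json

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"

def parse_sse_data(line):
    """Extract the JSON payload from one 'data: {...}' SSE line."""
    if line.startswith("data: "):
        return json.loads(line[len("data: "):])
    return None  # comments, event ids, keep-alives, etc.

def is_enwiki_edit(event):
    """True for edit events on English Wikipedia."""
    return (event is not None
            and event.get("wiki") == "enwiki"
            and event.get("type") == "edit")

# With a live connection you would iterate over the response lines:
#   for line in response: event = parse_sse_data(line); ...
sample = 'data: {"wiki": "enwiki", "type": "edit", "title": "Example"}'
print(is_enwiki_edit(parse_sse_data(sample)))  # True
```

The design point is the same one made above: events are pushed to you as they happen, so you never have to hammer the databases to notice a change.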
So the difference is: one is a built-and-placed, completely static file that doesn't change. I think we generate dumps twice monthly, so every few weeks we have a job that runs that takes everything, kind of combines it, and drops the file, and we do have people who unpack those and dig through them for statistics, or to build specific visual interfaces to historical data. The use cases on our side do tend to be more dynamic: Quarry, for example, has the ability to run ad hoc queries, and if you want to do that in some kind of structured-query-language back-end format, it makes sense that it just hits MySQL. Does that make sense? Yeah, thank you. I think the other major difference, which maybe was buried in what Chase said, is that the wiki replicas do not contain the actual wikitext of the pages. It's all metadata about all the edits and things that happen, but there's no actual wikitext there. The dumps contain not only the current revision but the wikitext of all the historic revisions. So the dump XML file for Barack Obama is many, many megabytes long, and if you page through every revision of the entire history of Barack Obama, it's all collected in one chunk in this file that you can go through.
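So questions about activity work well on the replicas even when questions about page text don't. Here is a sketch of a metadata-only query, sticking with the Barack Obama example from above. The table and column names are the standard MediaWiki schema; the connection details are deliberately omitted, since they vary by environment.

```python
# Metadata-only questions are what the replicas are for: e.g. "how many
# revisions does this page have?" needs no wikitext at all. The table and
# column names below follow the standard MediaWiki database schema.

REVISION_COUNT = """
SELECT COUNT(*) AS revisions
FROM revision
JOIN page ON rev_page = page_id
WHERE page_namespace = 0          -- main/article namespace
  AND page_title = 'Barack_Obama';
"""

def normalize_title(title):
    """MediaWiki stores page titles with underscores instead of spaces."""
    return title.replace(" ", "_")

print(normalize_title("Barack Obama"))  # Barack_Obama
```

A query like this could be pasted straight into Quarry; asking for the revisions' *content*, by contrast, is exactly what the replicas cannot answer and the dumps can.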
Yeah, that was a lot of history. One neat thing about the dumps revitalization is that right now, I believe, we have three historical iterations of dumps available to our users, whereas as we combine platforms we're hoping that people can access just everything that we have at any given moment, and there's also a lot of data on dumps that isn't accessible from this environment that will now be offered. We wanted to use Quarry to analyze content, but we wouldn't be able to, because that's just metadata? Yeah, so Quarry today cannot actually access page content, so you can't do things like "show me all the pages that have template X". You can do things if there's metadata about it: so you can do things like "show me all the pages that use image Y", because there's a special metadata tracking table that keeps track of what image is used on what content page, and there are also special tracking tables for what pages link to each other, so you can ask questions like that. But you can't ask it to find all the pages that include some particular phrase somewhere in the content. Quarry is just a front end to the replicas: you don't have to have SSH access to log into a server and run the same query, you just log in somewhere, or you authenticate with your meta credentials, and then you just run the query, but the database is the same. The reason for no content, as I understand it, is essentially resourcing, but I don't know if it's just that. Yeah, it's essentially that: to re-host all of the content for all of the wikis is this sort of intractable problem, whereas we can offer you all the metadata, and you can use it to make API calls to production. There are a lot of tools that do that: figure out what you want by kind of sifting through the pile of replica data, and then just fetch the couple of pages you need when you need them. Sure. Alright. So, multiple zones for Cloud VPS, and this is a big one; this was a big part of our pitch, to sort of leverage more
resources and look for more resources. Right now the Wikimedia Foundation has a number of data centers, right? And we run through testing where, if one of the data centers were to disappear, we can move to another one. All of this, the stuff that makes not quite 25% of all the edits, and those are disproportionately valuable edits, because a lot of that is patrol work, and those patrols guard the other 75% of the edits, all of that is currently physically run out of one single data center, and that's very problematic, at least as far as robustness is concerned. And the main problem is that the underpinning platform, OpenStack, is sort of operating in a legacy mode that the main projects have moved away from, and we never had the extra people or the time to put in the work to climb that mountain. So when we talk about whether we can fail the wikis over if the primary data center disappears: well, you can, but you're going to lose all of the tooling that people have spent a decade working on to actually make the ecosystem viable. So a zone is just a designator for particular components of the environment where we can have virtual machines that run out of different physical places, like run out of the east coast, run out of the central US, and that allows us to hopefully be robust against geographic events, like giant storms or hurricanes or whatever; you definitely don't want all of your animals in one barn. Modern and robust networking is a big part of that, and essentially the reason why we can't have things that exist outside of this one location is that the networking model is old, and we're moving away from it. And I'm going to let Madhu talk about the dumps and the plans. I also want to pause and ask if there were any questions before we learn about all of the stuff that we are about to do in the future. I'm wondering, when you say that you help people host tools
that are going to add value to the Wikimedia ecosystem: how do you decide that a tool is going to add value? Who decides that? And will that process also be affected by the rebranding? Yeah, that's a pretty good question. It's an awesome question. So the process of deciding, right now, today, is that there is a tool within Toolforge called the admin tool, and everyone who is a maintainer of the admin tool has the rights to grant new people access to the overall product, and there is a web-based workflow for that that is managed through toolsadmin.wikimedia.org. So that's where you go to apply for Toolforge membership, and when you do, there is basically an Echo-like notification that goes out to all the admins. And as to how we decide: basically, if you can say anything that makes sense about what you're going to do, we'll let you in. It's a really low barrier. Ideally it would be a transparent barrier; ideally it would be the wiki way, and we'd let people in and we'd let them try to do something, and if it turned out to be a bad thing, we'd undo it, just like we undo a bad edit. It turns out that in our software tooling it's a little harder to undo bad things, so there's just this very low-level check: some other human has to click a button that says, okay, I thought about this for five seconds, and sure, they're in. I think in the amount of time that I've been helping judge these applications, maybe seven of them have been denied, and in that same amount of time probably 250, 300, maybe even 400 have been approved. Mostly the things that we deny end up being people who say "I need to use your computers so that I can build some for-profit project that has absolutely nothing to do with Wikimedia", or they wanted to run closed-source software, or a couple of times it's basically just been somebody who really didn't say anything at all in the application, and so we left a note and said "please tell us more", and then if after a month they don't respond, I just close it as declined and figure
if they come back tomorrow they can open up a new one and start fresh. That's mostly tools, but there are two levels: you get an account, and you can say "I need my own project", which is just an allocation of quota to go do things, or you can say "I just want to join the Toolforge project", because, you know, I'm doing this one specific thing. And applying for a project basically requires: who are you, and what do you intend to do? And almost always, the main reason we ask why you want to do it is because someone else might already be doing it and we're going to point you their way, or you may not quite understand what you're asking and what you've described isn't humanly possible or technologically possible, or maybe you would be a bad actor if you were in the system, like closed-source software or some kind of proprietary thing. We're essentially a niche hosting environment with a lot of big problems. One of the questions that came up when we were pitching the branding, using the cloud term and sort of consolidating on industry-standard terms, was: how do you vet out bad actors? How do you make sure someone doesn't run a business off of this platform? You know what I mean, like how do you make sure someone isn't acting in bad faith? And the answer that we gave then, and give now, is: well, we pretty much have a lightweight contextual barrier to entry; we understand what people are doing on the platform. We're not looking over their shoulder after that, but for the most part the users of the platform themselves have way more knowledge and inside context about what's happening. I mean, most of the time, when someone is doing something like using external resources that leak user IPs, it's other Toolforge volunteers, who are probably editors, probably active on-wiki, and they're the ones who see the issue and call it out. So I think in general, sorry, this is kind of long, I think in general, myself and Andrew, there was a time when it was just he and I, and we had a discussion, and we really
don't feel like we're the arbiters of what's valuable to the projects, you know what I mean? In some ways you could be standing in that role, but that's not our thing. Our thing is more like technical reality: are you describing a thing which you could actually do here, and can you describe it to me in a way that I understand? Sounds good to me. So it's a very low barrier to entry, sort of a litmus test, a smell test, slash an actual feasibility check. We've covered this quite a few times, but the change to the dumps is basically just to combine all of the serving use cases. Because right now we have our mirrors, we have people who download on the web, and then people who access it through our environment, and we just want to combine all of the external serving use cases and host it on our side. Currently it's handled by the operations folks, so all the data generation will still be handled by them, and then we're going to handle serving the data to all the different consumers of the data. As for PAWS performance and other improvements, I am not really the person to talk about it. Yuvi, who is no longer on the team but is still active, is working on PAWS; I just know how to log into the tool, and sometimes I will restart it. But he is working on some of the ongoing PAWS stuff and says that it will get a lot better, so watch out for improvements there. I don't know what else. I think Brian is working on a lot of other stuff: documentation, workflows, contributor community involvement. Do you want to cover some of that?
Sure. A lot of the things that we're looking to improve from the foundation side is to provide better support for our communities of volunteer developers. We've been doing an okay job for several years of making sure that there are computers, and that the computers are up, and that they're running, but we kind of want to go beyond that and try to help new people learn how to use the things better, help people who have built things better promote the things that they've built to other developers, and to people on wiki, and to people off wiki. We want to do a better job of simplifying things, of following those things that Yuvi was really kind of a groundbreaking champion for inside the foundation, about simplifying things where they can be simplified and streamlining the processes that volunteers have to go through. The thing there that I talk about internally quite a bit is that there's a very large difference between, Greg's sitting over here in the corner, and Greg's release engineering team is a very, very heavy user of Cloud VPS services: the Jenkins build testing system runs on Cloud VPS, the beta cluster, that's our near real time continuous integration environment, runs on Cloud VPS, and several other initiatives that they work on. There's a really big difference between Greg's team, who are primarily paid software engineers employed by the Wikimedia Foundation, in the time that they have to invest in learning how the tooling works and taking care of things and keeping them running, versus the time of our average tool developer, who maybe figures out how to take four hours of a Saturday afternoon one weekend a month to work on making a cool thing that makes it easier to do something, some workflow, on the wikis. It's kind of our job, I think, as providers of that platform, to work at making it more and more often that when they sit down to spend those four hours, they get to spend three hours and fifty minutes of it doing something cool and only ten minutes of it doing something
boring and repetitive. And without some care and feeding of that, it quickly becomes the other way, where every time you sit down to touch the thing it takes you three hours to figure out what am I supposed to do, and then you're only left with an hour to do the cool thing that you came to do. So a lot of that, the better workflows for account creation for new people, hiring a tech support contractor, and even the consolidation of platforms, is all towards that kind of idea: just get as much of the mess out of the way as we can, or take it on ourselves, so that the volunteers who are spending those very precious hours can spend them doing good things. Alright, I'll get to the next point: the wiki replicas upgrade. So we didn't really talk about this much yet, but this is a big thing, and our brilliant DBA team has been working on it for, how many months have you been working on this? Yeah, yeah. So with these real time replicas of the wikis that we have, one of the historical problems is that MediaWiki's database structure itself is a little unique, which makes making copies of it, making redacted copies of it, hard. When we copy it into the cloud data services environment, we can't just take a complete binary copy of what's running in production and put it out where all our volunteers can access it, because there's data inside there that's special, protected data. Probably a lot of you that work on the wikis know something about CheckUser, and that CheckUser means scary things about IP addresses and privacy. So we have to take that kind of data, that's very privileged data that people aren't normally allowed to see, and we have to remove it from the database copies that we allow people to get to. And this process historically has made it difficult and error prone to keep up to date with changes. Sometimes when data is deleted or marked as deleted in production, or sometimes when things get added in production, they
don't get added properly in the replicas, kind of because of all the custom mechanisms. So, long story long, the DBA team thought about this a lot and tried to think about ways to make it better, and they came up with a whole bunch of changes in the tooling pipeline that they use, and in the actual way that the data servers are set up and deployed and maintained, to make it more robust and easier to keep track of. And then once they had that set up, they had a many, many month process of refilling those databases with all the copies of the production data, run through the new pipeline to clean out the things that we aren't allowed to show you because they're privacy invasive somehow. And just last week we finished, we got all the data pushed in, and now we have them keeping up to date with new data changes as they come in. So we have a new set of servers that we hope are, well, they're bigger, faster hardware, so that's a good start, but we hope that they're also more reliable copies of what's in production, with particularly fewer deltas between what you can see on wiki and what you can see in the replica. And so now what we need to do is we need to get some people who are existing users of these databases in Tool Labs and Labs to switch over and start using these new wiki replica databases, so that we can kind of break them in and make sure that the performance characteristics that we hope are there are actually there, and shake out any problems that we might have with changes that we did in the permissions model and things like that, like that you're not seeing things that you expect to see. Hopefully we won't find any where you're seeing things you don't expect to see, but we also need to look for that. And then, so we're kind of putting out a call to existing users, and we're going to start doing this with a little blog post in a couple of weeks, and a mailing list push to let the whole Wikimedia technical community know and get people to use them. And then hopefully within a
couple of months we'll be able to switch over and make these the real copies that everybody is using all the time, and solve some kind of long standing problems. Yeah, so the existing setup is interesting in that, so you want to connect to a specific wiki, and there's some DNS magic to make that easy, at least for a few people, and you can find a specific wiki, but then you run your query or whatever. There are protection measures in place for queries that take up a disproportionate amount of resources, but all of those users are kind of lumped in: you've got people doing what they want to be long term analytics, and you've got people running web services. And so part of the idea of the switchover is to hopefully separate use cases, to be more friendly to all, so that a web service isn't tripping over long running analytics queries. So these two service profiles are part of the transition, and we're hoping that people will be able to point to the right place, and that on the back end the whole process will only get better for the users. So a minor piece of trivia, and you know it's mostly come out of the hardware and scaling up horizontally and all of that, but these are actually, at least at the time, the most expensive servers that we've purchased, partially because the existing setup just gets hammered, you know, like we need a lot of hardware for a lot of users to not constantly be above the average. So we owe the DBAs pretty big thanks: the entire Spanish DBA team has been working on this for six months, every database administrator in Spain has been working on this project, so it's pretty cool. The last slide. These are some of the ways that you can contribute or participate. I want to note that we try to be extremely beginner friendly on our IRC channel, and we're subject to the code of conduct on this channel. And to just get started with any of our projects, we have documentation on the Wikitech page there. It links to a bunch of different things, as to whether you want to
get started with VPS or Toolforge, and it's somewhat organized now, even if there are things that need to get better. We have two mailing lists, one for all of our announcements about maintenance and things like that, and one for more discussions, which will both get renamed to cloud announcement lists, but they're not yet renamed. So here we go, we have a blog where it's mostly just technical updates, and we also have some things that go onto the Wikimedia blog. We also have our own blog, and it's a Phabricator blog, which for some reason does not have a pretty URL, but you know, it's viewable. And we track all our bugs and feature requests in Phabricator, so if you want to file a bug or a request, then you can just go onto Phabricator. I linked to the thing, but it's the Cloud Services project, you just tag it Cloud Services, and then it shows up where we're watching and Brian will triage it.
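The redaction idea described in the wiki replicas discussion above, removing privileged fields like IP addresses and dropping suppressed rows before data is exposed publicly, can be sketched in miniature. This is a hypothetical illustration only: the column names below are assumptions, and the real pipeline is implemented by the DBA team with database-level views and filtered replication, not application code.

```python
# Minimal sketch of redacting rows before public exposure.
# Column names here ("rev_ip", "user_email", "deleted") are assumed
# for illustration; they are not the actual MediaWiki schema.

PRIVATE_COLUMNS = {"rev_ip", "user_email", "user_password"}

def redact(rows, private_columns=PRIVATE_COLUMNS):
    """Return public-safe copies of `rows` (a list of dicts):
    suppressed rows are dropped, privileged columns are nulled out."""
    public = []
    for row in rows:
        if row.get("deleted"):  # suppressed in production: drop entirely
            continue
        clean = {k: (None if k in private_columns else v)
                 for k, v in row.items()}
        public.append(clean)
    return public

rows = [
    {"rev_id": 1, "rev_ip": "203.0.113.7", "deleted": False},
    {"rev_id": 2, "rev_ip": "198.51.100.9", "deleted": True},
]
print(redact(rows))  # only rev_id 1 survives, with rev_ip nulled
```

The key property, mirroring the talk, is that the privileged data never reaches the copy volunteers can query; it is removed during the fill, not hidden at read time.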
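The "DNS magic" and the web versus analytics split mentioned in the replicas discussion can also be sketched: each wiki maps to a hostname, and the upgrade routes traffic into two service profiles so a web service isn't tripping over long running analytics queries. The hostname scheme below is purely an assumption for illustration (hence the `example.org` domain); see the Wikitech documentation for the real connection details.

```python
# Hypothetical sketch of per-wiki replica hostnames with two service
# profiles. The naming scheme is an assumption, not the real DNS layout.

def replica_host(wiki, profile="web"):
    """Build a replica hostname for `wiki` under a given service profile:
    'web' for quick interactive queries, 'analytics' for long-running ones."""
    if profile not in ("web", "analytics"):
        raise ValueError("profile must be 'web' or 'analytics'")
    return f"{wiki}.{profile}.db.svc.example.org"

print(replica_host("enwiki", "web"))        # enwiki.web.db.svc.example.org
print(replica_host("enwiki", "analytics"))  # enwiki.analytics.db.svc.example.org
```

The design point is that the same data is reachable under both names, but the back end can apply different resource limits and load balancing per profile, which is what separates the two use cases.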