Okay, I guess we can start; the stream is working and I have somebody taking care of it. So, in this session today I would like to talk about something that's going to be impacting us, probably starting next quarter: the dynamic configuration of cluster services. My focus will be primarily on the WMF cluster, but I think most concepts apply outside of our cluster as well.

Let's start with one consideration I usually share with people who ask me about our infrastructure. I usually say that it's a very well-oiled machine when it's in a static state. It runs very well when we don't have to change things; when we do have to change things, it's sometimes very painful, and it takes a lot of manual effort.

So, let's start with a few questions. The first one: if you're a DBA, how should you change the state of a database in the middle of the week? To take a database offline, to make it the master, to change its weight in the configuration, to change its role (whether a server is going to serve API queries or not, and so on). Another question: how should we switch which data center is active for a service, meaning which data center is going to be called from other services for a specific service? Let's take RESTBase as an example: if I'm an application sitting in data center A, which instance of RESTBase should I call if I need to call RESTBase, and how do I configure which one gets called? And how do I insert or remove a server from the pool that's serving some specific service? What's the correct way to do all this?

I'm starting with a negative answer, which is: in my opinion, the answer is not "with a commit to a configuration repository", which is exactly what we're doing now for most things. Right now, the standard way to configure things at the Foundation is that whenever you have to change this kind of dynamic state, you go to a configuration repository (a public repository), change something, get it reviewed, merge it, deploy it to the cluster, run some agent or run the deploy, and then the change goes live. My point is that this is inconvenient and somewhat wrong for specific things, and which specific things comes down to the definition of what's configuration and what's actually the state of the cluster.

Configuration could be something like: which wikis we're serving, what kind of features we activate, which extensions we have in MediaWiki, how long things stay in some caching system, whether we gzip content when we save it, these kinds of things. Things that configure the general behavior of the software. But there are other things that are not properly configuration; they are the state of the cluster in this moment. Is this server active or not? Is this service active in this data center or not? These are not configuration; they are descriptions of the state of the running cluster, and they don't really belong in a configuration repository, in my opinion, because changing them there takes a long time.

So, here is a list of examples: the number of HHVM threads running on a server, the maximum execution time of PHP, whether a server is online or not, and so on.
In my opinion, some of these are state. For example, whether a server is online is state, not configuration: we're not configuring a server to be online or offline, we're describing its state at this moment. The number of HHVM threads we run on a specific server, on the other hand, depends on how the server is built, typically the number of CPUs or the amount of RAM, so that's configuration; it's not something that changes because of a sudden change in the state of the cluster. The read-only state of a database is debatable, but we can mostly consider it state, because it's not necessarily long-term configuration: it can change suddenly because we need to do maintenance. Or the weight of a server in a load balancer, or from the perspective of a client: that's again state, something we want to change dynamically from time to time based on the load of the cluster, maintenance going on on a server, and so on.

So there are things that are properly part of configuration and things that are not, and at the moment we don't distinguish between them, databases included. The result is that changing state, or coordinating complex changes across the cluster, is very complicated; it's a burden, with a lot of manual steps. A perfect example is the way we do a switchover of the active data center: last time we did it, it took, I think, 10 people working in coordination, making commits while running scripts that were acting on the servers. I don't think that's the way it should work, and the rest of this talk is about things we can do to ease this up, and some ideas on how to solve it.

So, we started to move away from keeping everything in configuration, towards a system that, on top of some base configuration, stores the state in a distributed key-value store. This key-value store is then polled by the applications, or by some external agents, to deliver this state to the application itself. The tool we built, which is a very simple tool actually, is named conftool; you can learn more about it at this URL or directly from its repository. What it basically does is write things to a backend. We are actually using etcd, which is a strongly consistent distributed key-value store: a place where you can put small chunks of data for which you want a consistent view across the cluster. The advantage of being distributed is, of course, that if one machine in the cluster fails, or a certain number of machines fail at the same time, the cluster keeps going and you can still get the state of the cluster from it.

As I said, it's designed to store a small amount of data, and I think that's a good thing, because it prevents us from abusing it by putting entire configuration files inside it, which would kill the performance. And it has a watch API, which means clients can watch etcd for changes, getting a stream of the changes coming out of etcd. So if you store some state in etcd, you can have client applications watch it, and whenever something changes they can react to the change directly.
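As an illustration of that watch mechanism, here is a minimal sketch, assuming the etcd v2 HTTP API and a made-up endpoint and key path, of how a client long-polls for state changes:

```python
# Minimal sketch of watching etcd (v2 HTTP API) for state changes.
# The endpoint and key path are illustrative, not our real layout.
import requests

ETCD = "http://etcd.example.wmnet:2379"
KEY = "/conftool/v1/pools/eqiad/appserver"  # hypothetical key

def watch_forever():
    wait_index = None
    while True:
        params = {"wait": "true", "recursive": "true"}
        if wait_index is not None:
            params["waitIndex"] = str(wait_index)
        try:
            # Long-polls until the key (or one of its children) changes.
            r = requests.get(ETCD + "/v2/keys" + KEY, params=params, timeout=60)
        except requests.exceptions.Timeout:
            continue  # no change within the window; keep watching
        node = r.json()["node"]
        wait_index = node["modifiedIndex"] + 1  # resume after this event
        print("state changed:", node["key"], "->", node.get("value"))

if __name__ == "__main__":
    watch_forever()
```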
etcd has some issues that we can live with in our current situation. One is that the authorization model is kind of limited, so it's not very easy to create complex ACLs, but we don't need them at the moment. And there are some performance issues, meaning that in the version we're currently using, write performance is not great. So again, it's not a tool for things that change very frequently.

I think there's a question. [About PuppetDB] That's a completely different thing. PuppetDB is used to store things that have been produced by Puppet runs; Puppet is a configuration tool, so all the information you have in PuppetDB comes out of facts, and the only way to use it is to query PuppetDB directly. It's not distributed, it doesn't have a watch API, and it's not easy to interact with from outside Puppet, as anybody who has tried to write a client library to query PuppetDB knows. Its performance is also not very good; worse, it's kind of unpredictable. And it's meant for Puppet: the static configuration of servers is well served by using PuppetDB as a store of information, but it's not really a place where we want to store state, because it has some inherent limitations.

But I want to be clear about one thing: most of the tools we're using, both conftool and the other tools we're going to talk about later, are independent of the backend. They can have multiple backends. We chose etcd, but we could easily switch to something else if we decided it no longer suits us.

So, I told you there's a way for applications, if you write an application from scratch, to watch etcd and react to changes in it. But in some cases you're not writing the application, so it's inconvenient to do such a thing, and for that there's a tool created some years ago called confd, which basically does that job for you. confd can watch etcd (or other key-value stores) for changes; whenever a change happens, it can react: it takes the values it gets from etcd, renders them into templated configuration files, validates those files, and can run a script after that, for example to reload the configuration of your application. One way we use it at the moment is for varnish: the list of varnish backends is managed via confd, because we didn't implement in varnish the ability to watch etcd directly; it's honestly not convenient. So confd solves the problem of integrating this kind of state management with applications that were not designed for it.
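To make that pipeline concrete, here is a hedged sketch of what a confd template resource looks like; the key path, file names and commands are invented for illustration, not our production setup:

```toml
# /etc/confd/conf.d/varnish-backends.toml (illustrative)
[template]
src  = "varnish-backends.tmpl"          # Go text template
dest = "/etc/varnish/directors.vcl"     # file confd keeps up to date
keys = [
    "/conftool/v1/pools/eqiad/cache_text/varnish-be",
]
# Validate the generated VCL before swapping it into place;
# {{.src}} is the staged candidate file confd just rendered.
check_cmd = "/usr/sbin/varnishd -C -f {{.src}}"
# Only runs if check_cmd succeeded.
reload_cmd = "service varnish reload"
```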
That's for when you want to change a variable in a config file. But sometimes it can be enough to change what the DNS returns for a host. Take again the example of RESTBase: let's say that in one of our data centers we want to do some maintenance on RESTBase, or we want to take it offline, so we don't want to direct traffic to it. One way to do that would be to have a standard DNS name that the applications call, and change what that name points at; within a certain time frame, that would redirect the applications to the other data center. So DNS is another way, besides confd, to communicate these changes to applications, in case it's something simple.

And then, the same point I made yesterday: we always have to keep in mind that it's not just the WMF using these things. We can't just call this kind of integration directly inside MediaWiki and make it mandatory for MediaWiki to be able to speak with a key-value store of sorts, because we want small-scale installs, not just the big PHP farms but small installs that don't need this kind of dynamism, to keep working without a ton of extra machinery around that isn't really needed. It's not even a matter of ease of installation; it's just not needed in a lot of cases. Most people don't have multiple data centers, for example, or complex database infrastructures to manage. So in general it's better if the integration of these tools lives outside the main part of the application, and I don't want everybody to have to write the glue code to work with this kind of system directly in their application.

So, basically, we can distinguish two big classes of problems. The first one is service discovery: the problem of adding an address book for your service-oriented architecture, or even just for your cluster in general. Adding information like: what's the read-write URL for a service, where should I go if I have to write something to that system? Taking the MediaWiki API as an example: where should I go, what's the address, what's the URL I should use if I have to write something to the MediaWiki API? We could be in a situation where we are active-active on a service, but it can be written to in only one data center, while the other clusters can serve read-only queries, so that we can ease the load on the main cluster, like we basically do with the databases. So that's another thing we might want to know: what's the local read-only URL, the one that's nearest to me? And what's the list of servers for the service? In some cases you don't have just one URL, you have a list of servers you want to know.

One of the main points, I think, about service discovery is that people sometimes tend to present all the data they have in the discovery system to the applications, and then make the application do all the logical work of figuring out where to go to write and where to go to read. I don't think that's a good idea, because, again, we want this logic to be simple. The application just needs to ask: I want to read, what's the URL I should read from? I want to write, what's the URL I should write to? That simple.

Given the kind of questions I just posed, DNS seems a natural candidate for this kind of simple integration, because DNS is simple and everybody knows how to read DNS, with some caveats, because as we'll see, most languages have very bad implementations of DNS queries, PHP most notably. So, to answer the questions I asked before: these could basically be CNAME records for the discovery name, or plain A records if it's a list of servers you want to look at; and if you want the full URL and not just the hostname, you could use TXT or URI records to carry the full URL of the service.
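For example, a hypothetical sketch of what such discovery records could look like in a zone file (all names and addresses invented):

```
; CNAMEs for simple "where do I read / where do I write" discovery.
api-rw.discovery.wmnet.    30  IN  CNAME  appservers.eqiad.wmnet.
api-ro.discovery.wmnet.    30  IN  CNAME  appservers.eqiad.wmnet.
; Plain A records when the client wants the list of servers.
restbase.discovery.wmnet.  30  IN  A      10.2.2.17
restbase.discovery.wmnet.  30  IN  A      10.2.1.17
; TXT (or URI) record carrying the full URL, not just the hostname.
restbase.discovery.wmnet.  30  IN  TXT    "https://restbase.discovery.wmnet:7231"
```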
Of course, since this is storing state, something that's going to change dynamically, we need to keep the TTL of these DNS records as short as possible. This means we will need to ensure that all the applications that want to use this system know how to honor a TTL; we want to be sure they can cache the DNS queries properly, or this is going to introduce a lot of latency; and in general we should be sure that the DNS libraries for that specific language, for that application, are able to handle these things correctly.

As an example: PHP typically has no concept of caching DNS records. If you take a look at mediawiki-config, we have a lot of things that are still written as IP addresses, and that's because historically PHP has no way to cache DNS records. That's not completely true for HHVM: HHVM has a very blunt way of doing DNS caching, which is that you set one value for the cache duration of any DNS record in the HHVM configuration. At the moment I think we have 5 minutes, so if you resolve a hostname with HHVM it will be cached for 5 minutes and not requested again for 5 minutes. That's a reasonable compromise, but it won't work with this: if you want to change state fast, it can't be 5 minutes. On the other hand, we can't make that number much shorter, because that would mean a lot more DNS requests for our services, and that would introduce latency. So we always have to think about whether this type of solution is feasible.

[Question] Wouldn't the normal kind of solution for that be better caching on the host that HHVM is running on itself, some more tunable resolver?

That could be a solution, but it introduces another layer of caching. The reason I like the solution of using DNS is that you're not introducing a lot of caching layers on top of the data that comes from the store: you have just one layer of cache, and everything else, done correctly, should honor the TTL that the system is giving it, so you have a guaranteed time resolution for getting the change propagated across the cluster. When I say very short TTLs, I mean, I don't know, 30 seconds or so, probably nothing shorter than that.

[Comment] We also have the ability to make that dynamic. That's the case of manual state changes; in the case of a monitored state change, you have a rule set up where it may be polling the service every 5 seconds, and it takes a certain number of failures before it marks the resource down, so we have the capability to scale the TTL down as the failures start coming in. In that case it's not really a fixed value, and it's very important that the TTLs are set correctly instead of being fixed. And if you're doing something that's a planned change, you can probably take the TTL down before you do the switch.

Right. So the idea here is basically to wire a connection between etcd and the DNS servers, and expose that data to the applications in a protocol that the applications already understand.
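On the client side, "honoring the TTL" just means caching each answer for exactly as long as the record says. A minimal sketch, using the dnspython library and the hypothetical names from above:

```python
# Minimal sketch (not WMF code) of a client-side DNS cache that
# honors the record's own TTL, using the dnspython library.
import time
import dns.resolver

_cache = {}  # (name, rdtype) -> (expires_at, answers)

def lookup(name, rdtype="CNAME"):
    """Resolve `name`, caching the answer for exactly the TTL that
    the authoritative server put on the record."""
    now = time.time()
    cached = _cache.get((name, rdtype))
    if cached and cached[0] > now:
        return cached[1]
    answer = dns.resolver.resolve(name, rdtype)
    result = [str(r) for r in answer]
    # answer.rrset.ttl is the TTL that came back with the response.
    _cache[(name, rdtype)] = (now + answer.rrset.ttl, result)
    return result

# e.g. lookup("api-rw.discovery.wmnet") -> ["appservers.eqiad.wmnet."]
```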
So, a few examples. Think of an application that lives in eqiad, one of our data centers, and take as the hypothesis, as we progress our plans, that we have active-active data centers for the wikis. When this application wants a read-only view of the API, it just goes to eqiad, because that's the local data center: the application is in eqiad and it only wants to read, so we want to send it to the local installation, where the latency is a lot smaller. While in the other example, I said the read-write URL will point to codfw instead: an application in eqiad will read from eqiad, and will know that the URL it should call when it wants to write something is in codfw. So those are a few examples of how we'd work with discovery records.

There is one catch: I'm not sure every application, every client library, will be able to handle TXT records correctly, so probably you will need a CNAME.

So this is just a simple service discovery interface, for very simple questions: where is this service located, where should I contact this service. But there are things that are more complex than this simple address-book-like application. This is what I call state management: not just simple discovery, but managing state that's more complex. And there are things that don't fit the DNS model.

For example, if you are using complex data, so it's not just a simple URL you have to retrieve but a complex structure. Take as an example again the database information in mediawiki-config: it has a lot of information in it. You don't just have the shard and the list of slaves you are going to contact; every slave has a weight, and can have different roles, so it can be used for slow queries, it can be used for the API, or it could be in one of the specific groups like recent changes. So there are a lot of things you need to know besides just the URL, and in this case DNS probably does not fit. You could shoehorn this information into DNS...

[Question] It seems like it would fit naturally: if you have those different roles, why aren't they different hostnames, a slow-query hostname, an API hostname, a recent-changes hostname, and so on? But then you also have weights, so applications would need to call different names... though you could have an SRV record that does that.

Yeah. Sorry, not everything is going to fit; I'll have examples. Take the structure of the job queue, for example: the job queue's Redis servers form a complex structure of replication trees that you have to represent to MediaWiki. I could think of ways for MediaWiki to use DNS for that, but again, MediaWiki is the prime example of a thing that has problems with its DNS libraries, so it wouldn't really be able to use DNS: it would be overly complex, and it would be an issue anyway because of the way PHP does name resolution.

Also, in some cases you want tighter coordination between different things; databases again would be a prime example of that. In some cases you want a shorter time frame between when you start the change and when the change has been received by all your applications, and databases seem to be that kind of case. Then there is the problem of latency: in some cases you have to poll for this information, and if you are doing DNS you could say, okay, I can take the TTL down, but then you introduce latency in the application, because you have to resolve this name over and over again, continuously. And then there is the problem, of course, of third-party applications: varnish, for example, is the classic example of a case where you have state management that doesn't really fit the DNS model. So there are a few reasons for not wanting to use DNS for everything. I mean, we should aim to use the simple interface for most things we can, but in some cases it's just not convenient.
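As an aside on the SRV suggestion from the question: SRV records do carry a priority and a weight per target, which covers part of the database case, though not roles, groups, or replication topology. A hypothetical sketch, names invented:

```
; SRV rdata format: priority weight port target.
_mysql._tcp.s2.discovery.wmnet.  30 IN SRV 10 100 3306 db1018.eqiad.wmnet.
_mysql._tcp.s2.discovery.wmnet.  30 IN SRV 10  50 3306 db1021.eqiad.wmnet.
```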
[Comment] I agree there will be a few, but in that case they are really special cases of load balancing, right? Like the varnish case, where we list all of the other varnishes it's contacting directly: it's supposed to use the load balancer, but really varnish is just load balancing for itself, so we can essentially think of it as load balancer configuration. And some of these other things, I mean, when we talk about how so many different languages don't have good DNS support and therefore that's a reason not to use it: I think that's a good reason to go back and fix those languages, because it solves so many problems. If we have an open DNS implementation to use, it's worth going after that, making clients use correct TTLs. It's a big undertaking, but at least in the case of MediaWiki we probably need something that preloads the configuration locally anyway, like what we do for varnish.

Sorry, so let's get on with complex structures. This is just a mock example; at least, Jaime, don't shoot me. (No, that's a secret value. Yes, it's a very secret value.) I said it's a mock, so don't kill me. But there are things that are just more complex: as I was saying, we can obviously think of ways of doing it in DNS, but I don't think it's worth the effort. And in the case of databases, this data has a series of characteristics... in our case it's going to be used just by MediaWiki, but it could be used by different services in different ways. So I really think that when you have more complex data, DNS is not exactly the right fit. But let's wait and see.

[Question] Let's dive into this example. Here's this set of data for this one instance of a service, but how is it structured? What are the keys that lead to getting that data? Because it seems like "shard: s2" is not a very volatile property of that server.

No, you're right; it was just to exemplify. Probably s2 would be one of the tags we attach to the object, because it's an interesting property; it's configuration, it's not state. You're right, it was a dumb thing to put there; I should have put it in the tags.

So the other point is, of course, tighter coordination of changes. From experience I can say that it usually takes less than one second to propagate changes across the cluster with etcd. We saw a prime example of that on Friday: it crashed varnish, but in less than a second, across the whole cluster. But yeah, in general it's a good thing. I'd say that about 5 seconds as a time resolution is a good estimate of the maximum time it will take to propagate a change across the cluster when using confd; I don't think we can do better than that with scap.

[Comment] An important point on this, too, is that it's asynchronous in nature, and you have to be aware of that, especially for read-write versus read-only. You'll never be able to take "read-write" and flip it in one atomic operation from A to B. It's not about the time it takes; it's as fast as it can get. It's that you're chasing a state: if you're trying to control exactly where writes go, you're going to need a mental model where writes can be shut off and then turned on, not flipped. First you set everything to read-only, so that writes fail, and then you change the state. But when we look at this model like we were showing earlier, with a read-write and a read-only hostname in DNS and so on, how do we signal to the consuming client application, through that mechanism, that at this precise moment in time there is no read-write destination?
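For reference, the mock on the slide was roughly this shape; purely illustrative, values invented (and, per the exchange above, "shard" would really belong in the object's tags, as configuration rather than state):

```json
{
  "db1052.eqiad.wmnet": {
    "shard": "s2",
    "pooled": true,
    "weight": 100,
    "groups": { "api": 100, "vslow": 0, "recentchanges": 0 }
  }
}
```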
Probably, yes: we have to be able to turn writes off, see that synchronized everywhere, and then turn writes on somewhere else. How are we going to do that? Could we send them to 127.0.0.1 or some such IP address temporarily? It's something we have to think about, even taking a look at how our applications react. Yes, that's a valid point: not having a write destination actually happens when we switch data center; for a small, but still not insignificant, amount of time you don't have a location you can write to. So that's something that's going to happen, and we have to think of a way of telling the applications: right now there is no URL available to write to.

Finally, yes, PHP is kind of a pain point here. In some cases you have applications, PHP especially, that don't have a shared memory that all the threads can access, or a long-term memory you can use, because the model of PHP is that the memory scope is the request: you create a new scope whenever you start a request, and throw it away when the request is done. That has a lot of advantages in terms of the speed of creating HTTP-based applications, but it has drawbacks, like the fact that if you have to resolve a hostname, you will have to resolve it from scratch on every request, because memoizing it is just not in the mental model of how PHP is built.

[Comment] So I think DNS is probably your lowest-latency option; it's simple, if you keep clients as designed. But it's more latency than having something in a config file that you can read from disk, right?

I mean, the problem here with latency is just the caching; if there's no caching... but otherwise there's no reason DNS can't be fast. The fastest solution, of course, is that if you change something on disk and you read it from disk, it's faster: you introduce no latency in your system from having to do an additional DNS request. Again, in mediawiki-config we mostly have IP addresses at the moment exactly because of this, because we want to avoid additional DNS requests.

So, okay: how do we manage complex state? One solution is using confd, which is what we're using at the moment. As I said, it has multiple backends, it has the watch capability, and it can create configuration files. The idea is that confd does the watching for you and creates a configuration file on disk based on a template; an ugly templating model, Go text templates, and we all have to be aware that it's ugly. So the thing you do with the data has to be simple: you don't want to do a lot of data manipulation in this layer. And I think that's also a good thing, because otherwise you'd have another templating language inserted inside the templating language you probably already use to create your config files, and you end up with an unholy mess, which is basically what we had with varnish when we created it: it was a convoluted mix of VCL, Puppet templates and Go text templates; it was kind of complex, not pretty, let's say.

One of the good things confd can do is run a validation script on the generated file: after it gets a change from etcd, and before it substitutes the file that's on disk, it can run a validation script.
So, let's say you are generating a PHP file: if the PHP file you created doesn't respect the syntax, or doesn't have the variables and data structures you want it to have, and you check this in a validation script, confd will not change what you have on disk; it will not apply the change. This is important because, in case somebody makes a human error, such a system, where everything is very tightly tied together between its various parts, could be disrupted if you don't have a fail-safe, and this is one fail-safe: you can validate that whatever you get from etcd, and the file you create from it, is valid. Then it can run a script after that, which is typically what you use to reload the application. So when you have an application you want to manage this way, what you do is change the value in the key-value store; confd reads it, checks that the result is valid, and if it's valid it substitutes the file on disk, and then it will somehow tell the application that the file on disk has changed, if needed, and make the application reload its configuration.

There is an inherent, philosophical if you want, but also practical, problem with this approach, which is: we do have a consistent store for our state, but then we are relying on something else, on every machine, to watch for this state to change and to work correctly. So we need good monitoring of how confd works: whether it crashes, whether it fails to reload the applications. We have some of these things in place already, but it's still, in some way, a distributed cache of the values you're creating. That's why, for simple things, I would prefer to use DNS wherever possible: there we do this caching in just one place, not on a thousand machines, so the probability that something breaks, that something breaks or lags behind without us seeing it, is much smaller. Otherwise I would have just said, okay, let's go with confd, and that's okay for the long run.

An example of what it does. Let's take MediaWiki, and an example of how it could work with the databases. Our application servers need to know which database hosts are available and where they are. There is a very big file in mediawiki-config that you can look at; I think most of you are familiar with it. The idea would be to do something like this: with confd, you watch a whole subtree of the etcd directory that has the data for all the databases; whenever something changes there, you generate, for example, a JSON file with all this information encoded in it; you do sanity checks on the JSON; and then you somehow upload the resulting JSON to HHVM, into its APC cache. This is one of the things that was proposed on the ticket. I do see a problem with this approach, and I wanted to show it because it's something I have some trouble with: you're introducing another level of distributed cache, the APC cache in HHVM. Whenever you want to do something this complex, you always have to introduce monitoring as well, to be sure that what you actually have live on the systems you're running is in sync with what you have in the state management repository.

Finally, to summarize what I said today, and then some examples of what we'd do with it: I do strongly think that we should treat state and configuration separately. We mostly don't do that, as a Foundation, and that is making dynamic changes to the way our cluster operates slow and cumbersome for operations.
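As a sketch of the kind of sanity check I mean; this is not our actual check, and the file layout is invented, but something this small, run as confd's validation step, already protects against the worst failure modes:

```python
#!/usr/bin/env python
# Minimal sketch of a confd check_cmd: refuse to install a generated
# JSON database file that is invalid or leaves a section with no
# pooled hosts. File layout is hypothetical.
import json
import sys

def main(path):
    with open(path) as f:
        data = json.load(f)  # bails out on syntactically invalid JSON
    for section, hosts in data.items():
        pooled = [h for h, v in hosts.items() if v.get("pooled")]
        if not pooled:
            # Non-zero exit means confd keeps the old file on disk.
            sys.exit("section %s would have zero pooled hosts" % section)
    print("ok")

if __name__ == "__main__":
    main(sys.argv[1])
```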
It also means that, for MediaWiki for example, we do a huge number of deploys, and every time you do a code deploy there's a small amount of risk attached: you could break something in ways that weren't anticipated. We want to start moving away from that. We do have a tool to manage state, called conftool, which we created and which we'll probably want to extend a little bit.

For simple service location and discovery, probably the best interface we can offer is a DNS interface. So the idea is that this state repository that's in etcd will be connected, in some way, to the DNS servers, and that will be the interface applications use for simple service discovery. Let's make a simple example, something we decided and did just on Friday: we wanted to move the Swift cluster that's active for thumbnails from the one in eqiad to the one in codfw. This should be doable by just changing one record with one command; varnish itself would then just ask the DNS for the URL of the service and get the right one directly, and we wouldn't need to change the varnish configuration, go and run Puppet, and so on. At the moment a change like this takes, I don't know, 10 minutes to roll out across the cluster, because you have to write the change, upload it, review it, merge it, and run Puppet everywhere, and then it gets applied. With this idea, it's the TTL of the DNS: you just type a command to change the state, and it's applied within the TTL of the system.

For things that are more complex, and we should be very careful not to abuse this, because it has some drawbacks compared to the simple DNS interface, we can use more complex machinery: confd templates plus scripts that validate the result, for all the applications. For MediaWiki specifically, there are two approaches I thought about, which were discussed on the ticket as well. One was: we write a JSON file that can be read by multiple applications, not just MediaWiki (though I would ask why, because it's just MediaWiki using these things), and then save its value into APC so it's in memory for HHVM. I'm more inclined towards the other one: just writing out a PHP file, a very, very simple PHP file that's part of the source code as far as HHVM is concerned, which is basically redoing what we do now but in a leaner way: instead of doing a code deploy, the PHP file will be changed by an agent directly, and not with a commit, review and everything. And of course, an important point for MediaWiki and HHVM specifically is that we should have proper checks to ensure that the state we are propagating has been propagated correctly, and that we are not creating inconsistencies across the cluster.

So, this is what I had to present to you, as a proposal for what we can do from now on. I would like to hear feedback from everybody, including the developers of applications that might have other needs.
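Back to the Swift example for a moment: the "one command" could plausibly look something like this. This is a hypothetical conftool-style invocation, not the current CLI:

```
# Hypothetical sketch: pool the codfw Swift cluster for the discovery
# record, depool eqiad, and let the DNS TTL propagate the change.
confctl select 'service=swift,dc=codfw' set/pooled=yes
confctl select 'service=swift,dc=eqiad' set/pooled=no
```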
[Question] Sorry, one thing from that last slide popped into my head: do we still have HHVM instability, based on the TC cache or whatever the hell it is, when we change files?

Yes. There are solutions for that; one of the ways I thought about is that you write the PHP file into the local repo, the read-only, precompiled thing as far as HHVM is concerned, have it read that, and reload the local repo in HHVM, which then doesn't go through the TC cache. But I mean, that's a problem we have with every deploy we do, right? So it's not a new problem.

[Comment] As long as we are changing everything, if we can kill another damn problem, that would be good.

It's not a problem you will solve completely, because it usually happens more when you do a big code deploy, but we also have a problem where HHVM sometimes doesn't pick up the change in a PHP file. So yes, in this case I would like to make something that's more sure-fire than what we generally do with deploys, which is fire away and hope it works. So yes, incrementally we want to get to something better.

[Question] From the deployer's perspective, the file in any event would be outside of the tree that scap deploys, right? So scap would have to either ignore it or not; my preference would be for the former.

Yeah, I understand. And on the point of using PHP maybe instead of JSON: the advantage JSON has is that if you define the data structure once, you can use the same template to write JSON for different applications. If that structure is used not just by MediaWiki but by other things as well, you can write the same JSON file to disk everywhere, and every language, every application in the world, would be able to read that structure from the JSON file. The point I was making is that once you add APC, you're introducing different levels of caching of this state data, distributed across the cluster, and that's okay as long as we write proper checks; but we have to know that it takes some engineering work around it to make it work correctly and not have unintended consequences.

[Question] Can you give principles to think about for the difference between state and configuration, when you have to choose one? Two things come to mind: one is how fast we want the change to be applied, and the other one is how frequently we change it.

Well, personally, and if you ask around the room everybody will draw the line differently, I would base it on how things are running at the moment. If a server goes down, for example because it crashes or anything else, whether it's up or down is state. It could be argued, for example in the case of databases, whether the roles of the databases are configuration or not, like whether a server is suited for serving slow queries. But again, it could just be that your servers are overloaded and you want more servers on slow queries at the moment because it's convenient for you: you're changing the state of your topology, of what it's going to do. Then there are other things, I can't come up with a good example right now, that are more properly about how the software works in itself, what it needs to use, how it configures itself internally: those things are configuration in my opinion, and those should always go through review.

My personal razor is this: if it's a push-a-button thing, where pushing the button means depooling a server, moving things around, and you need to do it fast, then normally that's state you're working with. And not only because it's an emergency; in an emergency you could also end up changing configuration. That's one of the ways I see it.
[Comment, from the DBA side] And the other reason, in my particular case, which is databases: I can confirm I'm probably one of the people with the most commits there, and they are basically on, off, on, off.

Yeah, that's clearly state; everybody can see that intuitively. And the thing you mentioned, for example, that's actually configuration, it might just be convenient to treat it as state.

No, the thing is, the reason why I treat it as state is that normally it's never touched; the only reason it changes is: oh, depool this one, move the slow queries to this slave, then back again. You're not doing the same thing as with configuration: it's something you are changing to do maintenance and then putting back. It's not configuration that you would really need to change in the long term; it's just "I'm changing it now, and I'm going to revert it later", and that's how I make sure things go back.

The other thing to point out is that with commits you get an audit trail of configuration and state changes, which raises the question: should we have review for state changes? I guess not; but that's the point: we should log any state change we do. And actually, in principle, whenever you act manually with conftool, it will log automatically to the SAL. We are also already using this system automatically: we already use it to describe the machines in a pool for our load balancers, and what we already do automatically in that case is, when we restart a service: we depool the machine from PyBal, we ensure that it's depooled, we restart the service, and we repool it. When we do that, it's done by a script, and that's not logged, because it's an automated thing that's already coded; we do the code review on the code that will actually make the state change. But in every other case, we are logging it, in theory.

[Comment] I think there's kind of another mental model to follow here. As with individual hosts, there's a big distinction between what we push as configuration and what we execute on the command line, like when we log into a machine and we stop the service, or we shut it down, or the other things we do manually on the command line. You can think of the conftool state management as sort of a better command line for the data center, for the cluster: it should be those same sorts of operations.

[Comment] I have one comment, which is: I'm getting scared of the complexity of this, because it's not simple. First we have a problem, that's clear. Then we have two solutions, and apparently both are needed, because there are two different kinds of things to do: the DNS pointing thing and the etcd backend. Then, because we don't fully trust etcd, we have to write scripts to change the state based on etcd, and we have to write, for each individual service, monitoring to check that the change has actually landed. I'm starting to get scared of the amount of work, basically.

There is no simple solution. The "simple" solution is doing a code deploy, which is what we do now, and even that is not guaranteed: whenever you do a deploy, you have to trust the fact that scap works on every machine, that the list of machines we are using for scap is up to date, and so on.

But that's something we don't check. Exactly, that's my point: we don't do it at the moment, but we should.
[Comment] Which gets to the point I've been hammering on when we talk about all these custom solutions to state configuration. It's nice to be flexible, to be able to handle any future use case, I agree, and there are going to be some custom ones, like the one we have online right now: it's not going to fit any standard model, it's just a piece of data that says the cache is in this DC; it's just text, nothing to do with hostnames. So there are going to be these custom cases. But for the majority of cases it is a simple thing: it's just a hostname. Whether it's read-write, read-only or slow-query can be part of the hostname; it's just a service name. And the DNS protocol is one of the most fundamental, ancient and important parts of the TCP/IP protocol suite. It's fundamental to everything, and it is, by default, the internet's service discovery system: we already have a protocol for service discovery, and it's that one. Failure to implement it correctly on the client is a problem for all sorts of other use cases, even beyond this one. So I don't want to see us saying we're not going to use the existing DNS service discovery mechanism of the internet for this particular purpose just because the language we're using sucks at one of the most fundamental protocols on the internet. We should go fix the language and use the service discovery mechanism we've got, you know what I mean? Let's try to push in that direction before we give up on DNS because client libraries suck at it.

In fact, I'm kind of advocating for using DNS whenever we can. And yes, going to fix the way PHP does DNS lookups is a doable task; let's say it's not worse than Java.

Oh yeah, okay, Java is a class of its own. Although HHVM kind of inherited some of the model that Java has, and makes it slightly better. But you know, it's really simple: when you make a DNS query, the data comes back with the TTL; the data is right there. When you implement the client library, you just put it in the cache with that TTL, and it disappears with that TTL.

Yeah. Luckily all that information is carried by the protocol, so we should be able to fix that. PyBal does it correctly; Puppet doesn't, and we found that out the hard way when we were doing the Puppet multi-DC work. It's either that, or, if we assume that all client libraries in general are broken, we don't let them cache and we deploy DNS caches locally on every machine.

Yeah, that's another possibility; I don't remember who suggested that, but yes, that's another possibility. Going back to the example of MediaWiki, that could be a solution there. In MediaWiki we have two different kinds of configuration that are properly state. One is what's in ProductionServices.php: for whoever is familiar with mediawiki-config, it's a file where I basically collected every configuration value that's just a hostname or a URL, and made it change based on the data center. That file is basically what should be in the discovery service via DNS: we shouldn't need ProductionServices.php to switch between data centers; we'd just have a fixed name to call, and depending on where you are, you get a different answer from the DNS.

[Comment] And sticking to DNS gives us a place to handle some more complex functionality around active-active and all those things: we can do that in the DNS layer, where the proper data is being pulled from with correct TTLs, instead of having to re-implement that sort of logic everywhere.
The problem with MediaWiki is that we have one part that adapts well to the discovery system, and another part, key-value data that doesn't have much to do with hostnames, that should be in state management. So it's both things. In the case of MediaWiki, we might decide that at some point we want a local DNS recursor on every app server, which would solve the problem of resolution latency for the discovery part, because those names are going to be exposed not just to MediaWiki. For example, the URL that's active at the moment for a service is something that's going to be used by MediaWiki, by RESTBase, by, I don't know, whatever else; or the Citoid URL is something that's going to be used by several things.

[Comment] Ultimately, DNS resolution even without a cache is fairly cheap; it only matters when you're doing it literally a hundred thousand times per second, which is what happens with the app servers. But even then, if you put a local cache on the machine, if the app can't do its own caching... and there's no reason those records even need a 30-second TTL. The point is that it's totally tractable within the DNS system; it's not really a blocker.

You've got about 10 minutes left.

Okay, so let's talk a bit practically, in terms of the WMF. One of the big goals of ops this quarter is to prepare for the next switchover of the data center, for MediaWiki and the other applications, and part of it is to have a simplified process compared to last time. If you want, there is a page on Wikitech called "Datacenter switchover", which lists something like 20 different steps we had to take, some in parallel, some not. Even with everything prepared in advance to cut the time down, it still took us like 20-25 minutes: the first time it took us 40 minutes; the second time 20, because we were already trained by the first one, so we were very fast the second time. I actually think it's kind of embarrassing that it takes more than 20 minutes to switch over the data center at the moment, and this is one of the things we want to attack; this is the kind of work that will speed up the process significantly. So this is something we will focus on directly in the next quarter, and we will have to work with the owners of every different application that we are going to integrate in the system, to check that everything is working correctly. In most cases it will be completely transparent, because it's just going to be DNS: for most of the services we have, the SOA services let's call them, those just request DNS hostnames. We'll test a little that Node.js and Python, which I think both handle TTLs correctly, actually do that in practice, but apart from that it's going to be mostly transparent to the applications. Whereas for MediaWiki it's going to be more complex, because it has a much more complex structure that we have to manage whenever we want to switch data center: the way we structured mediawiki-config, it's mostly changing the URLs and then one variable, which is the main data center we're using. So it's going to be an iteration of things, and we will probably integrate this more deeply in the long term, but in the short term it's just going to be integrated in a way that allows us to do most of our day-to-day chores, let's say, in a way that doesn't include a code deploy or a Puppet run just to change a URL or a label. And I think we plan to do this work. Marco?
Probably, yes. Yeah, exactly, that's the point: we don't want to have to run Puppet or to have to do a code deploy just to change the state of something. That's exactly the definition of the problem, and it's also a solution to the problem of the very complex process we have whenever we want to switch data centers at the moment.

[Question] DNS is changed with a patch in Gerrit that gets merged, right?

Yes. So that's one of the things we'd have to change.

[Comment] Not necessarily. On the authoritative servers we use, we have a couple of different ways we can solve that problem. One: we have a file called the admin state file, which is currently committed through Gerrit, but it doesn't have to be. It's a way of dynamically, in the case where the DNS server has the information about multiple instances in multiple data centers for a service, setting state to knock one of them down, and that's also what we use for inter-DC failover and all of that. Currently that goes through a Gerrit commit, but there's no reason something else can't control that text file that says this is down, this is up. And the other tool we have, and I think this is the direction we should really be looking at, is that we can write plugins for that DNS server, and there's no reason we can't put in a plugin that's watching etcd; we can figure out which keys to watch. So we can bypass the idea of using confd in the case of DNS. And I think we have the expertise in-house to modify the code of the DNS server, right?

Sorry, there are a lot of DNS servers with etcd backends...

Yeah, but the etcd backends in SkyDNS and CoreDNS are very limited; I looked at those pretty hard. There are a few problems. First of all, they mostly just support A records; they don't support CNAMEs. Second, they want a very specific data structure for the records in etcd, which I think is quite limiting and doesn't interact well with the rest of the things we want to do. I mean, we could still do that if we wanted, but you would probably end up in the situation Kubernetes had when it was using SkyDNS. Now there is a middleware for CoreDNS that speaks to Kubernetes directly, but at first you had Kubernetes storing its state in etcd in one format, and some application that was watching Kubernetes' etcd and rewriting everything into the other format, the dumb format let me say, for the DNS server to work. We would end up in a similarly bad situation.

[Comment] What maps well onto our current gdnsd setup, though, is that it has separate ideas of plugins for resolution and for monitoring that already work with each other. So we could do things like keep using the existing geoip plugin for the resolution decision tree, and plug in etcd just as a monitoring plugin, which knows how to mark things down for geoip's decision tree, that sort of thing.

Basically, we could engineer things so that what we write from conftool can be read directly by CoreDNS or SkyDNS, but I'm also not sure how scalable or reliable they are. There are a few reasons why I prefer to stick with gdnsd where possible, and we can do this incrementally; we don't even really need a plugin that reads directly from etcd.
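For a flavor of that admin state file mentioned above, a hedged sketch from memory (check the gdnsd documentation for the exact key syntax): it's lines of monitored-resource patterns forced to a state, which the failover plugins then act on:

```
# /var/lib/gdnsd/admin_state (illustrative; resource names invented)
*/swift-eqiad => DOWN
*/swift-codfw => UP
```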
So, two minutes; one last question, if somebody has a question or comment.

[Question] What sort of checks will we do on the data? The most obvious thing I can think of is: what if we're writing an empty configuration, or, say, all of a sudden 8% of the servers disappear?

Yeah, of course, you have to do those kinds of things: you have to do some checks on what you write. You can even do that on the confd side of things in some cases, though I'd argue that's not always a good idea, or on the client side of things: if you get something invalid, by whatever definition we decide, the application can decide, "I'm not going to apply this new version of the configuration". That's what we already do with varnish: whenever varnish gets a configuration that's not going to be valid, because it has no backends, and a configuration with no backends would generate an error, we keep the old config.

[Comment] That's what I was going to say. The model I'd like to see, which we just haven't had the time to do properly, is to write a varnish VMOD that looks at etcd: basically we'd have a config that says "monitor this key path and put its contents into this VCL variable", and that dynamically updates, and the logic takes care of the rest.

It's a workaround. Yeah, confd is mostly a workaround, but for general-purpose applications, for general use, I kind of like it, because it already does things correctly: it watches correctly, it reacts correctly, and it can be tricky to get the watching part right; we had to redo that ourselves, for example, and it took us a couple of iterations; plus general sanity checking. There are a few things that come out of the box. For specific cases you probably want a direct integration, but the other advantage of not doing things directly is that if tomorrow we decide this is not working and we want to use the new shiny tool that does the job better: confd supports multiple backends already, and new ones are getting added constantly. There is a cost we pay for that abstraction, of course: not only do config files have to be written to disk and read back, there's also the complexity of changing things with those Go templates and all of that. The alternative, of course, is to go into all the applications where it matters anyway and write their own etcd-watching code. But then, one of our first slides said we're not locked into etcd; clearly, if we start doing that, we are.

[Comment] If you're going to say seven different things in the organization need changes, they're going to take six months to coordinate. So it's really important that we make that etcd decision up front and make it stable; it should be the kind of thing we only have to change every five years, and we need to know it's going to work for us. That's our API: either our API is etcd or it's not. We need an API that works, that will integrate well, and that we can actually use.

Yeah, honestly, I think etcd is what's getting traction in this area in general: Kubernetes is using it, and it's being developed a lot. Now we have version 3, which we don't use yet, we're still on version 2, but version 3 has a lot of parts that are better than before.

So we're done, time to wrap up. I just wanted to say that etcd's API is so simple, with get, put, watch, that even if it's not the right way to go, it should be fairly easy to swap out; the API is simple even if the semantics of what it actually does are not.

Thank you for your attention. Thank you.