No takers, all right. So, is it time? 1:30? Okay, so it's about to start now. Are you ready? Okay. Thank you very much, everybody, for being here. I have some slides, but I don't want to focus too much on them. If nobody participates I will speak on my own, but I would prefer this to be a two-way conversation, because after all, this is for the community. This is for you, so please feel free at any time to ask questions, also online.

What is this service? How can I use it? How can I get better at my job, in my volunteer capacity, at the things I want to do? I will also take the opportunity to try to listen to you more, so that you can give us feedback on what we should not be doing, what we should continue doing, and what we should be changing about the things we do. And I will also announce here, as an exclusive, something we have been working on over the last months to improve the LabsDB service.

So, as I said, I don't want to focus too much on the slides, so I am starting with a question: does everybody here know what I am talking about if I say LabsDB? Yes? No? Anyone else? Okay, since it seems most people do not: we offer a database service, several database services actually, where almost everybody, or everybody, can get a place to store their own data for their own application development. Probably the most famous of them all is the replica service, which is basically a copy, a real-time copy, of production that is sanitized, obviously.
There is private data that we cannot make fully public, but once it is sanitized, we offer it for almost everybody to access and query, to create tools and programs using it in any way they can. I say almost everybody because it is not just going to a web page to get database access: you need to request a Tools account. I won't extend on that explanation because I am not part of that team, but if you ask the people around here and you want to create an account, they will be, I hope, happy to help. Once you have access to the service, you will automatically get a user and a password to access both the replica service and the standalone MySQL service. There are other databases, like Postgres and so on, but I will focus on MySQL.

We know that sometimes, if you are not a technically oriented user, command line clients can be a bit difficult, so to make it easier to query the replica service there is a web tool: you only need an account, anything that identifies you as a user, and you can almost immediately run queries.

Am I telling you a lot of new things, or is this something that you more or less know? Yes? No? Should I go faster, slower? Okay, so these are more or less things you should already know. I have not been here as a staffer for a long time, by the way; I am one of the DBAs working at the Wikimedia Foundation, supporting these kinds of services. And I have seen a lot of use cases. There are MediaWiki developers that test their code on these replica services before putting it into production, and they say it is helpful for getting an idea of how long queries will take, or for testing queries. I have seen a lot of researchers, and I got a lot of questions today from both research and analytics people saying we want more, we want better, because they are very interested in analyzing the wikis. And another use case is tool building: if there is no API call to do exactly what you want, you can build it yourself. You won't be able to query the production database, but you can query the most similar thing. I will share all of this, so we don't need to write it down.

If you are still wondering what kind of things you will find on the replica service: as I said, you will have almost everything, except some fields and some rows taken out because they contain non-public data. Some examples: things like edits and account names, because that is public, or page properties, which are public. So you see a lot of tools using metadata such as which templates are used on a page, or which images are on a page, and you can query the tables with image metadata, for example.

One of the top questions that I read from new users is: okay, I have access to this database, how do I use it? And the first thing is, as you should know, to go to the mediawiki.org manual, the official schema reference, because in theory almost everything that is on the MediaWiki deployment gets replicated into Labs. Using that same manual you will be able to see how to query for users, or for Commons, and so on. Is that all clear?
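To give you an idea of the kind of metadata query this enables, here is a minimal sketch; I am assuming the classic page and templatelinks layout as documented on mediawiki.org at the time, and the page title is just an example:

```sql
-- Which templates are used on a given article (metadata only, no wikitext).
-- Assumes the classic templatelinks layout (tl_from, tl_namespace, tl_title).
SELECT tl_namespace, tl_title
FROM templatelinks
JOIN page ON page_id = tl_from
WHERE page_namespace = 0          -- main namespace
  AND page_title = 'Amsterdam';   -- example title
```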
Another thing that I read a lot of questions about is: why is this not on Labs? Private data, I assume that is understandable. There is a filtering process, relatively complex, that takes out anything that shouldn't be public, and as a consequence it makes working with the replica service a bit more complex than it probably should be. We would love to give you raw access to the database, but because of how MediaWiki works, sometimes that filtering is not immediate, and some queries that should be very simple get a bit more complex because of how this filtering is done. I will talk about that later.

Audience: [question about what is filtered] Yes, I don't know by heart; I will have to check what was there before. Also, I'm not a MediaWiki expert, so I know there are these hidden revisions, which are different from the deleted ones, but for that part please ask the MediaWiki experts.

Another thing that I get asked: people see that we have all the metadata, we have the revision IDs and the revision comments, but where is the wikitext? That is not part of the replica service. If you want the actual wikitext, you won't be able to query it directly from the replica, because it's 20 terabytes of data and it would be almost impossible for us to track there which revisions are hidden and which are not. The way to solve that problem is: you work with the metadata, and once you need the text, parsed or unparsed, you go and do an API call. I am not the API expert, but there is an API call which is "get me the text of these revisions", parsed or unparsed, and that way you can combine the API and the database data.
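As a rough sketch of that combination, something like this; the page ID is illustrative, and the URL shown is the standard action API revisions call:

```sql
-- 1) Get revision metadata from the replica (no text is stored here).
SELECT rev_id, rev_timestamp, rev_comment
FROM revision
WHERE rev_page = 12345            -- example page id
ORDER BY rev_id DESC
LIMIT 10;

-- 2) Then fetch the actual wikitext for those revisions through the API,
--    outside SQL, with the standard prop=revisions call, e.g.:
--    https://en.wikipedia.org/w/api.php?action=query&prop=revisions&revids=123|456&rvprop=content
```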
There are certain parts of the Wikimedia infrastructure that are also not part of the replica. In this case we are open to maybe changing that in the future, but because of the internal server topology, things like Flow and Echo are not replicated. Echo is complex because in some cases it can contain private communication with the user, so it is difficult, and right now it is on a completely separate set of servers, so it is not currently replicated. That is not an absolute; it may change in the future, and I need your feedback to know whether we should invest time in trying to get that, or whether it's not worth it. Also, anything that we do has to first pass through security and the maintainers; they have to review the case and sign off on it, so it takes some time. And of course MediaWiki is very flexible: some things that are in the database on a standard MediaWiki installation are not in the database on the WMF installation. Things like jobs: they are not in the database, so they cannot magically appear on Labs. That's why they are not there, but it's sometimes very confusing when we do documentation, "this is the jobs table", and you get no jobs, because the jobs in our case are on a separate infrastructure.

Now I want to go over when you should not use the Labs replicas, because I'm more interested in that. Once the replicas exist, people start using them in very different ways, and sometimes maybe not in the most efficient way, and that's something I would like to comment on very briefly. This is very relative; I'm not telling you that you should never do this, and I think in some cases it may still make sense to use the replica service, but there are alternatives. For example, some users complain that the service is not as stable as it should be, and I saw that there are many tools that continuously poll for the latest edits. A relational database is not optimized for a push model, so you have to continuously pull the latest changes, and that may not be the most efficient way. If you have an alternative method of having your changes pushed to you, with IRC or other methods, probably you will prefer that component: from the changes you receive, you can then go to the database for the metadata. Audience: [comment] Yes, that's exactly what I mean: keep track of the stream. It's true that we have not maintained that as well as we probably should.

Another thing that I saw, usually not from veteran users but from very new users, is: oh, it's a database, I have everything here, okay, let's count how many revisions we have, SELECT * FROM revision... and wait: that is a thousand million rows, and that will never finish. So first, do you really need to read absolutely all revisions? And secondly, if you do need it, it may be even more efficient to take the dumps, the exports. They are quite a lot of gigabytes, and in fact they are very easily accessible from inside the Labs infrastructure, so you can continue using Labs, get the dumps, and read what you need. It's a set of XML files; I don't know all the different formats, but we have an expert on the dumps around, you can ask him about the dumps and how to process them. If you are going to do very large processing, doing it on the dumps is almost always going to be more efficient. And when I say all these things, I am not only thinking "please don't overload the database"; I'm thinking it is going to be faster for you to process it that way than to fetch small chunks one by one. It's going to be way faster.

Another thing that I saw: once you can query the database, it's almost easier to do everything with the database. But what is the difference with the API calls, on which we, developers, volunteers, contributors, have probably spent hours and hours? You are probably not going to write a better query than one that is done in production, with something like 20 servers backing it. So for certain tasks it's probably going to be more efficient to call the API than to try to redo the same thing, simply because of resources: there are more resources in production than the ones we have for Labs.

I selected three things which I found to be typical mistakes. If you do them, don't feel offended, but once we go over this I would like to open a round of questions: "oh, am I doing this? how should I do this?" I'm going to give at least some examples of things that, again, are very relative, but usually are not a good idea from my point of view.

One thing I have seen, not lately, but because I worked with several people trying to fix it, and some people were really keen on fixing it, was persistent connections, or excessive connections. That's not something people necessarily do on purpose: sometimes you have a Java driver that by default opens 20 connections and keeps them open forever. If this were your dedicated server, I wouldn't say you shouldn't do that, but as it is a highly shared environment, which you are sharing with thousands of other users, you are taking over resources that should be shared among the largest possible number of people. So try to be conservative in the model of how you build your tools or your research: open a connection, send your queries, get the results as fast as you can, and then disconnect; then you can do offline processing and all that. I must say I have had to enforce this in some cases: right now there is a configuration such that if you have idle connections, they will be disconnected, because you are taking resources, memory, that other users could use. Again, we can talk about the current parameters and of course change them, make the timeouts longer or shorter.
As a consequence of that, though not only because of that, you should make sure that if you have a web page or tool that constantly needs to connect to the database, and your connection fails or your query fails, your application tries to reconnect, at least once or a couple of times. Not only because you can get disconnected if you are idle for some time, but because servers can go down. On the infrastructure side, if a database goes down, I can say "well, this is now in this other place", but that only works if you disconnect and reconnect, which only you can do on the application side; otherwise you will keep trying to connect to the same server all the time. So the logic of your application should be: okay, it's not working, disconnect, reconnect; and in most cases, especially with the new stuff coming, you will then be redirected automatically to a new server. This is also important for maintenance: maybe the server hasn't gone down, but we need to restart it because of security issues, for many reasons, and I need to temporarily move connections. I can guarantee the service; I cannot guarantee the server. You see the difference? It's much easier if you put that logic in your own application.

The last thing I would like to talk about, and that's another highly controversial subject, is poorly optimized queries. When I say that, I mean: queries should use indexes, and queries should not read millions of rows when you only want to return a few of them, because you can probably do the query in a different way. Even if you really need to read absolutely all rows, you can break the query down and say: okay, let's apply these conditions on the first 1,000 rows, then on the next 1,000 rows. If you are familiar with SQL: don't use LIMIT with an offset for this, because with LIMIT you still have to go over everything. If you do something like LIMIT 1000000, 10, the server has to really read 1 million rows and then return rows 1,000,001 to 1,000,010. Use an index instead: normally tables have, or hopefully within some months all tables will have, a primary key; use the primary key to paginate. That also goes for MediaWiki developers.
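To make that concrete, a small sketch of the two approaches on the revision table; the numbers are illustrative:

```sql
-- Offset pagination: the server still reads and discards the first
-- 1,000,000 rows before returning rows 1,000,001 to 1,000,010.
SELECT rev_id FROM revision ORDER BY rev_id LIMIT 1000000, 10;

-- Keyset pagination on the primary key: each batch seeks directly
-- to where the previous one ended and reads only what it returns.
SELECT rev_id FROM revision WHERE rev_id > 1000000 ORDER BY rev_id LIMIT 10;
-- next batch: WHERE rev_id > <last rev_id seen from the previous batch>
```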
And now someone says: okay, that's nice, it's not that I do bad queries on purpose, but how do I know if I am doing a good query? Typically the answer is: just run EXPLAIN on the query and you will know how the query executes. But EXPLAIN doesn't work on Labs. Why doesn't it work? Anybody knows? Exactly: it's a permission issue. One of the ways we make sure you don't access private data, and it's not the only way, we have other layers, is that you access views that query the real tables underneath. And in order to run EXPLAIN you need privileges on those underlying tables. So if you run EXPLAIN on a SELECT, you will get an error saying something like "you don't have permission on the underlying base tables". So how do we do it?

Well, there's actually a way; we actually closed a task about this recently. If you look for "explain labs" you will find it on Phabricator. There's a command, SHOW EXPLAIN, which takes as an argument a thread ID instead of a query. You can connect multiple times to the servers: you run the query, and while it is running on one connection, on a second connection you run, for example, SHOW PROCESSLIST, see your own query, which has an ID, and then run SHOW EXPLAIN FOR with that connection ID. If you want it even easier, there's a tool someone wrote, I can link it, which does that for you automatically: you paste the SELECT there and it brings you back the EXPLAIN. Some people ask me: but what happens if the SELECT takes only one second and I don't have time to run it? Well, if it takes one second, the query is okay, it's optimized. If we have time, or if someone is interested in the round of questions, I can tell you more about EXPLAIN. You can reach your own conclusions, that's up to you. I will try to shut up about it.

And the last thing I want to mention in this category is leaving queries running. I've seen this sometimes in tools that basically have a web page with some form, which executes a query, and the final user of the tool says "okay, this is not working" and closes the page, but the query continues running on the other end. So I have a couple of suggestions. First, try to detect whether the user is still there, especially with a web-based form or something like that, so you kill the query or close the connection as cleanly as possible. And if you know you have this kind of web request query, where the user is probably not going to wait more than 5 minutes for the page to refresh, you can even automatically set a limit. I have some examples here. Imagine that something should be fast, but sometimes it starts reading a lot of rows. There is this syntax, which is MariaDB only, and we use MariaDB for Labs, which is LIMIT ROWS EXAMINED plus a number of rows. You can put 1 million there, and if the query starts examining more than 1 million rows, it gets aborted. In some cases you say: no, no, this should read either a few hundred rows or nothing at all; then set it accordingly. And in 10.1, and I will explain later why this version is interesting, you have max_statement_time, either on the query or as a variable, and you can say: my query should not take more than 10 minutes. You put there a limit in seconds and the server will make sure to kill it. As you can see, I am saying please put these limits on yourselves, because I don't want to impose a limit on you that may be wrong in your case: "I'm sorry, this query actually needs to take more time because it's complex." So I leave to you the task of deciding which limits to impose, and when.
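Putting those MariaDB-only tricks together, a sketch; the thread ID and the values are illustrative:

```sql
-- From a second connection, find your running query's thread id...
SHOW PROCESSLIST;
-- ...and ask the server how it is executing it (MariaDB's SHOW EXPLAIN):
SHOW EXPLAIN FOR 12345;   -- 12345 = the Id column from the processlist

-- Abort a query once it has examined too many rows (MariaDB extension):
SELECT rev_id FROM revision
WHERE rev_len > 10000
LIMIT ROWS EXAMINED 1000000;

-- Or cap the runtime (MariaDB 10.1+), per session or per statement:
SET SESSION max_statement_time = 600;   -- seconds; longer queries are killed
SET STATEMENT max_statement_time = 600 FOR
  SELECT COUNT(*) FROM revision;
```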
So let me stop here and ask if you have questions about how you should do things, problems you have found while using this, things you would like us to change, contributions you would like, or I can show you how to do some of these things. How are we on time, 40 minutes? 20 minutes? Okay.

Audience: About what exactly you should be using, the API versus the database: with the API you can come up against limits, like 5,000 or 6,500? How many requests is reasonable? For the API, for most calls, I think there's a limit of 500, so I would have to make hundreds of them.

There's not a black and white answer to those kinds of things. Let me give you a concrete example, because we were talking about it earlier, and it's something I see very frequently: I have this set of pages, or this set of images; give me the pages where these images are used, or the other way around, from this set of pages give me the images. In very large wikis, sorry, I was thinking of the database names, like English Wikipedia, Commons, Wikidata, as you can understand those tables get really, really large. Which, by the way, is another problem that I will try to either fix or make progress on in a session tomorrow. But they can get very large, so if you try to do very complex queries, the queries may have to examine either the whole table or very large chunks of it. In some cases that may be fine, but in some cases the API is already optimized for this. There's actually a mechanism in the API of: give me the first 500, and then you get a continuation, and you say, okay, remember this API call that I sent you before? Give me the next part of the batch, let it continue. In that case, yes, that's more efficient, because the API has enough information to keep state, and it is defined with the right parameters to do it based on the index. So in general that can be more efficient. It is true that not everything is implemented in the API; on the other hand, if it has been implemented as an API call, it probably puts less load on the database.

Audience: So in general it's easier and faster with the API? Usually, yes. But it's not absolute; it's not that I'm saying don't use the databases. If the database is simpler for you, or you have a reason, "that way I can control the conditions and save bandwidth", there are reasons to use both. In many cases the API call is faster. And many people don't know the advanced features of the API. Someone said to me: oh, but if I ask for the properties of these pages and I want to check 200 pages, do I have to do 200 calls? People don't know that you can ask for properties for many pages at the same time. The API won't have absolutely all options, and that's why we offer the database, because there you can do absolutely anything; but in some cases that's for the advanced user. And if you find "I want to do this with the API, but the API doesn't allow me", then please, of course, use the database.

Audience: I was thinking about moving from the API to the tables; actually, what you're saying is that it's probably better to stay on the API? It's a case-by-case thing. The general difference is: the database will give you more flexibility, but be careful, because it also has far fewer resources, because this is not production. This was never considered production.
So don't think that this will always have the same uptime as the official Wikipedia API, because that is supposed to have 99.9-something availability, and I cannot guarantee that here. You must know this: at some point the servers may be down, and the service is not prepared to stay up the way production is. So if uptime really matters to you, that is something to weigh.

And there's another reason why the API is in many cases superior. We had a recent case with someone who was complaining: hey, you have changed the database schema. And I'm sorry, but the database schema on Labs will be the database schema on production, and we give no guarantee that it will be stable. The API, in order to remove something, takes a year of advance notice: "this call is not really being used; warning, this will be deprecated." On Labs, from one day to another, the column is gone. So you have to think about that, because I give absolutely no guarantee that the schema will stay as it is. In fact, there are several changes being made in order to get better performance in production, and those will simply land here. So you either have to track MediaWiki development, or at least be aware of it.

So, EXPLAIN. Let me try; maybe I have some slides. Let me bring up some slides that I have. EXPLAIN is a very powerful tool; it's not the only tool. I will show you a very brief overview of what you can get with EXPLAIN, basically how to learn to optimize queries better, and when to say: this query is not really very efficient, let's try something else. It takes some time to learn, so this is a crash course, okay? This presentation is, I think, on Commons; I will share the links somewhere, I will post the links to the presentation I use and to this. The full documentation is here, and I really, really recommend having a look at it; it's a single page with lots of information.

Very quickly: plain EXPLAIN is the part that you cannot do in Labs, but remember, you can do the trick of SHOW EXPLAIN with the thread ID, as I showed you, or you can check the manual. The important thing is the output. You get these columns, and you get one row per table involved. So if your query spans three tables, because it does joins or unions or subqueries or things like that, you will get several rows. This is a very simple query, based on a sample database from the Dutch Wiktionary, because it's smaller, for example, and it does a SELECT from page where page_title equals some title.

You get several columns. The first one is just a hierarchy number: one is the first level; if you have a subquery, you get a two; a subquery of that subquery, a three. That's not very relevant here. "SIMPLE" means there are no joins and no unions. Then the table you are reading, page, and then: type ALL. This is probably the most important part. If you have type ALL, it means that you are doing a full table scan. People sometimes get surprised by this, because they run a SELECT on revision and think it's fine; revision has thousands of millions of rows. You are probably not returning all of those rows, but with type ALL, you are reading absolutely all of them.
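For reference, a reconstruction of roughly what that output looks like; the exact query text is my assumption from the description, and the 90,000-row figure is the one from the example:

```sql
EXPLAIN SELECT * FROM page WHERE page_title = 'Dutch'\G
--            id: 1
--   select_type: SIMPLE
--         table: page
--          type: ALL          -- full table scan: every row is read
-- possible_keys: NULL
--           key: NULL
--          rows: ~90000       -- rows examined to return one or two
```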
possible_keys is the list of indexes that could be used to make the query faster. In this case, NULL means no index can be used efficiently; I'll come back to this. key is the index, or indexes in some cases, actually used to make the query faster. key_len is just the number of bytes of the key. ref is what you are comparing the index to; well, in this case nothing, because you're not using an index, but it can be "const", meaning that you are comparing against a constant, an equality with a constant, or it may be a column from another table, if it's a join or something. And rows: this is actually the valuable one. The query, as you see, is probably going to read 90,000 rows in order to return what is probably just one row, or a couple of rows.

So what's wrong with this query? Why is it bad? Why does it take that long? It's a really simple query. What's the index on page? Is it normal that there is no index on the page title? That's another typical thing; many people are surprised by this, because in a wiki you would think: give me the title, and get everything about the page. So what can we do? Two things. If this were our own database, we could create an index on page title. That's not out of the question: if you have suggestions about indexes, I have yet to see how we would maintain them, but it's not out of the question to create specific indexes that you may need. The thing is, most queries that you will want to run on MediaWiki tables probably already have a suitable index. So what's the actual index here? How do we write something similar to this query that is faster?

Yes: if you want to search by the title, the index that MediaWiki uses is (namespace, title). Because there can be several pages with the same title but different namespaces: it can be "Wikidata" in namespace zero, and it can be "Wikidata talk", so you can get more than one page with the same title but different namespaces. To use the index, you just have to rewrite your query: even if you don't know which namespace you want, you can enumerate them with page_namespace IN (...), and that's probably going to be faster.

So, what other types are there? I cannot expand too much on this here. In this example, what I did was create an index specifically on page_title, there's the index, and in this case, can you see the number of rows? It went from 90,000 to just one. This is very efficient. So the two things that you really should check when you run EXPLAIN are the type, which in this case is using an index, this test index on page_title, in order to fetch the result by looking at a single row, and the rows. The lower that number, the better.
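On the real replicas, where you cannot create that test index, the same lookup rewritten to use the standard MediaWiki name_title index on (page_namespace, page_title) looks roughly like this:

```sql
EXPLAIN SELECT * FROM page
WHERE page_namespace = 0 AND page_title = 'Dutch'\G
--  type: ref             -- index lookup instead of ALL
--   key: name_title      -- the (page_namespace, page_title) index
--  rows: 1               -- down from ~90000
```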
That's the quick explanation. I will be here for the whole week, so we could even create an unconference session to do a longer version of this, if there are requests for it. This is public, it's Creative Commons, so you can download the whole course and have a look at it. Any more questions before I go to the announcement?

Audience: I have a question related to that. How can you tell what indexes actually exist on the Labs servers?

That's a really good question, and there are several problems there. First, you are not actually querying the table, but a view, which may have some extra filters, and those will interact with your actual query. For example, to filter deleted revisions, the view will automatically insert a "deleted equals zero" condition, and that will affect the plan. The general answer is: Labs is just a copy of production, so go to MediaWiki, to the tables.sql file in the MediaWiki code, and it should be like that; if it is not, it's a bug. The other way is: there is an information_schema_p database, which should have the actual schema of the tables as they are on Labs, in case it differs from production. It's a copy of the real information_schema database, which we recommend you not query directly, because it slows down the whole server. So in information_schema_p you have the list of tables, the list of columns, and I think there's an index table; a set of tables with metadata about the tables.
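A sketch of that lookup; the exact table names inside information_schema_p are an assumption on my part, mirroring the standard information_schema layout:

```sql
-- List the indexes of a replicated table as they exist on Labs.
-- 'statistics' here mirrors information_schema.statistics (an assumption);
-- check the real table list with SHOW TABLES IN information_schema_p.
SELECT index_name, seq_in_index, column_name
FROM information_schema_p.statistics
WHERE table_schema = 'enwiki' AND table_name = 'page'
ORDER BY index_name, seq_in_index;
```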
Audience: On mediawiki.org, the Manual pages for the page table or the revision table tell you which indexes the production table has.

Right, and I will probably put all the resources on the etherpad; that's the best place to go in general. The caveat is that Labs, at least the old, current servers, had problems, which I'll show now, so an index may be missing there; it's not 100% equal to production.

How much time do I have, five minutes? Ah, twenty, okay. Let me go on with the announcements, and then we can do another round of questions.

So, we had a lot of problems. LabsDB probably wasn't maintained as well as it should have been, and that's partly my fault. When I joined, I had to fix it, and it was incredibly hard. In fact, it took me six months to reload enwiki; yes, enwiki, which is like a fifth of the whole database. It was really painful for all users, because every time I reloaded a table it created lag, and I said: well, it's worth it, in the end we'll have the tables with the right data and all that. Five minutes after I finished the whole reimport, the tables broke again. And the issue was not pre-existing problems; it's that the replication model from production to Labs, because of the filtering and some MediaWiki particularities, meant that every time someone did operations like deleting revisions or pages, or recovering revisions or pages or images, the whole database went out of sync. What is worse, whenever one of the Labs servers crashed, the whole thing went out of sync, because we weren't using a safe method of maintaining consistency. So that's one thing: we had data problems.

Second thing: we had very old servers; I don't know if they were five years old, but very old. And not only that, full disclosure: there was no high availability. We used to have three servers; the storage of one server failed, and that whole server was gone. So we ended up with the current situation, in which we have two servers, where at any time the storage can fail and we'll be down to one server, and then no servers. Can you see? Can you follow me on the problems?

On top of the data drift I mentioned, I got a lot of requests: hey, I really need to run nine-hour queries, because I really need to read absolutely all rows; or not all rows, but genuinely complex queries that really were doing something. It's not that they were unoptimized; they really needed to run long. And because of the limited capacity, even more limited with old servers, and the memory limitations, when we went over a certain number of queries running at the same time, the whole server crashed. So I had to put a really hard limit on the number of concurrent queries just to keep the service up.

It was hard to use, and not only for users, but also for me to manage. Every time the thing crashed and the data was corrupted, it was almost impossible to reload it from production, because Labs was using TokuDB as the engine while production was using InnoDB, so I could not just copy it over; it's a completely different format. And then sometimes applications went crazy and brought the service down, creating even more problems. Also, when a server went down, the first server of the three, we had to manually, and I think Andrew was there, we had to manually repoint things in any way we could to the remaining servers, because there was no automatic failover procedure, no "okay, it crashed, no problem, switch over automatically". That didn't exist. You can see: quite a problematic environment for these really, really interesting use cases.

So, I went to my manager and said: what do you say? He said no. I'm kidding. We actually managed to get a really, really nice set of servers. In fact, and correct me if I'm wrong, I think they're among the most powerful servers we have in our own infrastructure. We are not using those for production; we are giving them to you, the community, to use, and hopefully not break, because they have all the power: a couple of CPUs per server, half a terabyte of RAM, enough for long-running queries, and SSDs in RAID. Really, really nice hardware. And on top of that, two servers to do the filtering before the data even gets to Labs.

Remember I said in the last session: MariaDB 10.1. We really believe it's going to be, I hope, an easy transition, because it's backwards compatible, but it gives us a lot of features that make life easier. In particular, with it we were able to do row-based replication, and the data differences between production and Labs should be gone. At least, we have been testing this for a couple of months, and so far we have found no data differences. In fact, we still get sync issues on the current Labs servers that have not happened on the new servers. So: exact same data on production and Labs, except obviously the sanitized parts. Failover is set up completely transparent and automatic, except for the thing I mentioned about reconnecting from the application; if one of the servers goes down, I don't have to do everything by hand, so it should increase the availability of the new Labs service. Row-based replication guarantees consistency; if you don't know what row-based replication is, please ask me and I can explain. We had problems with TokuDB for our specific use case. It's a really nice engine, but by having InnoDB, which is the same engine we have in production, even if a server goes down we can copy from production much, much more easily: in hours, not in weeks. So, as you can see, it's a lot of small improvements, but a lot of them, and I really think they will make the service much, much better for your use cases.

Current state: we have loaded S1 and S3, and they're currently available, so you should already be able to access these two shards. It may seem like little, but this is 800 wikis out of 900.
So it's most of them; the only things missing are the large ones, like Commons, German Wikipedia, French Wikipedia; you can see in the documentation which ones are missing. We will have, I don't want to say in how many weeks, I'm not sure when, but very, very soon, all shards loaded and available for querying. We have created scripts that check that the data is as it should be and that no private data is out there, and, as I said, it is now much harder for the data to get out of sync.

How do you use it? This is just a soft announcement; we have not yet fully documented this. We will send an announcement to the labs mailing list; we want to create the documentation first. But if you want to test it already, in an undocumented way, you can just use the same credentials that you currently use for accessing the existing Labs databases. This is right now opt-in, so nothing has changed; what works now will keep working. We have added a couple of temporary host names, a "web" one and an "analytics" one. These are domain names you can point your tools at, but they are not definitive; they are just for testing: oh, how does this work? Is it working? What does it have? Oh, this is broken, I want to report it before it goes to the really definitive place. For now it is opt-in so that you have time to test it: yes, it works; no, it doesn't. And it's not 100% clear how we will reach the final state, but we want to do it very slowly so that we don't break existing tools.

Right now there's absolutely no difference between these two URLs, except that they point to different servers, as you may assume from the names. The initial idea, and this is the point of having two, is to reserve one domain for short but very real-time, high-throughput queries, and the other one for long-running queries with limited concurrency. So on the second one you can maybe run just one or two queries at a time, but they are allowed to take long. It's very much a work in progress, because this is something new; we don't know how it is going to be used, but we are open to discussion at this point. This is something we have just set up, and it can evolve in any way you think is best.

So that's the announcement. Good news: resources that were long wanted. I'm open to your questions, but these are my questions to you. I have seen many people running the same queries over and over again; maybe we can precompute those if they are common, so send bugs or feature requests if you want that. Maybe more indexes, as I mentioned before. Resource limits: give me feedback on those, because we want everybody to be able to query, but we don't want a specific person to overload the server; what is the right balance of flexibility? And one thing we are not very sure about is user databases. Right now these new services are purely read-only. We want to talk to the people who currently have user databases inside the replica service, in order to work with them and see what the best steps forward are: either provide a separate service, or do the same thing in a slightly different way. I think this is just a few dozen users, not most of them. As you can see, it's a very new setup, although we have been working on it for months; thanks to Manuel, thanks to Chase, thanks to Joby, and the whole Labs team. It's a new era for LabsDB. So, any questions on that?
Audience: Yeah, first, this is awesome; we're really excited about it. We generate this kind of stuff; we've done that for a lot of the analytics metrics that we get asked for, but based on production data. So we took the production data, crunched it, and basically took 800 databases and put them into one table. There's a session tomorrow where we'll talk about that.

So you are already providing that service?

Audience: We have the logic that does that from the production data. What we'd love to do is consume the data from this cluster instead, because it's already sanitized and we can share it immediately without worrying about it.

Talk to me, because that's something I really thought about, but I didn't know how to move it forward. You have data that you cannot release directly, and maybe Labs is a way to share it: you do the aggregation on your side, make sure it's sanitized, and we can set up the process to push it to Labs and make it available.

Audience: We have a good set of processes for doing this kind of transformation from a complicated schema.

One thing I am doing now as a test is the watchlist table. The watchlist table is something we cannot share, because it contains which user is watching which page, so that's completely out. But what is the useful functionality? How many users are watching a page. So I started, as a test, not even completely finalized, creating a table, regenerated every so many hours, which has the page and the number of users watching it, only where that is higher than the limit the API uses, which is 29 users. That could be generalized into setting up processes that do this kind of thing, not only for the watchlist table but for other tables that we may or may not have thought about. We are very open to those kinds of things, because one shared process takes fewer resources than every single user doing the same thing. Anything we can do to help with that; I saw a thousand-times improvement in some cases.
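As a sketch of that watchers test: the real watchlist columns are wl_namespace and wl_title, but the target table name and the exact threshold rendering are my assumptions from the description:

```sql
-- Periodically rebuilt aggregate: how many users watch each page,
-- published only above the privacy threshold the API already uses
-- (counts are shown only at 30 watchers or more, i.e. higher than 29).
CREATE TABLE watchlist_counts AS
SELECT wl_namespace, wl_title, COUNT(*) AS watchers
FROM watchlist
GROUP BY wl_namespace, wl_title
HAVING COUNT(*) >= 30;
```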
Is there any question from the IRC channel? Nobody? As I said, this is kind of a sneak peek; we will be sending more information and a proper official announcement. This is just for you to know that we are working on it, and we would love everybody to test it.

Audience: About the switch from TokuDB to InnoDB, is there anything that we should expect to change?

It depends on what you run. From my point of view it's going to be more reliable, in the sense that it's going to behave exactly as production. It is true that in some cases TokuDB may have certain optimizations; for example, as it is so heavily compressed, certain mass insertions or mass selections may be faster. But InnoDB arranges data in a more relational way, so things like range selects or single-row selects can be faster; it depends heavily on the application. However, even with just the hardware upgrade we already have: first, it's going to be more reliable, and it's nice that TokuDB was more compressed, but not if you couldn't really use it; and second, in some cases it's going to be faster, because with half a terabyte of memory available, InnoDB is generally much faster. To give you an idea of performance: in all the tests that we have done, we are getting around a 5x improvement just in latency, when we thought the gains would only be in throughput. So it looks pretty good; that's why we wanted to test it. We are especially curious. Okay, maybe one last question.

Audience: Yuvi came to Berkeley and presented the Jupyter notebooks service, and on the analytics and data science side a lot of people are writing Jupyter notebooks, so that is great feedback. And then, second, for user databases, one approach might be to make a VM image and just give it to people: use your own disk, use your own bandwidth, use that VM, and keep it simple.

We want to create something along those lines, but some people want their own tables on the same kind of disk, next to the replica data, in order to join against them, and creating VMs for that creates a lot of problems of its own. But thank you, I like the idea.

[Closing remarks, partly inaudible.] I would love to have the same or a similar kind of hardware for the other pieces, but the biggest problem is the replication topology: we already have the sanitized replica, but we cannot let users just connect anywhere, so it needs its own dedicated chain. We want feedback on these kinds of things, because we don't want to change too much at once. And after the hard work, we also want to do some software maintenance. The new hardware has a lot of requirements behind it, but it will let us guarantee much more.