I've prepared a few things to go through with you, because I'd had a discussion with Bremner about what this should cover. So I'm going to run through things that are in the upcoming release series, and if you have questions as we go, ask them.

In no particular order: there's support for a newer version of Unicode, Unicode 8 instead of 5.2. The policy is to stay on the same Unicode version within a stable release series, because changing it affects term generation, and you want your queries to match the terms in your index.

C++11 is now a requirement to build. There's not much C++11 exposed in the API yet, but now it's very easy to add where there's an obvious win; it's made some things smaller and faster, and fixed a few cases which were O(n squared) when they should have been O(n).

There's explicit support for geospatial searching: limiting the search to a geographical area, or finding things within a certain distance of a point.

Database checking now gives less cryptic errors, and there's a fix mode. So if you have a broken database, there are some kinds of corruption it can repair, although a fairly limited number. For example, if you run Btrfs on your server and it crashes, it tends to lose small files or leave them empty; that case can mostly be fixed.

There are new stemmers, including ones for Armenian and Basque.

There's support for giving prefetch hints to a database: when you hand a search to a cold database, it can see which blocks it needs prefetched before it can really get going.

There are new weighting schemes: TF-IDF, and some divergence-from-randomness schemes.

Replication changesets are now compressed on disk, and there's a class for generating snippets, the little extracts shown with search results.
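As a rough illustration of what a TF-IDF weighting scheme computes, here is a minimal sketch. This is the textbook formula, not necessarily the exact normalisation variant Xapian implements:

```python
import math

def tfidf_scores(query_terms, documents):
    """Score each document (a list of tokens) against the query terms
    with a basic TF-IDF weighting.  Illustrative only: real weighting
    schemes offer several tf/idf normalisation variants."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    df = {t: sum(1 for d in documents if t in d) for t in query_terms}
    scores = []
    for doc in documents:
        score = 0.0
        for t in query_terms:
            tf = doc.count(t)
            if tf and df[t]:
                # (1 + log tf) term weight times inverse document frequency
                score += (1 + math.log(tf)) * math.log(n_docs / df[t])
        scores.append(score)
    return scores

docs = [["search", "engine", "library"],
        ["mail", "search", "client"],
        ["mail", "store"]]
print(tfidf_scores(["mail", "search"], docs))
```

The document matching both query terms scores highest; terms appearing in every document contribute nothing, which is the point of the idf factor.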
The API for opening databases is being reworked to be flag-based, rather than having a different method for opening each database format, which also allows more optimisation between opening and searching.

There's a new class, FieldProcessor, which plugs into the query parser. When the parser sees text starting with a configured field name and a colon, it hands the rest to your field processor and uses whatever that returns. So you could set up a rot13 field and search rot13'd text. There are lots of more useful things you'd do with it, too, like date processing.

Value ranges let you say something..something, which turns into a range query. That's obviously related, because a value range is often written as a field followed by a range.

There's a new free-space backing for glass. The old backend stored which blocks were in use in a bitmap in the base files. Each time you commit you need to write a base file, so you pay for writing a bitmap covering the whole database, and that's potentially quite a lot of data. Now it just stores free lists, and the free lists are themselves stored in blocks of the database which are currently unused, so they take up no extra space. There's just one small file per database which records whether we've got free blocks and where the free lists start.

I don't know how you want to take questions now, or... I've got a couple more things to talk about. Yeah, we'll take questions after that.

Wildcards should also work faster. Glass reference-counts cursors, and one of the costs of a wildcard is that each term the wildcard expands to creates a cursor.
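To illustrate the field-processor idea just described, here is a toy sketch. The names and the dispatch mechanism are invented for illustration; the real mechanism is a C++ class you register with the query parser:

```python
import codecs

# Toy model of the "field processor" idea: when the query parser sees
# "prefix:rest", it hands "rest" to a registered callback and looks up
# whatever term the callback returns.  The registry below is made up.
field_processors = {
    # If text was indexed ROT13'd, searches against this field must be
    # ROT13'd too, so query terms match the stored terms.
    "rot13": lambda text: codecs.encode(text, "rot_13"),
}

def parse_field(token):
    """Split 'field:text' and apply the matching processor, if any."""
    field, sep, rest = token.partition(":")
    if sep and field in field_processors:
        return field_processors[field](rest)
    return token  # no registered processor: leave the token alone

print(parse_field("rot13:hello"))   # the term actually looked up: uryyb
print(parse_field("plain"))         # untouched: plain
```

The same shape works for more practical processors, such as parsing dates into a normalised form at query time.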
So by sharing blocks between those cursors, since most of the blocks will be the same, that gets cheaper. There's also a query operator for wildcards, so they're expanded later in the process, and again that's more efficient.

That's it on the C++ API side. The bindings, with PHP iterators, proper typemaps, and the Java bindings, are now generated by SWIG, which means they can actually keep pace with the API. Don't worry if you didn't take all of that in. Yeah, there was a lot.

[inaudible exchange]

So, you mentioned compacting a database in place. Is that part of this development series, or has it not been done yet?

I'd quite like to get it done. At the moment, when you compact a database, it essentially makes a copy: it copies each table and rewrites it so there's no wasted space. What I'd like is to be able to write the database back into the same directory in place, which is much easier with glass, because we can write tables out under different names.

I guess the other user request we've had, which seems like quite a bit of work to deal with on the notmuch side, is accent-agnostic search. You've probably discussed this with various people too.

Some of the stemmers already do that. This is to some extent language-specific.

Right. So do you have to decide this at index time?

If you're setting a stemmer up, then you're setting it at index time and you're also setting it at search time. For example, there are two variants of the German stemmer, one of which normalizes umlauts and one which doesn't.
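One common way to implement the accent stripping being discussed, and roughly what dedicated libraries do, is to decompose the text and drop the combining marks. A minimal sketch, assuming Unicode NFD decomposition is acceptable for the languages involved:

```python
import unicodedata

def strip_accents(text):
    """Remove accents by decomposing (NFD) and dropping combining marks.

    The same transform must be applied both at index time and at search
    time, or queries won't match the stored terms.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("café"))      # cafe
print(strip_accents("Müller"))    # Muller
print(strip_accents("naïve"))     # naive
```

Note this is exactly the language-specific trade-off mentioned above: for German, folding the umlauts this way conflicts with the stemmer variant that expects them.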
So for us, unfortunately... well, we're American and it's all English. Or it's a mix of English and other languages.

If you're predominantly English, probably what you want to do is strip all the accents, because your matching doesn't really need them. Yeah, that's right.

So does the English stemmer do that?

The English stemmer stems English text, but you could quite easily strip accents as well. You can write your own stemmer implementation. There's a library called Unac; it's quite popular.

It would be strange if I'm the only one that has questions, but... Something else we discussed the other day: the database-modified error.

Yeah, the general concurrency model and snapshot isolation. Realistically, we're not likely to move away from single-writer any time soon, because that's a very hard change to make. At the moment, the way it works is that you can have any number of readers and one writer, but when the writer commits a revision, a reader still treats its own revision as valid for reading, and it may hit a block that's been overwritten by the writer. That's the DatabaseModifiedError. The idea is that at that point you catch it and retry the search. To be honest, it's a pain; nobody really wants to have to do that.

So the way glass is headed is that it will allow the reader to lock the revision it's reading. What you'll have is revisions being kept around, so the reader has a snapshot. It's basically MVCC.

Yeah, that works pretty well. The downside of it is that readers will have to take a lock, and that's a bit of an overhead.

You mean in the sense that they'll just keep the old data around until they're finished, right?

Well, they'll have to take out a lock on the files. They'll have to call fcntl, because the writer needs to know about them. So the writer is the one in charge of discarding old revisions; it's not like a GC that runs later or something.
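The snapshot scheme being described, where the writer keeps old revisions alive while readers hold them and reclaims them itself rather than relying on a separate GC, can be sketched as a toy model. This is purely illustrative, not how glass actually lays out revisions:

```python
class RevisionStore:
    """Toy model of reader snapshots: the writer keeps old revisions
    alive while any reader has one pinned, and reclaims the rest."""

    def __init__(self):
        self.revisions = {}   # revision number -> data
        self.pins = {}        # revision number -> reader count
        self.current = 0

    def commit(self, data):
        self.current += 1
        self.revisions[self.current] = data
        self._reclaim()
        return self.current

    def pin(self, rev):          # reader takes a snapshot lock
        self.pins[rev] = self.pins.get(rev, 0) + 1

    def unpin(self, rev):        # reader finished
        self.pins[rev] -= 1
        if not self.pins[rev]:
            del self.pins[rev]
        self._reclaim()

    def _reclaim(self):
        # The writer discards every revision that is neither current
        # nor pinned by a reader; there's no separate GC process.
        for rev in list(self.revisions):
            if rev != self.current and rev not in self.pins:
                del self.revisions[rev]

store = RevisionStore()
r1 = store.commit("v1")
store.pin(r1)                    # a reader holds revision 1
store.commit("v2")
print(sorted(store.revisions))   # [1, 2]: rev 1 kept while pinned
store.unpin(r1)
print(sorted(store.revisions))   # [2]: reclaimed once the reader is done
```

This also shows the downside raised below: a reader that never unpins keeps old revisions around indefinitely.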
No, because there's no server. Well, you could have a process acting as a GC that you kick every now and then. But yes: the writer is the one that has to know it can't reuse those blocks right now.

So doesn't that mean a reader can just sit there, and you end up with a very old version ballooning the size?

Yes. Or somebody might want to do that deliberately. You can't really tell misuse by accident from intent.

There are some things that make this harder. One is that the semantics of fcntl locking are slightly insane, for historical reasons: if you close any file descriptor in your process that refers to a locked file, the lock is discarded. Which means that if something else in your process opens and closes the lock file, that breaks your lock. People have managed to do this with notmuch, by having their database directory inside their mail store: the crawl of the mail store then reads the lock file to see if it's a mail, closes it again, and there goes your lock. So what Xapian has done for a long time is spawn a child process solely to hold the lock. As well as the locking overhead, you get the rather greater overhead of a fork. You might have noticed, if you look in the process list, that you have cat processes as sub-processes: it forks and then execs cat, because cat has exactly the semantics we want. It will sit there forever, and when you close its input it exits.

Relatively new in Linux are open file description (OFD) locks. They have the semantics that anybody who hasn't actually read the man page in detail assumes fcntl locks have. They've been proposed for POSIX standardisation, and they're supported in current stable kernels. So that means reader locks will be less expensive than they would otherwise have been.
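The close-drops-the-lock footgun described above is easy to demonstrate with POSIX record locks via Python's fcntl module. This is Linux/POSIX behaviour; a child process is forked to probe the lock, since a process never conflicts with its own locks:

```python
import fcntl
import os
import tempfile

def child_can_lock(path):
    """Fork a child that tries a non-blocking POSIX lock; report success."""
    pid = os.fork()
    if pid == 0:
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(0)      # lock acquired: parent's lock was gone
        except OSError:
            os._exit(1)      # lock refused: parent still holds it
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0

path = tempfile.mkstemp()[1]
fd = os.open(path, os.O_RDWR)
fcntl.lockf(fd, fcntl.LOCK_EX)                   # take a POSIX record lock

print("child got lock:", child_can_lock(path))   # False: we hold it

# Now open and close a *second* descriptor for the same file.  Per the
# historical fcntl semantics, this drops our lock too.
os.close(os.open(path, os.O_RDONLY))

print("child got lock:", child_can_lock(path))   # True: our lock is gone
os.close(fd)
os.unlink(path)
```

OFD locks (F_OFD_SETLK, not exposed through lockf) attach the lock to the open file description instead, so an unrelated open/close elsewhere in the process no longer discards it.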
What's the current state with that? It also has implications for NFS, because there's the whole lockd issue.

So I wouldn't generally recommend you run it over NFS.

Right. I'm happy with "don't do that".

Yeah. If you do run it over NFS, I don't know how OFD locks interact with lockd; I suspect they probably fail, which would push us back to creating a child process. But I've never been entirely sold on NFS locking being bulletproof anyway. If you want to search from a different server, you'd probably do better to run a network protocol on top.

The nice thing about the free-list implementation is that all we need to do is put a barrier in the free list and say: don't release blocks past this point, because those are the blocks freed in this revision and later. So we just need to record the revision past which free-list releases shouldn't go, and we have snapshots. At this point it's not a lot more work, but it needs a bit of cleanup.

A related, more general question: Xapian is C++, and notmuch crosses this boundary between C++ and C, which is our fault, I guess. But it's also kind of the only sane way to provide shared libraries. OK, one could debate in another venue whether that's a fair statement.

I'm not sure I'd put it that strongly, but I'm not going to disagree with you on that.

Right, so let's suppose that it's not totally insane to want to do this; a weaker statement. Then error handling via exceptions is really hard. We have a C++ exception that turns into a not-necessarily-best-handled C error (we're working on it), and that then gets passed up to Python or Ruby, which throws a binding-specific exception.
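One way to tame the exception-to-error-code glue just described is a single wrapper that converts exceptions into C-style status codes in one place, instead of repeating try/catch at every call site. A toy sketch; the codes and names below are invented for illustration, not notmuch's actual codes:

```python
# Invented status codes, standing in for a C API's error enum.
NOTMUCH_OK, NOTMUCH_ERROR_XAPIAN, NOTMUCH_ERROR_UNKNOWN = 0, 1, 2

class XapianError(Exception):
    """Stands in for the C++ library's exception hierarchy."""

def c_api(func):
    """Wrap an exception-throwing function so it returns (status, result),
    the way a C API would.  All the repetitive handling lives here."""
    def wrapper(*args):
        try:
            return NOTMUCH_OK, func(*args)
        except XapianError:
            return NOTMUCH_ERROR_XAPIAN, None
        except Exception:
            return NOTMUCH_ERROR_UNKNOWN, None
    return wrapper

@c_api
def query(text):
    if not text:
        raise XapianError("empty query")
    return text.split()

print(query("hello world"))   # (0, ['hello', 'world'])
print(query(""))              # (1, None)
```

This is also roughly the shape of what a binding generator emits mechanically around every wrapped call, which is why generating it beats writing it by hand.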
I don't know. I mean, it's not your problem, but it's a problem for us, I guess, that where we're gluing, we're gluing bindings on top of that C layer, and there are probably cleaner routes through to the C++. I don't know.

I mean, you could just bind something else. If you wanted this for notmuch, essentially you'd rewrite the relevant parts of notmuch in C++, which is a smallish amount of work, and then you'd get a better semantic match. Maybe.

I think part of the interesting thing for us has been the many different ways that people have used libnotmuch, and by providing a C library we kind of can't make that library go away now. Well, we could, but various projects would die if we did.

One other option, not that you have to do it, would be something generally useful that everybody can use... though as C, it turns out, I'm not sure it does the right thing. I'm not saying we should do that.

Yeah, I don't know. So that's a problem for other projects that need to deal with Xapian from C, if there are others. Are there others?

Well, I don't know how mu is written.
I guess it's also in C.

For some of the language bindings people try to write, C is definitely the lowest common denominator. I wasn't proposing that you have to do it, just that somebody might want to. I'll throw it out there in case somebody in the world has a great idea.

I'm not the biggest fan of exceptions, because they're a bit action-at-a-distance.

Any error handling is inherently extra stuff that obscures the issue; it's just a matter of where you push it, rather than there being some magic solution.

I don't think there's a magic solution, just different ways of thinking about it. I think it's nice and consistent that you get exceptions out for everything, rather than exceptions for some things and error codes for others. That could potentially be handled at a wrapper level.

Right. I mean, I wouldn't want a mix, but error codes would somehow be helpful. If there were some kind of specification of which exceptions each function can throw, you might be able to automatically generate the mapping from exceptions to error codes, but I'm sure there's a lot of attendant information that would need capturing that you can't express in the interface.

Unfortunately, there isn't really a good way to find out exactly what throws what, because of how exceptions work.

Right, that's the thing I like about error codes: it's more likely that everybody documents "these are the values that could ever come out of this function". Exceptions, and I use them too in various ways, encourage a "well, let's see what happens" attitude.

I did at one point try to add some noexcept specifications. It's surprising how few things actually cannot throw an exception, because there are things like std::bad_alloc, which means "I couldn't allocate memory".

What the bindings do is put a standard error handler around everything. Every call gets wrapped: the wrapper catches the exception, calls a function from your error handler with the exception object, and re-throws from there. So you don't have all that repetitive error handling at every call site; it's in one place, which also reduces the size of the generated code.

So one could generate C bindings in the same way as one generates the Python ones. There was actually work on producing a SWIG back end that produces C. It's cool, but I wouldn't want to have to look at that code.

Yeah, that's very true. The nice thing about SWIG is that it automates the process: once you've told it how to wrap a type, you don't have to spend the entire time writing very repetitive stuff just to keep up with API changes. The downside is that lots of it is hideous in many ways. But essentially, what it's really doing when you come down to it is automating writing the code you would have had to write anyway.

I guess the other pain point for notmuch users is indexing speed. I don't know that it's a big complaint; most of notmuch is really fast even on a really large database.

Hey, don't look at me; Rob is the official one for that. I got an SSD for my mail store. But indexing is expensive; the effort has mostly gone into the search side.

I am currently working on various indexing things. When I got the SSD it made a big difference, but I got the feeling, and I can't really pin anything down, that there was some interaction where whatever notmuch was doing, it and the kernel, or the file system, were not very happy with each other. I was on LUKS, an encrypted file system on a spinning disk. I've heard similar things in various places, so I've got the impression that maybe there's something odd there.

It does seem to be the case that the kernel is quite unhappy if you write a bit of data, fsync, write a bit of data, fsync, write a bit of data, fsync. It triggers some really unhappy behaviour.

Is that needed? I mean, you do it for a reason, I guess, for atomicity, or is it just historical?

Well, when the base file is committed, you want to make sure the base file is really there, so it writes the base file, fsyncs, and renames it over. What it doesn't need to do is fsync in quite so many places, so I've attempted to reduce the number of writes between fsyncs. There are still some, but they're more grouped than they were. With glass it should be better, because it doesn't have all the base files to fsync; it just has the tables and the version file. In fact, when I get home I'm going to be working on a single-file version of the backend, because somebody wants it; maybe that will behave better in this regard. Though it's wanted for convenience rather than anything else, and benchmarking is super hard. Our performance testing suite is really inadequate, so it's more a question of what data is freely available than of what models people's actual workloads.

It may also be that there are things that can be tuned on the notmuch side, certainly in the indexing workflow.

There's certainly room in Xapian too. It just uses maps to represent changes in memory, whereas it could be batching them up more compactly as it goes, ready to write out, which would save a lot of memory. And it decides when to flush changes just by counting the number of documents changed, with no regard for how big those documents are. So if you're indexing Twitter, the default threshold, which is 10,000 documents I think, is probably a bit conservative: 10,000 tweets and 10,000 full-size documents are very different. Really it should also take account of how much data is in use.

Does notmuch use mincore?

Well, this is a whole thing I just went through, and if you're not using it, you should come talk to me, because there are traps here: the way the man pages say things work and what the Linux kernel actually does are not the same. The idea is that when you're doing a traversal, like indexing, you advise the kernel that you're not going to use this data again. But the problem is that the obvious call, posix_fadvise, does not do what you want. What the man page says it's going to do, and what you would guess, is not what really happens. There's the NOREUSE flag, which actually doesn't do anything; it's a no-op. And the other one, DONTNEED, always ejects the pages: you say "I'm not going to use this again" and it throws them out as if nobody is going to use them again, which is probably not what you meant. What you can do is use mincore when you start up, to see how much of the data for the file is already in memory, and then, once you've finished traversing, only force out the pages that weren't there when you started. That's about the best you can do right now, and it can provide a pretty decent performance benefit, because it reduces memory pressure: otherwise you're pulling all of that into the buffer cache even though, when indexing, you're probably not going to hit it again. It leaves that memory free for Xapian or whoever else. I'm not saying "do it", and I don't want to take up all your time, but it might be interesting. For us it was a simple device to stop indexing evicting everything else that had been read from the database; as you say, the problem is whether the thing you've just read is going to be cached at all, and the mincore trick deals with that.

OK, we'll pause there. Are there any other questions? That was a lot of questions from me already.

It might be worth asking, and I don't know where this stands, but there was the work that was done on a custom query parser for notmuch, and it sounded like you'd said some of those features maybe ought to go into Xapian proper, or could live in Xapian proper. I just wondered if there was anything there, because at some point I might poke around; I'm not promising, so I was just curious.

There's been talk in notmuch for a while about a custom query parser, for supporting various things. I kind of feel that there isn't anything notmuch wants which shouldn't really be possible to do in Xapian. It isn't all possible at the moment, but extending Xapian's query parser to support these things seems a better approach to me.

There would be different benefits. That was what I was trying to say, and I didn't say it that well. The problem is that notmuch tends to be quite conservative about what it depends on, which to some extent is reasonable: we want to be able to run on Debian stable.

Graceful degradation, where it works on Debian stable but works better if you have a newer version, I think is fine.

And I think part of what we have accomplished over the last couple of releases is a much more fine-grained view of which features are available, via notmuch's configure support.
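The mincore trick described a little earlier can be sketched as follows. This is Linux-specific and illustrative: mincore(2) is called through ctypes to snapshot which pages of a file are resident before a traversal, and afterwards only the newly faulted-in pages are evicted with POSIX_FADV_DONTNEED:

```python
import ctypes
import mmap
import os
import tempfile

libc = ctypes.CDLL(None, use_errno=True)
PAGE = mmap.PAGESIZE

def resident_pages(path):
    """Return a per-page residency list for `path` using mincore(2)."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    npages = (size + PAGE - 1) // PAGE
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), size)          # page-aligned mapping
        vec = (ctypes.c_ubyte * npages)()
        buf = ctypes.c_char.from_buffer(mm)
        rc = libc.mincore(ctypes.c_void_p(ctypes.addressof(buf)),
                          ctypes.c_size_t(size), vec)
        err = ctypes.get_errno()
        del buf                                    # release buffer export
        mm.close()
    if rc:
        raise OSError(err, "mincore failed")
    return [bool(b & 1) for b in vec]              # low bit = resident

def evict_new_pages(path, before):
    """Drop only the pages that weren't resident when we started."""
    with open(path, "rb") as f:
        for i, was_resident in enumerate(before):
            if not was_resident:
                os.posix_fadvise(f.fileno(), i * PAGE, PAGE,
                                 os.POSIX_FADV_DONTNEED)

path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"x" * (4 * PAGE))
before = resident_pages(path)
print(len(before))   # one residency entry per page: 4
# ... traverse/index the file here ...
evict_new_pages(path, before)
os.unlink(path)
```

As described above, this avoids both failure modes of posix_fadvise alone: NOREUSE doing nothing, and DONTNEED evicting pages that some other reader had legitimately warmed.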
So we may be in a better position to do that now. But yeah, graceful degradation with older versions totally makes sense to me, and we have the configure machinery to do these things.

Yeah. And given the history of lots of talk but not actually a lot of code, I also think doing things incrementally in Xapian makes sense, in the sense of "here's a feature that we'd like in notmuch" rather than the grand new query parser. It's easier to get the feature faster that way.

So in terms of contributing back to that: it's well past time. We haven't had a new stable release series...

Yes. Basically I'm very busy, and release management keeps slipping; I keep saying I'll do it next week. But I'm definitely trying to push towards it. I'm hoping this year.

Where's the best place to get started in Xapian, if you want to start? I haven't even looked at the documentation yet, so presumably the documentation.

Yeah, there's quite a good getting-started guide, which really ought to replace most of the older documentation.

That's great. It's definitely better than most of the existing documentation.

Is there anything else anyone wants to raise? I think we have five minutes. I suggest we wrap up and see if there's coffee. Thanks a lot. Thanks for coming.