systems, they do essentially the same thing. The difference is you probably already have your data in MySQL. So why would you rip it out, put it somewhere else, then do the calculation, put it back, and join it up with the rest of your stuff? You can do it inside MySQL. There are limitations to that, too, but it's a convenience for a large set of use cases. Think, for instance, also of RDF data. You can now easily import RDF data and actually traverse it easily inside MySQL. That can be kind of nice, perhaps, depending on what you need to do. Social networking: if you need some social networking on your site, the OQGRAPH engine can definitely help you with that. You now get that for free. That's something you would previously maybe have looked at, been in pain with, and disregarded as impossible. Take three steps back and try again, because those conclusions were based on normal relational systems. And this looks relational, but it does tricks. I'll leave that there. We also provided some patches into OurDelta as well as MariaDB, so those are in, plus the build system. And when we contributed that, well, I mentioned OurDelta itself earlier. Monty Program now pretty much employs all the original optimizer people who used to work at MySQL AB, and I think that's been really important. There are other people who know a lot about the server, but these people have been working with the optimizer in particular, in the core of MySQL, for years. They know stuff that none of us do, and it would have been sad to see them disappear to various companies just trying to make a living. These are important people, and we need to keep them around. They've been doing quite a bunch of interesting work, and we'll get to that on the next page. There's something called the Open Database Alliance. It's not particularly big right now, but it's a vehicle that the other companies I mentioned here use to work together and actually provide services. So when you come to Open Query and ask for engineering work on the MySQL server core, we're probably not going to touch it ourselves. It depends a bit on what it is. We do know our way around the source code, but we're not the experts. So we might ask Monty Program to do the actual work, but you can come to us and we'll make sure it happens. If someone comes to Monty Program with needs in terms of remote maintenance and server reviews and tuning and that kind of thing, or training, they might send them to us. That's the way that particular thing works. And then one of the latest additions is SkySQL, which is essentially, let's call it, the fork of the services division of MySQL AB. The original VP of Services, Ulf Sandberg, and quite a number of the support and training people at MySQL, then Sun, then Oracle, are now in SkySQL offering similar services, I think also at a similar price point still. Maybe cheaper than what Oracle does now, higher than the price point that I'm using. But anyway, there are different providers of those services. Percona also does training, by the way, but in the US. Now, the interesting thing is that both PBXT and the OQGRAPH engine are, of course, GPL licensed. So in theory, you could just plug them into MySQL, right? Well, in reality, that doesn't quite happen, because MySQL, the company, does the builds. There is a pluggable storage engine interface in 5.1 and above. Nasty because, well, in reality nasty because you actually need to compile a plugin with the exact same compiler switches to make it work properly.
Now, you can kind of figure that out, and it kind of sometimes works, but you also have to compile it against the exact same version of the source code. It has way more dependencies. There's no simple binary compatibility, and there's no API versioning. Therefore, the only way to make a pluggable engine or another plugin work is to compile it in the same source tree at the same time as you compile everything else, leave it there as an .so library, install it and use it when you need it, and just leave it out if you don't. That's perfectly fine; it still makes it a plugin, but it's not something you can acquire from a third party and make work. It is theoretically possible, but in practice it's just a pain to maintain. So those things are now integrated into MariaDB from version 5.2 upwards. And that's really important, because those community contributions are now actually inside MySQL, except we need to call it MariaDB just for trademark purposes. So this is the kind of stuff that you will see in MariaDB 5.2, and it is production ready. People use it; some really, really big Australian companies that we work with use it as well. So there are absolutely no issues there. Of course, there are bugs. There are bugs in every piece of software; anybody who tells you things are bug-free is lying. We know it, right? We just admit it. Subqueries: I asked this earlier today. Who here uses subqueries? Yes. You rewrite them as a join? Absolutely. Yes, that's what you would be doing, because subqueries kind of sucked in MySQL, didn't they? They were kind of, yeah, they did suck. And the people who wrote that will readily admit it. They did the start of an implementation, did some stuff, made it work, and then they were sent off to other projects, essentially driven by sales. Well, these same people have now finished the job, because they're no longer driven by sales; they're working for Monty. They fixed it all. Well, nearly all. There are some little things that still need some work, but things like correlated subqueries, which really, really sucked, just work now. It's just fast. It's beautiful. It's really, really so much faster, it's unbelievable. So do play with that. If you use subqueries, just download 5.2 and see how the same query suddenly works the way you'd expect it to, rather than having to wait for a while. I used to always tell people: OK, if you have a subquery that can be written as a join, please do so. There's no longer any need for that with MariaDB 5.2 and beyond. So that's rather nice. Other longstanding bugs: just lots of little things. It doesn't serve to give you a long list here. There's a community contribution for virtual columns. It's kind of an interesting little trick; I don't have time to discuss that now. Pluggable authentication: MySQL 5.5 also has that, so that's not particularly new. User stats I'll mention in particular, because it can do something really funky that otherwise takes a heck of a lot of effort to do externally. It's a couple of extra statistics tables, four different ones, which you can enable; they eat a little bit of CPU power, of course, so they slow down the server a tiny bit. They track which users, originating IPs, tables and indexes get used. They're not only available via SHOW INDEX_STATISTICS and friends, but also in information_schema, in a table there with that same information. That enables you to do an interesting query.
Through information_schema, you can get a result set with all the indexes that exist in your system for a particular database. So you can left join or do a subquery between that and the index statistics table from the user stats, subtract one from the other, and work out which indexes never get used. That's pretty cool. Now, why do you want to remove unused indexes? Any thoughts? Space? Ah, I don't have enough. Absolutely: any write to that table is delayed by an index that apparently didn't get used. So that's really, really useful. There are possibly other ways to figure this out, but not even the query analyzer can easily do this. It can do it in some cases, but it's quite complex stuff, and there's absolutely no need: the MySQL server just knows these things. You just need to track it over time. So you don't leave this on all the time. You just leave it running for a day on one of the slaves, and you figure it out. That's the kind of stuff that Open Query uses on a regular basis, which is why we really like it when a client says "sure" when we ask them: can we actually please run MariaDB, or OurDelta builds, which also have the same thing? That really, really helps us do our work, that and many other little tricks. So, the PBXT engine is built in. FederatedX: the original Federated engine at MySQL was built by Patrick Galbraith. He left the company, and it kind of got a bit abandoned inside. It wasn't a priority in terms of sales and so on, so it kind of became a development sideline. But Pat and another former colleague and friend of mine, Antony Curtis, have kept working on it and called it the FederatedX engine. So when you now use the Federated engine in MariaDB, you're actually using the FederatedX engine. Essentially, it has all the bugs from the other one fixed, and it just replaces the other one. So again, a bit of a fork replacement. OK, the Federated engine is built in. The key cache can now be partitioned, which can be useful. There are some pluses and minuses, and there's still some work being done, but it's handy. If you don't want some tables to interfere with other tables in terms of caching, getting chucked out of the cache while loading those indexes, that can make sense. There are some MySQL binlog and row-based replication enhancements that I'll just mention; there's lots of that work going on. There's already work being done on, let's call it, MariaDB 5.3, but essentially it's still based on 5.1. There's also work being done, of course, on taking MySQL 5.5 and incorporating the changes that were done there, like the replication heartbeat, into MariaDB. So at the moment, you kind of have to choose: do you want MariaDB with all those nice changes and extras, or do you want 5.5, which admittedly also has some cool stuff? At the moment, I choose MariaDB, because I know that MariaDB will get those other things as well, even though it's a couple of months later. Yes? Are there any syntax changes? No, not right now; at least it's backwards compatible. There can be some new extras: the virtual columns, for instance, add some extra trickery to the CREATE TABLE syntax. And of course the OQGRAPH engine adds a complete new engine, as does PBXT. They don't add extra syntax there; they add extra information_schema tables and other things and other capabilities, and in some cases new server parameters.
If you have a config file from MariaDB, you can't just drop MySQL back in and make it work, because in some cases you need to comment things out first. Upgrading is easier than downgrading, but that's always the case. At this moment, there are, as far as I'm aware, no binary incompatibilities, so in terms of the data storage you should be able to move backwards and forwards. There is no particular plan to break that; I mean, it would not be particularly beneficial. However, there's no guarantee that someone in the ecosystem won't break something at some point, which makes someone else unhappy. No way of knowing, and I hope that doesn't happen. But it's entirely possible that someone will make a commercial choice to do something that upsets the others, and then it will really become a fork. At that point, the merges have to stop. So I sometimes get asked: how do you hedge your bets at Open Query? I don't. I just use MariaDB. So I don't do bet hedging. I don't fuss about it, because I know that MariaDB, at the moment and at least for the foreseeable future, still picks up all the other changes as well from all the different sources, including Oracle. So it gets the superset. It's a slightly delayed superset, but I know it's done by people who know what they're doing. And I have their phone number. I could call Monty now and get him out of bed if there was a real problem, and he would answer and fix it, or at least make an honest attempt and give me the honest answer. And that's always useful. If he can't help me, he will also honestly say so, which is better than you sometimes get with a salesperson somewhere, right? Yes? Who brought out 5.5? Well, yes, that was the corporation Oracle, the people working there. The name, well, that's one of the things they've purchased. So when they bought Sun Microsystems, which was the owner of MySQL, the trademark, and the nice dolphin logo, somewhere in there they got that name as well. And that's why, when you do something that is more than just a little enhancement, you should call it something else, otherwise you're just in trademark nasty land, which is why it's called MariaDB. And it's still GPL, right? Absolutely. Well, it can't not be. Well, a future version, I mean, they do own all the copyright, because all the contributions that have been made in the past have been essentially copyright assigned to whichever corporation owned the core at that point and then licensed back. But it means that the organization that owns MySQL actually owns the code base, which allowed them to do all that nasty dual licensing stuff. Well, I say it's nasty now. I used to think it was half a good idea. I've grown up. And apart from the growing up, the world has moved on; it no longer makes any sense whatsoever in the current environment. As far as I'm concerned it's been dead for five years, and you just have to make the salespeople stop doing silly things. It makes no sense in the current ecosystem. The thing is, it still makes money, therefore they won't let it go. The intention was to drop it when MySQL Network first came out, which is now MySQL Enterprise; the intention was to taper off the licensing sales because the services and MySQL Network subscriptions would take over. However, they both ended up being nice money, and as long as something keeps making money, you don't drop it. That's just one of those things that used to upset me. And it's still kind of annoying on the sideline; I tend to try and ignore it. Other questions? Yes? Well, who is "they"?
I mean, lots of people in the ecosystem are. So, just to go back here. Yes, Percona is actively hacking on InnoDB and calls it XtraDB. Monty Program is actively working on MariaDB, including merging those things from Oracle. Oracle itself is working on the MySQL code base, including the InnoDB component. I don't know if you know, but InnoDB was developed by a separate company years ago, Innobase Oy in Helsinki, by Heikki Tuuri. And that company was bought in 2005, 2006, by Oracle Corporation. So they already owned that piece, which essentially bought them a seat at the table. I've met Ken Jacobs, who was employee number 18 at Oracle. He's no longer with Oracle now, but I've met him numerous times at MySQL conferences and elsewhere. A great guy, really useful, actually, to the MySQL ecosystem, because he had some great input. He's a very wise individual with a lot of experience; he'd been with Oracle since, well, whenever. So yeah, all those people are still active, and lots of people in the community are working on things. The cool thing is that because of the way the development model at Monty Program works, more community contributions can find their way in. MariaDB is pure GPL rather than dual licensed, because Monty Program doesn't own the copyright to the original code. They have the code base just like I have it on my system, under the GPL. So whatever they add, well, the bits that they add are of course theirs. But when they add the bit from Open Query, the OQGRAPH engine, that is mine. PBXT is owned by Paul McCullagh, or by PrimeBase, I don't know. Yeah, so all those contributions are out there. And it's now a bundle of GPL stuff owned by different people, just like the Linux kernel is, and all the other modules and other components flying around out there in GPL land that make up our Linux boxes, so to speak. So that's the overview. Here are some resources. Yes, I'll happily answer further questions. 5.5 has been in the works for a bloody long time, let me put it that way. It used to be called version 6. Then it was called NextGen or NG or something, and it was like a continuing tree that never went anywhere. Then there was a 5.6. Then suddenly 5.6 disappeared. Then it came out in product previews, and then suddenly it was labeled 5.4. And then in the end 5.5 came out, hooray. So I don't know how many years have actually gone by to make this thing happen. And lots of things fell by the wayside, like the pluggable backup thing that 6 was going to have. It's gone, poof, vanished, bummer. So, online backup for all storage engines: that would have been nice. There are other ways of doing it, but that idea wasn't bad, and it got tossed out the window somewhere. And there were lots of other interesting things in version 6 that went by. So in terms of versioning, I've kind of lost track, really, mentally and for business. So I think at the moment, in terms of the number of people involved, as far as I can see MariaDB wins, because it incorporates things from lots of different people. Oracle has internal development and can accept code contributions, but under very specific conditions, and there are business imperatives around why they want to accept certain patches. So it's less open. Monty Program doesn't care about that, so they happily put in those things. They don't have an interest in not accepting something. In fact, they have a very strong interest in accepting as many things as possible from as many people as possible, within reasonable sense.
I mean, if it does really weird things to the server, that wouldn't make sense. But yeah. Yeah, yeah, absolutely. My business relies on it. Oh, this particular business of mine, yeah. So from that perspective, I guess, yes? Right, which is a couple of... yeah, yeah, absolutely. He's there. Yeah, we're good mates. Now, where's my crystal ball today? Fair enough, but all I can give you is the information. I seem to have left my crystal ball. Yeah, that's the idea. But my crystal ball is as opaque as yours; it doesn't work particularly well for this kind of stuff. I think that was a DrupalCon, actually, a DrupalCon a couple of years back, well, you were there, as in predicting what would be cool in a couple of years and so on. And damn, were people wrong. So yeah, that kind of stuff always happens. I think, and that's why I use MariaDB, because of the interaction I've had with my former colleagues: I know the people, I know what they do and how they work. I think MariaDB is a pretty good bet, also because it merges from the Oracle code base. Now, if Monty Program and MariaDB were to completely disappear, there's still the Oracle code base anyway. And the code is out there, and other people are working on it, like Percona. So I think that code base is a solid thing to have around, which is why we use it now. If it were to disappear, well, if no more new versions came out, that's not a particularly harmful thing right now, because that version works pretty well. Drizzle is getting there. I mean, in the past it has been strongly recommended that you do not try that stuff in production, because they did rip everything to pieces, stuck it back together in a different way, and rewrote things. Of course things break when you do that; that's on purpose. It's taken quite a while, but it's getting along really nicely. They've built a completely new replication infrastructure, just to give you an idea of the complete re-engineering that they've done. The code base no longer looks at all similar to MySQL. They have some common heritage, and that's about it; there's some ancient history there. So yeah, for certain purposes, Drizzle will be the better choice. The intent there is for it to be particularly suitable for cloud-type infrastructure, things with interaction with, for instance, Gearman and that kind of stuff, distributed processing. And it's in part sponsored by Rackspace, which has an interest in that kind of cloud stuff. So yeah, it'll go in that direction. MySQL, of course, has a very large existing user base, so whatever it intends to do, it needs to keep in mind whatever it was doing in the past, which is supporting, I don't know, a couple of dozen million user installations, whatever, I don't know. I can't hedge bets any particular way. I think various of these players will be around for quite a while. And like I said, I have no reason to believe that MySQL will not be developed by Oracle in the foreseeable future. But I do know that when I talk to an Oracle salesperson, they will be plugging the Oracle database, not the MySQL database. And that might make an impact in the future on how much money they spend on it. So depending on where the revenue comes from, there might be different decisions. But that's just business imperative. So I'm not actually predicting anything in particular. There was one last question there, and then we need to finish. Yes? Yeah, that's essentially it. Yeah. Yeah. The code is available.
There are enough people in the ecosystem, and companies available to do stuff, that support things and can fix bugs. And if you have an existing app that works, you can keep running that for a long time. We still see people using really, really old versions of MySQL. And should I upgrade? Please don't, because it works now. If it ain't broke, don't fix it. For those specialized environments, just leave it. If you're dealing with a bug, sure, you should look at upgrading, talking with someone who knows what has happened since. That's fine. But yeah, essentially, I don't think there's a problem. And I don't think it's really necessary for me to look ahead that far on what will be around in a couple of years. It'd be interesting to look out for, but for business reasons I don't think it matters that much right now, because I know that the code base is out there and the people are out there. OK. Thank you very much. I'm zoned; it's morning, 7:30 AM, and I've been up all night. So if I appear a little bit sleepy, well, that's just because of that. The idea for this lightning talk appeared when coming over here. I left from the Netherlands last Saturday, and I made a 12-hour stopover in Dubai. And well, when I go anywhere for a long stopover, I like to go into town, so you have to pass customs. And if you go to a place like Dubai, that's the way you have to pronounce it, I believe, they have lots of restrictions. Restrictions like you can't take in any medication, you can't take in whatever, because it's unlawful, and that includes the data on your computer. They have restrictions on certain kinds of files. And well, that inspired me to create this session: how to efficiently hide sensitive data on your laptop while traveling. The original title was a little bit different, how to hide porn on your laptop while traveling, but I wanted to make it a little bit more professional. So let's talk about the sensitive data. So, the problem. You don't want, well, that slide sucks; you don't want certain people to find your sensitive data. Well, I hope you understand what I mean. Why do you want to avoid that? Well, because your laptop can get stolen in a train. Or you can pass by customs in some weird foreign country where they want to control what's on your laptop. Yeah, sure. Even more important, you don't even want them to see that it exists. Because if you really have to hide something, an encrypted volume like a LUKS encrypted volume or a password protected zip file isn't good enough, because the government will find the encrypted zip file. They will see the encrypted volume and they can just torture you to get the password or passphrase or whatever. So you need something that's better. So I started thinking a little bit, and as my solution I worked out what I call raw-LVM-storage-based hiding of your sensitive data. So why LVM? Well, the good thing about LVM is that you have to be good at Linux if you want to be able to find data hidden on raw LVM storage. And not that many people are good enough at Linux that they would recognize that there's something hidden in a raw volume. So how does it work? This is just a hint at a solution. I hope it may lead to something that's even better, and you can do your own interpretation. If you find something even better, please send me your updates; I'm always interested. What I did: first, you need to free up some space if necessary. If it's LVM, you can use lvresize to resize your existing LVM volume.
If you don't have LVM on your laptop, that's no problem. You probably would have raw partitions, and you can resize raw partitions pretty easily too. I like to do that from the command line. The procedure is quite easy. First, shrink your file system. Second, delete the partition containing the file system. Third, recreate the partition according to the boundaries that you would need to get the appropriate size. So in this talk, I assume you're working with... Make a backup. Do you make backups? Okay, thank you. Another person who talks encrypted to me; it's so weird that in Australia I always keep meeting people that are from the Netherlands. So next, create a new LVM volume. You can use lvcreate; you probably know how to do that. And last, do not make a file system on it, because if you make a file system on it, well, while booting your computer it's too easy to recognize that file system. So just create an LVM volume and do nothing with it. I like using an LVM volume name like /dev/your/swap. So the ignorant person who looks at your laptop would think it's swap space, and would never suspect that there's any data on it. Now, once you're there, you can copy over your sensitive data. To make it easier to handle, I put it in a zip file, and next you can use our good friend dd to copy over the zip file. If you have never worked with dd, dd is really the most amazing tool that exists on Linux, because you can copy over raw blocks. In this example, dd if=/home/your/p.zip of=, well, your LVM volume. And it will just copy the raw blocks to the beginning of the LVM volume. And no one will be able to mount it directly because, hey, there's no file system in it. So then you can travel to the place where you need to be, and please don't forget to remove the original file if you really don't want people to find it. And after doing that, once arrived in friendly territory, like a country like Australia, you can use dd again and copy back the zip file. And then you would have full access to your data again. Another nasty typo in my presentation. If you want the updated version, just send me an email; I will probably make an update if someone requests it. Thanks for your attention. That was my lightning talk number one. The lightning talk I really wanted to do when coming over here is creating a Linux iSCSI SAN. This is something I've used in production. It is also related to the talk that Florian Haas had this afternoon and will have tomorrow or the day after tomorrow as well. So what I'm doing is just focusing on the iSCSI part here: how you can set up an iSCSI SAN solution. So first, that's about me, that's boring. Second, why is a SAN useful? Well, you probably are all aware of that, but if you have a SAN, you can separate the storage from the server, and that means that you can make your storage flexible, which is: if the storage is somewhere else and not on your server, then if your server burns down, you can create another server and still access your storage. Also, SANs are quite useful in a high availability environment, especially if you are creating a cluster, and on the cluster you want to mount some resources and access data when a resource moves over from one server to the other server. It's useful if your data is on the SAN and the nodes in the cluster can still access the data. So what is involved in setting up the SAN? First, you need something to share.
That can be basically anything: a disk, a partition, a logical volume are among the options, but you can even create a file, a dummy file, which just allocates some disk space to your SAN. And next, you need the iSCSI target service. There are three different iSCSI target solutions for Linux. What I've been using to prepare this is IET. I think it comes from the Internet Engineering Task Force, I'm not even sure. It's the default solution on SUSE Linux, and I like SUSE Linux, so I've used IET. Now, how would you need to create it? Well, first you need to install the iSCSI target package, and for IET you need a configuration file, which is ietd.conf. In order to set it up, you need two different items. First, a unique name for the target. I would suggest using something short which is easy to recognize, and avoiding the UUIDs which normally are created automatically. And next, you need your LUN configuration. That is a very complicated configuration which makes up a very long configuration file: this is the entire contents of the configuration file I've created on my demo machine. So on the first line, you see a specification of the target, which has the name iqn.2011-01.fr.sunder:iscsi. This is the way SUSE Linux handles iSCSI target names by default. I think you recognize this part of the name: it's just the inverse DNS name of the machine on which the iSCSI target was created. I use "iscsi" as a replacement for the UUID that is normally assigned as a unique ID for this iSCSI target, because you don't want to type in numbers like that. Next, you need the specification of the LUN that you are going to create. In this case, Lun 0, Path=, which contains the path to the storage device, and Type=fileio. And then you launch the iSCSI target service and you've got your own iSCSI target. Now, once you have set up your iSCSI target, you can connect to it by using the iSCSI initiator. Now, the iSCSI initiator on Linux is something that uses a nasty command. It's too long for me to remember, but fortunately the man page is quite okay. There are two steps I'm using here: iscsiadm --mode discovery. This is the discovery mode, which goes to the iSCSI target to ask which iSCSI LUNs it offers. So in order to do that, --type sendtargets, and the most important, --portal, which is the IP address of the iSCSI target server. And next, once you have found out the exact name of the iSCSI target, you can connect to it by using this command: iscsiadm --mode, et cetera. If you want the complete command, you can send me an email, sander.fr. I'll be happy to send it over. Next, once you have created the connection, you can make it redundant. In order to make it redundant, I can recommend that you use DRBD as your underlying storage device, and you use Pacemaker to configure the iSCSI target service as a cluster resource. Florian Haas is going to talk about that tomorrow. If you don't want to wait for his talk, you can send me an email, which is on this page. I have written a paper on how to set up an open source SAN using iSCSI targets, et cetera. It's like a 30-page paper containing all the details that you might be interested in. So send me an email, and I'd be more than happy to send the presentation to you. And that was my second and last talk. Any questions? It does work. I use tgt online. Yeah. Because with one server into the storage I'm not doing exactly what you're doing now; I use tgt.
And the only time it's been down in two years was when the power went out. Yeah. I also have two customers who were brave enough to use this solution, and they don't complain about anything. And you save a serious amount of money by implementing a solution like this in your storage infrastructure. It's the iSCSI protocol, and the iSCSI protocol, well, it's sent over an IP network, so you have the performance overhead of the packet headers. But apart from that, no, not really. There are really nice tuning parameters for iSCSI, so I tend to think that you can optimize it to work fast enough for most environments. It's not Fibre Channel, though; let's be realistic. But if you come from an environment where you think that iSCSI is an acceptable solution, is this as good as NetApp? That's an interesting one. I recently created an iSCSI SAN solution based on this. We ran a few simple performance tests, and we didn't notice any significant difference between the NetApp SAN and the iSCSI SAN. And it's not the world's fastest hardware, just three humble boxes. Another question up there? Yes, sure. You're welcome. I'll leave my slide up for a minute so that you can copy it down. So there should be another one, another person. Yes, another question. Yeah. Do you mean multipath on top of this solution? No, not yet. That's a good test that I should do. Yeah. Are you talking about an iSCSI solution using this concept, or using a proprietary SAN solution? I don't completely agree with that. There's multipathd. OK. Any other questions? Did you finish your notes? No reaction? So probably yes. That was all. If you have a question, my name's David Gwynne; my email address is dlg at openbsd.org. Oh, sorry. Oh my god, I nearly flipped it in half. That'll be fun. Look at the service that comes from... OK. So, the OpenBSD SCSI mid-layer was written back in 1992 by some guy who needed a very specific thing, and it went horribly wrong after that. So, a couple of things you need to know. SCSI is hardware, right? So the environment you're working in is: something wants to do a SCSI transaction, so it pushes it onto the hardware, and ideally, some time later, you'll get an interrupt with the completion. That means that you aren't necessarily going to sit there waiting, spinning in a loop, until you get the completion. It's going to happen later, right? While that's happening, you want your computer to be doing something else, so you want to give up control of the CPU and run another process. The other thing you have to keep in mind is that modern hardware these days can handle multiple transactions at the same time. So you can put hundreds and even thousands of commands, on some controllers, outstanding onto the hardware all at once, and then the completions happen, like, thousands of them, later on. The problem with the SCSI mid-layer was that to execute any SCSI transaction, you had to go through an API called scsi_scsi_cmd, which was beautifully named. Like, all the functions in the API were prefixed with scsi, and then the actual thing you do with it is a SCSI command, right? So it's scsi_scsi_cmd, which I hated. The other annoying thing with it was, if you wanted to do an asynchronous completion, you had to do block IO with it. So the only thing you could do asynchronous transactions with was disk IO, pretty much.
If you had a management enclosure or a scanner (which no one has anymore) plugged into your SCSI bus, or a tape or something like that, and you wanted it to do asynchronous completions, you had to pretend it was a block device and push a buffer down with it, or you had to poll, pretty much. The other thing is, to do an asynchronous completion, you can only register one handler for the device that you're currently working with. So again, if you have an interesting device like an enclosure, which can do lots of different things, you have to poll, or you have to fake buffered commands and do a completion. And you only get one completion handler, which has to figure out what the request was in order to do different things, right? So this sounds really horrible and broken. Yeah. The other thing was, controllers can do a certain number of commands at a time, but the devices themselves can do another number of commands at the same time, so there's this resource scheduling you have to do. OpenBSD's mid-layer was very conservative: it would take the number of commands that the controller could do and then statically allocate a portion of them to every possible device on there. Now, "every possible device", once upon a time, meant the old SCSI buses with eight or 16 devices on them, so it was very easy to do the maths, and it was very easy to get the split right. And old devices were kind of limited and only did one or two commands at a time anyway. So this kind of worked. However, these days you have very wide buses, like Fibre Channel. You notionally have 512 targets, but you still only attach four devices, right? So the maths that OpenBSD did was the number of commands, let's say it's 1,000, divided by the possible targets, which was 512, which meant you gave each disk two command slots. The problem with that is you're only giving them two command slots each, right? And the 990-something others aren't being used; they're just sitting there idle. So these are the things I wanted to fix. It's taken me about three or four years to get through all this stuff, just because of the nature of the development model in OpenBSD, but it kind of works now. It's quite good. So the first thing was replacing the SCSI command API. With the scsi_scsi_cmd API, you had to pass all the parameters for the SCSI command on the stack and fake up a buf, right? So your actual command, the data, the timeout, the number of retries you want to do were all arguments to a function, which was rather opaque to look at. These days, what you do is you request a SCSI command, and if one is available it will give it to you, or you can sleep waiting for one to become available, or it fails because there's no resource available and you take the appropriate action at that point. Once you have the SCSI command, you set the retries, all the data and stuff; it's all members of a struct, so it's a lot easier to program and a lot easier to review, because you can see the command's retries being set to 20, or its timeout being set to two seconds, things like that. Then you issue it with scsi_xs_exec, and that is it: it's gone onto the hardware. You have to return at that point and wait for the completion to happen later. The other thing I did was add a callback to the SCSI command structure, which will get called for every command that goes down, not just the ones with buffered block IO, which made things a lot easier. And it's per command, so you've got a lot of flexibility about what you do there.
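To make that flow concrete, here is a rough, illustrative C sketch of a driver issuing a command through the newer interface as described above. The names are my approximation of what the talk describes (scsi_xs_get, scsi_xs_exec, and the retries/timeout/callback fields on the transfer structure); the surrounding softc, buffer, and completion handler are entirely hypothetical, so treat this as a sketch of the idea rather than the actual OpenBSD headers.

```c
/* Illustrative sketch only: approximates the xfer-based API described in the
 * talk. my_softc, my_start_io and my_io_done are hypothetical driver code,
 * and field names may not match the real tree exactly. */
void
my_start_io(struct my_softc *sc)
{
	struct scsi_xfer *xs;

	/* Ask the mid-layer for a command; it may be unavailable, in which
	 * case we back off and try again later (or we could sleep for one). */
	xs = scsi_xs_get(sc->sc_link, SCSI_NOSLEEP);
	if (xs == NULL)
		return;

	/* Parameters are plain struct members now, easy to read in a review. */
	xs->retries = 20;
	xs->timeout = 2000;		/* two seconds, in milliseconds */
	xs->data    = sc->sc_buf;
	xs->datalen = sc->sc_buflen;
	xs->done    = my_io_done;	/* per-command completion callback */
	xs->cookie  = sc;

	/* Hand it to the hardware and return; the completion arrives later,
	 * from interrupt context, through the callback above. */
	scsi_xs_exec(xs);
}

void
my_io_done(struct scsi_xfer *xs)
{
	struct my_softc *sc = xs->cookie;

	/* Runs for every command, not just buffered block I/O. */
	sc->sc_last_error = xs->error;
}
```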
There are some cases where you do want to wait for the command to complete, so there is a synchronous variant, scsi_xs_sync, which does magic stuff behind the scenes with scheduling threads and things like that, but that's not the important bit. scsi_xs_exec is the important one. Once I had that in place, I could then look at the resource scheduling. So now I have wait queues on commands. So, adapters have, say, 1,000 command slots, and they generally manage them themselves in their own private structure representing their commands on the hardware, right? It varies for all the different SCSI controllers. So I made an API called the scsi_iopool API, which lets the adapter provide callbacks to gain access to these resources, and there's code in the SCSI mid-layer to schedule access to those. So something like a SCSI disk or a CD or something like that goes: I want to run a SCSI command. But instead of sitting there waiting for it, what it does is register a callback that will run when the resource becomes available. If there are no resources, it will put this request on a list, and when the resources return to the pool, it will then call the callback with the newly available resource. So this means you can have, like, thousands of things waiting for one or two commands on a SCSI bus, and they will all get eventual access to the resource as it is used and returned to the pool. This happened to solve a bug in USB, where USB devices only have one command, but they present multiple devices, right? Because we now schedule access to that one resource, we no longer have things fighting for it. What used to happen is that, because the mid-layer was stupid, it would let both devices try to use the bus at the same time, and they would stomp on each other's one command, and the machine would panic. So there are some nice outcomes from this, apart from IO scheduling and such. I have also worked on a virtual SCSI implementation, which allows us to do an iSCSI target. I have a lot of complaints about iSCSI, which I won't go into unless someone asks me about them. We've also modified the buffer layer so we can have different IO schedulers. So far, we have disksort, which is the default and really bad. It's been tweaked: people have turned the conditions upside down so it does IO in reverse order, and it goes faster than the current implementation. I don't understand that one. And there's a FIFO scheduler for dealing with modern devices. Yeah, that's what we've been doing. Any questions? That's a very generous description of the... yeah, USB is awful. Yeah, so with the IO scheduler, the IO path gets a SCSI handler, which it registers, but it only puts itself on the queue once, even if it has a backlog of work to do. So when it gets the callback saying you're allowed to do work, it then checks to see if there's more work to do and registers itself again. So it's FIFO: if you have two disks with a lot of commands pending on them, they both register on the central list. The first disk will be able to run its callback, and then it puts its callback back on to process the rest of its work, and then the other one gets a turn. So it works like that. Yeah, any other questions? How's the driver coverage? We're pretty good on most hardware. The big holes are recent versions of the QLogic fibre channel controllers and Emulex, but people don't tend to plug those into OpenBSD boxes, so there's not a huge number of complaints. The best fibre channel controller is the LSI one.
It's a really, really good piece of silicon, but it's really unpopular in the real world for some reason. And all the SAS controllers these days are LSI as well, and we have very good driver support for those. And we have a few RAID controllers, like the Compaq and the LSI, the Dell parts and things like that; the drivers are really solid there as well. So we did steal the idea of emulating SCSI on top of ATA from Solaris and Linux, so we're trying to make everything look like SCSI. The ATA driver support is getting there. The IO scheduling stuff helps out a lot with the ATA stuff as well, because we recently got port multiplier support. So the port in front of the port multiplier has 32 commands, but each disk can take full advantage of those until another one gets busy, and then it's fair share. So drivers aren't a huge problem. Any other questions? Everyone looks either bored or tired. OK. Well, that's all. Can you actually hear me? The people recording me have left me on, and... does that work better? Yes. OK. So that looks like it works. So, years ago I was mucking around, for hobby and also for clients, checking out how fast you could actually insert data into a MySQL database, just for the heck of it. This is often done for data acquisition purposes: you have a limited record size, but you have lots of them, and they just keep on coming and coming and coming, forever and ever, hallelujah. So let's say that little record is about 60 bytes long, and I've just created a table to emulate that. Of course, it might have multiple columns, but you get the idea; it doesn't matter whether you have two columns of 30 characters or one column of 60 characters, in terms of performance that makes no difference. So, this is a laptop. It runs at a couple of gigahertz. I'm only going to be using one core. I'm going to be inserting into MyISAM. How many rows a second can I insert into that table? Any thoughts? Why MyISAM? There's another reason for that; it doesn't actually lose you anything in this context. OK. Oh, by the way, per second. Yeah, 10,000? Yeah, anyone else? OK, that's fine. That number has nothing to do with the actual rate, by the way. Let's leave it running for a little bit. What do you reckon? Pretty good? So it's inserting about 220, 210, let's say, to be conservative, 200,000 rows a second. How did I do this? How did I hack the server? This is a stock standard server; I did not hack anything. What I'm doing is actually making use of the way the system was designed. There are a couple of server settings that I tune for my particular connection, and I'll show you the code in a moment. And there's also a certain way I run these queries. And that works rather well, and that's why it runs so fast. If you run this on a proper desktop box, you can get between 300,000 and 400,000 rows a second on ext3. And if you have a file system that's decent for that kind of workload, I mean, ext3 is a decent file system, just not for that kind of stuff, but if you use XFS, you can get it even faster, and so on. So how many rows did I insert while I was yapping about that, in over that minute? Over 15 million rows. Can we see that? Sure. SELECT COUNT(*) from the machine log table. There you go, 16 million rows. The only reason this is fast is that COUNT(*) in MyISAM is optimized: it keeps a row count. If I were to do that in InnoDB, it would be in slight pain now; it would have had to scan its primary key to figure it out. So I do know what I'm dealing with here, otherwise I couldn't do that demo.
How do we get rid of all this stuff so it doesn't eat up my disk space? OK, hang on. TRUNCATE TABLE lossylog. OK, go on. And the way truncate works in MyISAM, just for your information, is that it just sets the file size back to zero. It's a really quick way; that's what truncate does in MyISAM. Yes, sorry? When you truncate it says zero rows affected? Oh, absolutely, because it doesn't care how many rows were there. It sets the file size down to zero. It didn't do a head count, because that counting, which would give you lots of interesting information, takes a heck of a long time. If you want that, you would DELETE FROM the table, and then it does delete, delete, delete, delete, and the rest of the file is still there. Truncate actually sets the file to zero in MyISAM, so it's a completely different thing. In InnoDB, it used to map truncate to delete; maybe it does something different now, maybe. So basically in InnoDB your rows are in the tablespace, or in a separate data file, and if you actually wanted to return how many rows were affected, you'd have to walk the entire tree, and it would just walk the tree in order, which means a whole bunch of disk seeks. A delete from the table like that is basically a pile of work. Other engines would just map truncate to delete; so it varies with the implementation, and you don't need to bother with any of that. But there's an optimized path, and this is me using it. OK, so that gives you a basic idea. What am I actually doing? There's a bit of C code here. Yeah, so if you're accessing MySQL from Java, you're not going to find much use here, because there's a certain amount of overhead there. However, you can still do this stuff, whatever other API you use; I just did it in C to not have any scripts in between. I've tried this from PHP, and it just has fewer benefits, obviously. This is a quick hack, so it will look like the usual quick hack. You can argue about the exact implementation or the niceties of my C coding. I don't do a heck of a lot of C coding, and this is a couple of years old now as well. So anyway, bear with me. This just executes a query, or errors out; you get the basic idea. We grab a couple of parameters and we monitor the time elapsed, and so on. So the parameter that I'm passing in is actually the maximum length of the query: how long is the query string that I'm putting in a single statement allowed to be? Why might that be? Any thoughts? Stuart is not allowed to answer. To batch up the SQL? Yes, absolutely. Yes, what am I doing? And there was a big number; I think it said 200,000 or something. Well, I think 200,000 was the number that I plonked in. Yeah, it was 200,000. I figured out that on this laptop that is the optimum number. So what it is: I'm using something called a multi-row insert. Who here uses that and knows about it? Yes, some of you do and some of you don't. It's part of the SQL standard, but most database servers actually don't support or implement or even know about it, which is a bit sad. So you can do INSERT INTO, and then you have your opening bracket and then column A, column B, column C, and so on. However, where you'd have your semicolon if you wrote it out as a single row, you can instead put a comma and another row of values. And is there a limit to that? No, apart from the maximum packet size that you can toss at the server, and how long you want to spend sticking that string together, and how long you want to wait for it. So I make those strings really long. So how do I insert lots of rows a second? One of the aspects is this.
I don't issue an INSERT for every row that I want to insert. Instead I put it in a buffer; I keep adding rows to the INSERT until it hits 200,000 characters, and then I toss it at the server. So why is that? Any thoughts? What am I optimizing? Yes, round trips. And that is one of the main things you're actually dealing with: round trips are one of the problems. So by eliminating those, we're already removing a lot of overhead. That kind of stuff rocks. It's also what mysqldump actually uses by default; it does multi-row inserts, which is why you can't just import a mysqldump file into another database server. You have to disable the multi-row inserts so another database server can actually read the stuff. But then again, it's just standard SQL; it just happens that MySQL is one of the few that implements it, which is really curious. I think that some others are also doing it now; a couple of years ago they didn't. What's that? It used to not, yes. And the same, I think SQL Server does it as well, but that's only a few years old. But anyway, I'll happily take that; more do it now. But it's not something that many people know about, and it's definitely important. Anyway, it creates a database connection. Then I set the session variable max_allowed_packet. Who will tell me what that one does? Yes, they implemented that stuff so that, of course, it doesn't just come back with an error: it drops the connection on you. If you chuck a query string at the server that is longer than that variable, than that number, the server will drop the connection on you, as a, let's say, very simplistic denial of service protection. So if you want to send really large bits of data to the server, whether you're inserting something or selecting something, that variable needs to be at least as big as the length of the query string, and possibly a bit bigger. Setting it at multiple megabytes usually helps. If you're inserting really big blobs, of course, you need to make it bigger still. Anyway, in this case we're sending big stuff, and I just chuck it at twice that amount for simplistic stupidity. OK, question there. About four gig or something? Well, the reason I can set this in session, actually... I think everybody can set that one. I think one developer didn't talk with another developer. Yes, once they actually have established a connection. Interesting point, yes. There's probably some dodgy fake security going on here, where this came in after that thing came in, and it probably circumvents it. What do you think, Stu? It's probably a dodgy one, isn't it? He doesn't care. He'll think about that; we'll get back to it later. What's your opinion on anybody who is allowed to do an INSERT being able to fiddle with parameters like this? What's that? What's my opinion on that? On anyone with permission to do an INSERT being able to mess with parameters like that? I think that's very useful; in this case it's important, it enables me to get the speed right. But it's got nothing to do with the INSERT privilege as such. It has to do with the query string: if I do a SELECT, I still send a query string, SELECT blah, blah, blah, that's the string, yeah? So I'm not talking about any particular statement; it applies either way. So, bulk_insert_buffer_size.
When you're doing multi-row inserts in particular, MyISAM internally queues them in a certain way, and it uses the bulk insert buffer for that. So why do you have to know about it? Because if you don't tune that, it takes longer to insert the stuff. I happen to know about that one, and I've optimized it, so I set that one as well. What does the actual insert look like? It looks something like this at the start, and then I just keep adding stuff to it at the end, so it's not particularly interesting. It is very, very crudely done. And then I have an infinite loop where I do the time elapsed and all that kind of stuff, and every once in a while I print out how many queries a second it does. So at first it spits out a bit of nonsense, and after that it settles into the proper rate. That is basically it. So yeah, there's some nifty trickery. Did I modify the server? No, I did not. Can I get a heck of a lot more into the database server than otherwise? I mean, you guessed, what was it, 10,000? 10,000, so you were off by a factor of lots, right? It's not just a little bit; it's by hundreds or thousands or whatever. And I think that's quite interesting: when you need that kind of performance, it's not a matter of "is MySQL the right tool for the job", or "is Java or PHP or something suitable". It can all be done, but of course each environment has its limitations and overhead and so on. But if you know the environment you're dealing with, you can optimize for it. Any questions about that? And then I'll answer why it's called lossylog, because there's an extra trick to it. Okay, two. You first, then you. How often does it actually write to the database? How often does it batch things up and then throw them at the database? How often is that happening? As soon as the string length gets to 200,000, in this case. Now, that would depend on the machine and the speed of the server; here, approximately... in this case it's 200,000 because that's what I told it to do. You told us that number. So how often do the writes actually hit the database, right? I don't understand the question. Well, we've actually got 200,000 characters of inserts in a statement going to the database. Well, each row is 60 bytes long plus a couple of quotes, so you divide 200,000 by that; you can work it out. I don't know, I haven't added that up. I just did some testing with rough numbers, and this ended up being the fastest. So on this particular machine, it ended up that batching that up and then sending it, then batching up again and sending again, was faster than doing it more often or less often. So the number of rows in that statement is 200,000 divided by the length of each row. So you're batching up 200,000, throwing it at the database in one go, then starting all over again, batching up another 200,000 and throwing that at it? Not 200,000 rows; 200,000 characters of query string. Yeah, so how long does that take? I don't know; you'd have to do some division and work it out. I really haven't bothered with that. But the thing is, it's not a matter of queries per second; to me that really doesn't matter. All you need to do on a system is essentially play around with that number until you find the optimum for that particular infrastructure, because it will depend: if I had run it on a separate machine over the network, the number would be different.
If the machine is faster at IO and parsing, it would be different. That's the other thing: the MySQL parser gets a real workout here. Whenever you're parsing stuff, that's actually a large component of the amount of time that gets used on the server processing the row. You'd think, oh, it's file IO. No, it's not. Interesting sideline on this: who here has heard of the ARCHIVE storage engine? Yes, you have. My good friend Brian Aker. So Brian Aker implemented this a number of years ago in response to a request from Yahoo. They didn't end up using it, but he implemented it because of that. So it can do writes. It can't do updates and deletes; it can only do writes. It writes in a compressed fashion, essentially zlib-style stuff. And when you do a read, it will always do a table scan; it doesn't do indexes. But essentially, if you're doing logging and then you're doing data analysis, you will scan the entire table anyway, group it, and analyze it, so you will be scanning anyway. So it's very useful for that. Now, the reason for implementing it that way was to reduce the disk IO; we're compensating for a lack of disk IO capability by burning up some CPU power. As it turns out, that's a bogus requirement: it isn't the IO that gets maxed out, and this demo shows it. When doing this stuff, I max out the CPU, not the IO. If I start another thread with another connection, I can actually even insert into the same table and still get higher performance. And that's funny, because MyISAM doesn't have concurrent inserts in that way from two different connections; they will wait for each other. So even with that in mind, even while creating concurrency conflicts inside MySQL, I can still get higher than that number. So I'm maxing out the CPU, not the disk IO. So the whole premise of the ARCHIVE engine is completely bogus: it was just overhead elsewhere in the insert infrastructure. The whole ARCHIVE engine doesn't need to exist for that particular reason, which I thought was quite funny. Of course, Brian and I had a good laugh about that one. But the thing is, you don't need to use the ARCHIVE engine if IO is your bottleneck; there's something else that you need to do, namely optimize the IO path. You had a question. So for the insert IDs: for a multi-row insert, you get back the first one, and you don't know what the others are, because depending on the settings they could be skipping; in a multi-server system or a MySQL Cluster environment they could be all over the place, they get allocated in blocks, and they get written by different nodes, and there are lots of reasons why you don't know. You don't know. Is there any way to get them individually? Individually, OK. The thing is, for this kind of insert it wouldn't matter, because that kind of data has its own ordering; it runs on a timeline, you know where it goes. If it does matter, of course, yes, then you would like to have something like that, and I think it might make sense. In the context of doing stuff like this, if more people were to actually use this, there would be a valid case for actually implementing it, and it shouldn't be particularly complicated. Yes? I was just going to say, when you've got indexes and all that, do you still get this sort of massive insert rate? Oh, it would still be faster. But yeah, I mean, you have to take into account that more IO gets done. Yeah, definitely. So in this case, there's no auto-increment, there's no primary key, there are no indexes at all. I'm doing pure MyISAM.
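Since the demo program itself isn't reproduced here, the following is a minimal sketch of the batching approach described above, using the MySQL C API. It is not the original lossylogger code: the table and column names, the connection credentials, and the buffer sizes are made-up placeholders, the 200,000-character threshold is just the number that happened to be optimal on that laptop, and whether max_allowed_packet can be changed per session depends on your server version.

```c
/* Minimal sketch of the multi-row INSERT batching described in the talk.
 * Assumptions: a MyISAM table `machine_log` with one short VARCHAR column
 * `msg`, and a local server we can log in to. Build with -lmysqlclient. */
#include <stdio.h>
#include <mysql.h>

#define MAXQ 200000           /* flush once the query string reaches this */
#define BATCHES 100           /* run a fixed number of batches, then stop  */

int main(void)
{
    MYSQL *db = mysql_init(NULL);
    if (!mysql_real_connect(db, "localhost", "user", "pass", "test", 0, NULL, 0)) {
        fprintf(stderr, "connect: %s\n", mysql_error(db));
        return 1;
    }

    /* Mirror the two settings mentioned in the talk; on newer servers
     * max_allowed_packet may only be settable globally, not per session. */
    mysql_query(db, "SET SESSION max_allowed_packet = 1048576");
    mysql_query(db, "SET SESSION bulk_insert_buffer_size = 67108864");

    static char q[MAXQ + 256];
    size_t len = 0;
    int batches = 0;

    while (batches < BATCHES) {
        /* In the real thing each row carries fresh data; here it's a fixed
         * roughly 60-byte dummy value. */
        const char *row = "('0123456789012345678901234567890123456789012345678901234567')";

        if (len == 0)
            len = snprintf(q, sizeof(q), "INSERT INTO machine_log (msg) VALUES %s", row);
        else
            len += snprintf(q + len, sizeof(q) - len, ",%s", row);

        if (len >= MAXQ) {                 /* one big statement, one round trip */
            if (mysql_real_query(db, q, len))
                fprintf(stderr, "insert: %s\n", mysql_error(db));
            len = 0;
            batches++;
        }
    }

    mysql_close(db);
    return 0;
}
```

The only tunable that really matters is MAXQ: as the talk says, you play with that number until you find the optimum for your particular machine and network.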
If you do this in another storage engine, you will see the performance difference. With InnoDB, I can only get somewhere between 80 and 120,000, so roughly half. But there's a lot more stuff that InnoDB needs to do, so fair enough. And PBXT? I haven't tested it on PBXT; this version of MySQL doesn't have that, it's an old install. I don't know, what is it? SELECT... it's like an alpha version. Yeah, see, that's an alpha or beta version of MySQL 5.1. OK, last question, then I'll explain lossylogger. So I've done that, and it still increases a little bit. So I'm maxing out the CPU on the server; that thread maxes out one CPU core. So I can still increase the performance by running multiple threads. Yes? You end up at around 340,000 or something if I do it here. Yeah. Yeah. I mean, we could do it now; it takes a bit too long, but I can run it after I disconnect and I'll show you on the laptop. Why not? So that's the thing I was looking at, and that's where lossylogger came in. So with logging, for instance from web servers, you have a lot of information about who clicks what. Do you actually need all that information? Or do you need a reasonable amount of the data so you can actually get a statistically accurate enough bit of information? There's a caveat there, and I'll get back to it. So that's why I thought of lossylog. What I was thinking of was building an aggregator, and you toss the logs at it through UDP. Then you don't have to fuss about the connection, and you don't need to manage all that and queue it. You just toss things at it from lots of different connections and you don't care. Will you possibly lose some? Sure. What's that? Sorry? Yeah? Similar ideas. So then you would collect that large string and toss it at the server, and meanwhile you keep collecting at the back end. You would actually multithread it, because in this case this particular C thing is single-threaded: while it talks to the server and waits for a response, it's not assembling the next long query string. It assembles it extremely fast locally, but if it were also receiving the data, you would multithread it and just switch between two buffers. Yes? You can use the regular client library in an async fashion. You can also use the Drizzle client library, which can talk to a MySQL server, and then you can do things like that: you can send a query and not necessarily wait for an answer. There's absolutely no reason why it can't. You can't run multiple queries on the same MySQL connection at once, but that's just something you deal with in a different way. So that's what lossylog was intended for, which is why it's called lossylog. This was just the first experimental part. I never found any buyers in particular that wanted that thing in the end, but people were interested in it. It's been interesting to talk about, because it's kind of funky stuff, but there's no magic juice involved necessarily. It's just useful, and it's useful to know about these things and kind of validate certain assumptions. The caveat with using UDP is that you don't know how many packets you're actually losing, and some people dealing with the statistics get upset about that, because losing 5% of your packets is okay, but they want to be sure that it's 5%. So yeah, I reckon with many websites it's a case of tough luck, because at the moment they don't have a choice; in some cases they get too much data to log and track it at all.
So there's a choice between not doing it, and doing it with a possible margin of error that may be out of bounds. Well, tough luck, but at least you get more of a ballpark idea. It'll allow you to scale and stay alive for another week. So I figured it was a fair trade-off. And possibly some people are using this, because I've done this type of talk over the last couple of years; this is not actually new. I did it at OSDC in 2007 as a bit of a quiz, and someone won a free training day by guessing pretty close to what I was doing on my laptop. It's still the same app; I haven't even recompiled it. Last question. Was there a comment still floating around? Anyone? Okay. Done. Thank you.