 Yeah, no, I'm not. Turns out I'm not. If you do that again. Cool. Welcome to the Data Storage Mini Conference. This is a very relaxed kind of fun day about data storage and a whole bunch of different topics on it. Basically, the idea was to have as much different ways to store and eat and understand and process data as possible. So feel free to ask questions. We have microphones for recording. And there's probably a first for LCA. It looks possible that some kind of video may be recorded, and if we're really lucky, sound as well. So a question we asked. Our first speaker is Arjen, talking about understanding joins, of which there will be whiteboards and markers. And I believe we have the one last whiteboard marker left in Queensland. I just put it in. Yes. Yeah. So that's very valuable. I need three colors. I might have to make it up. Yeah, so it's three colors as long as you want black. So, Arjen. Howdy. What's that? I don't want to lose that. Your mobile phone rings. I'll answer it, which may be problematic if it's your boss. You want to have mine? It's to you. My boss is not going to call me. I could save heckling for later. You could save heckling for later. This is however an interactive session. Am I allowed to run around? You're going to be very active there. Yeah, I can't stand still, sorry. Yes, yes, green. Anybody else? Got two colors now. Surely we can crowdsource this whole thing. Red or blue I'll go for. Yeah? Anyone? Anyone? Want to tweet to Nick's door? Sarah, do you have whiteboard markers? Other colors? Ah, here. Ace, thank you. Ah, no, no, no, you're getting it. OK, you get to keep that one. I need multiple. Thank you. We win, yes. So I asked you yesterday, can I get a whiteboard? Surely we have one around here. Somewhere we'll check. So you walked into the room, checked, got the whiteboard. OK, so that's all good. OK, who here reckons they understand joins? OK, there's one maybe to. Stuart probably gets the credit. Yeah, yeah, yeah, OK, I've tested him before. There is hope, yes. So this is a bit from open career training, where years ago I was running the MySQL training material, which was boring as it was written by someone with a double degree in mathematics. And of course, everything is really easy. But if you're actually standing in front of a classroom with people who don't understand joins, that wasn't actually the way to go. I couldn't work that. And at some point, I saw someone else do another presentation, and I suddenly figured out how to do it. So that's why we're doing the visual thing and interactively, because it's much more fun. It keeps you awake. I once did this for someone with a visual brain, visual spatial brain, over the phone in four minutes. So it can be done even quicker. I could do it in a lightning talk sometime. It could be fun. The interactivity gets lost at that point. So let's see if this actually all works, I think, at that point. Yes, it actually does change. So who here uses subqueries? Let's do check that. Yeah, so I presume you're all, by the way, using MySQL, you don't have to. So who here doesn't use MySQL but something else? That's cool. What do you use? Postgres? And there's a drizzle. Sure, Mr. Heckler. OK, that's cool. Yeah, so I mean, the basics apply. The thing is, before MariaDB 5.2 and any stock MySQL version joins ROK and subqueries suck because they haven't been optimized properly, the original people who implemented escaped from Sun Oracle, went to Monte Program, and did the work they weren't allowed to do by the sales people earlier because they had too many other funky things to do that sold stuff. They fixed up what they were supposed to fix up like five years ago, and now joins, or subqueries now actually work fast. It's really good. It's a bit of a non-event, but if you do use the media apps, please do use MariaDB. There's a talk about it this afternoon. Now, the reason most people are really working with joins and not subqueries in MySQL, and are not necessarily familiar with how things work, is that, well, MySQL used to not have subqueries until version 4.1, which is, of course, ancient history now. But it was quite an important issue at the time. And then before, certain types of joins, outer joins, didn't exist before SQL 92. So before then, you had to use subqueries rather than something else. So lots of textbooks, including the people who learned from the textbook, and then started teaching, will have been teaching people subqueries rather than joins. And there's a big problem, because with MySQL optimizing for joins rather than subqueries, you need those joins. So let's get those right. So first of all, let's play a game, because that's what we're here for, aren't we? Let's get this right. Stop. I'm not that tall. What key words related to joins exist? And this is interactive. You're all playing along. Using? Using. Let's try using. Join. Absolutely. Possibly yes. Yeah, there's no trick question there. That's absolutely perfect. Join is a join keyword. Yep. Inna. OK. Outer. Outer. Hm? All? No, all. Sorry? No, all. That would be union. All. No union. On. Right. On. On, yes. Or a puzzle symbol that goes out of the comma. Yes, please. I always forget that. Thank you. Thank you, Stu. Yes, there is a comma. The humble comma. We'll work out what an evil thing that is later. Yes? Left. Hm? Left. Left. Yes, I'll go with left. Yes. Both. Both. Yeah, that's fine. Oh, mustn't forget natural. Yes. Now, in my core context, it's called straight under could join and sexually modifier. So let's play with that. Natural, all right? I have a nice statement about natural later on. You won't forget that one. Yes? What else? Where? Where? OK, let's leave that one. Yeah. OK. Not technically incorrect, but let's leave that one for now. It's not in the context here. OK, any others? I think there's some missing. Not really. How about right? Have we done outer? Have we done outer? Left, right, full. We had using, we had on. I've got a complete list somewhere, but they're so small. Inner comma. Oh, hang on. More. Something called a cross join. Why not? Oh, yeah. That's your straight join using on. I did full. Yeah, I think we're good. I think we're good. OK. Now, with all this fun in mind, how many types of joint? Let's put this on the whiteboard. Look at that. Isn't that perfect? How many types of joints actually exist? Yeah, I see you vote there. Who wants to fess up? You're not wrong. You're even right, Mr. Heckler. He's my resident Heckler. You're in the wrong row. You should be in front. No, I want to have one back. OK, that's good. Yeah, excellent. He visited everyone in my talks, and I was at OSDC doing a couple of talks. He traveled all the way from Brisbane to Melbourne to Heckel. Thank you. Thank you, guys. You're welcome. I'll stand by that. Yes. OK, so how many types of joint? Probably three or four. Probably three or four. OK, I don't know. I don't know. Who thinks it's two? There's a couple of votes for two. OK, who reckons it's three? OK, how many votes? OK. What was it, two or three people? Let's say three people. OK, types and number. OK, who thinks it's three? Two, four, five. OK, that's five people. Yeah, who thinks it might be four different types? One. And what are all the rest of you doing? One type of joint. No, that's not going to. OK. Overflow. No, the right answer is what gas secretly, then it's two. And what are those two gas? Now you have to figure out. Inner and outer. Yes, absolutely. It is inner and outer. So now let's play this again. Get this right. There. OK, so let's try inner. Stick for time. And outer. OK, so with inner and outer, since we already have those two, we can now wipe those off the board. Obviously they belong around the group. So you get the idea. Some keywords belong on one side, and some can be used with both. So now we're going to pick off this list and put it there. So then you have a bit of a grip. So which keywords might belong with inner, which might be with outer, and which might be comma on inner? Do people agree with that? Any other ideas? Dubious? Who also thinks that, really? And is pretty sure about it? You don't have much support. He is right there. He is right. I'll just mention that. You never use commerce. Yes, you're a good lad. Exactly, but you know what it means. If you find it in the wild, they do loiter in the wild. That's when you need those nasty ware clauses that the man in the back mentioned. He's right too, but we didn't want those. OK, so what else? Come on. I know it's early. Just had coffee, but come on. I think that would be a both. Where is it going? That would be both. On it both? OK, yep. So it's using? Very good. Now we're getting there. Hang on. Give me time. Yep, I think we're good. Yeah? Left and right are out. Left and right? Ah, what do people think? Who said that? No, he didn't. You said? Who said left and right is outer? I said left and right. OK, you said left is inner. He wins. You've been in training with me, haven't you? You have. Forgetful people. No, left and right is outer. I have a little disagreement between manuals and standards and stuff, but I think we can regard it as both. Yes. There's no particular hindrance to it. Exactly. Well, we'll talk about them. In the end, it doesn't matter. Let's put it that way. For us, it shouldn't matter. Now we still have full, cross, and straight. Yes, technically, absolutely. That's a good one. Very nice. The academic wins. OK. Well, but yes, at least you know your theory. You don't have to now apply them. OK. Which of those keywords are actually not really necessary? Isn't that an interesting one? Most of those keyboards are complete crap. They're really, really not interesting at all. They're just there to confuse us. You know, our SQL was designed. It was designed by committee. And no good comes of such things. A natural join would be a perfect example of things that seem like a brilliant idea that are really, really nasty. So let's, but interestingly so. OK, let's go on to the next bit of fun. Who remembers these from high school or somewhere else? OK, for whom wasn't this high school but something else? Where did you see them first? Uni, OK, not high school. Where did you go to high school? OK, fair enough. No, seriously, different states and countries have different things. So I'd like to learn about that. OK, yeah, primary school? Queensland, or New South Wales? New Zealand. OK. I'm just going to cry when I tell them. What they do now? But OK, so what's the name of it? Venn diagrams. That's what they are. Now here's the really, really cool thing. You're looking at the way that relational databases work. This is not an analogy. This is how they work. They're both based on set theory. It's the same thing, which enables us to actually use them to explain joins. Because yeah, it's the same thing. That's just wonderful visualization. It's a beautiful thing. OK, now let's have a table. I need my pens again. Let's start here. Let's create a couple of tables like that. So we have a couple of sets. And we do a join between, I have them up there. I don't actually need to draw them there. What would happen if I do an inner join between table A and table B? Based on, let's say that's the ID column in. We're simplifying here. Yes, everybody agreeing? It's numbers two and three. Isn't that wonderful? So I'll draw those circles here so we can play with them and actually give them color. Two, three, one, four, and six, and five. And I'll draw a bit of a thing here. That's table A. That's table B. And we now get two, two. Hang on. Let me get this one. Better. Sometimes forget exactly how this works for me. I need to do one, two, three, four, and then two, three, five, six. Okay? Everybody agreeing that this is approximately right? Okay. So when you do an inner join, you use the red pen. And you get that. And now you might have an idea of why an inner join is called an inner join. That's actually visually correct. That's actually really where it came from, but as far as I'm concerned, it's good enough. I haven't yet seen it proven wrong, so that'll do. So now you can visualize it. You can remember it, and you won't get it wrong again. We'll deal with the commas and the crosses and all the other stuff in a moment when we have the rest of the visualization correct. I've got ten minutes left. We're doing good. Okay. What would happen when I do a left join? Yes, Andrea, you're banned for a second. Yes? One, two, three, four. One, two, three, four would be correct. I don't know. Left is green today. Okay. So you would get kind of that. Let's do that. Yep. That's green, and that would be left. There we go. And see, this is why I need the colors. The patterns don't quite cut it. In gray scale, it doesn't quite work that nicely. Okay. That's what you get. When would that be? When would? No, that's not an inner join. We're now talking about outer join. It's a left join. The inner join was that red block, absolutely. With an outer join, you get everything on the left-hand side that satisfies, plus the things on the right-hand side that match up. The bits that don't match up end up being gaps. How do gaps show up in the results? If you do a select star where you get the junk, the gap is nulls. What was the symbol? Was it that? Yeah. That's not a zero. That's a null symbol. Okay. That's always nice. When you build a query, don't try to build a complex query in one hit. It doesn't work. Build it step by step and do it by using select star. Do it in a tool, not in your app, because if you put select star in your app, Arjen will get cranky at you. Your app will break. Well, you will get cranky at yourself later on. Your previous self will be hurtful, will be in pain over the later self. You know, you get the idea. You always work in the team of these two people, and it's you and your future self, right? So don't hurt your future self or your past self. Okay, so that's what ends up with the left join. Want to guess what a right join might do? Yes. What does the result set look like in blue today? Zero. That sounds exactly right. Okay. Did I stuff it up? Oh, bugger. Okay. Visual design. That's the problem with white boards. They don't scroll and expand. Okay. Four. Oh, there's one color too. That one. User fail. Okay. That's a four. That should be green. Ah, for goodness sake. At some point I'll work it out. Okay. Fast testing. Okay. Right. Now you see why a left join is a left join and a right join is a right join. So what's the kind of query that you would do? Why would you do a left or a right join? What kind of things can you find with that? And that's a nice, funny trick question. The answer, the single sentence answer is kind of funny sounding. Yes. You're trying to find things that don't exist. Yeah. You can't actually do a join on something negative. Yeah? You can't join a, you can't say a join b or a comma b where a not equals, a dot id, not equals b dot id. That doesn't work. It really, really doesn't work. It joins everything that doesn't match up that exactly thing. It doesn't do which one. So the only thing you can do is joining on something positive and then cleaning up the stuff you don't need. So if you're looking, for instance, for which customers didn't buy this particular product, is you pick out which customers did buy that product, but do it with the left join and then pick all the nulls. Does that make sense? So the ones that aren't null on the right-hand side have that product in their boughtish product table, whatever it is, in the sales table. But when you pick out the nulls, you get a list of customers that don't have that. Okay? So with outer joins, you can find stuff that doesn't match up. And that's the only reason why you should be using outer joins. Many apps use left and right joins kind of because it seems like a good idea at the time, but people can't explain to me why. Try to explain, verbalize to, you know, the teddy bears, the colleague next door or whatever, why you need to use an outer join? Why you need to use a left or a right join? If you can't explain it, the answer is you don't. Yeah? You shouldn't be using an outer join if it's not necessary. So that should be an inner join. Often they work out, often they work out to be the same. However, that depends entirely on your data set. You may find that your app suddenly breaks later when your data set increases. You know how web apps always work really well with one user testing? Yeah. It causes trouble in the real world. And this is the kind of same thing, Andre. Extra attributes on the side. Again, that may or may not be there. That may or may not be there. That's, again, a left join. Yeah. So that's a person, and it has a phone number, and it has phone number one, phone number two, and those point back. They each have the ID of that person, and they point back at that. You would do that with a left join. And you get multiple rows as a result. Same thing, though. I mean, this could be an attribute table. Same thing. Yeah, attribute one, attribute two, whatever. Yeah. And it points back. It makes no sense pointing it forward, because then you get a spreadsheet there. Putting all the fields here. Yes. If you have an application that says phone one, phone two, phone three, it's a spreadsheet. It doesn't work. When you see a company trying to fill in forms, you often can tell when they're dealing with spreadsheet forms. It's often rather painful. But let's continue. We still have a couple left. What's a full join? You reckon? And by the way, MySQL does not support it directly, but Wikipedia and many other places will tell you how to do it. Yes? Anyone? Anyone? It's a left join combined with a right join. Yeah. But all of the above kind of, yeah? So it's this whole thing, but no duplicates. So if you just do a left join, union all, right join, it wouldn't quite work out correctly. So you do a left join, and you do need to use a union all, but you need to filter one of the sides out with not nulls, because there could be legitimate duplicates. If you use union, the duplicates get removed, so you can't do that. So it has to be union all, so you get all the rows. But then on one side, you use where the ID column is not null, so you actually grab all the good stuff without the null stuff. Anyway, look it up in Wikipedia, and you see what it's done exactly. So MySQL does not directly implement it. One of the reasons is it could do that, but it's very nasty to optimize. So we just have never bothered, and you can work around it by just doing the union. So it works perfectly well. There's no particular reason why you shouldn't be doing that. And union all, or the full joins, they're fairly rare that you actually need them. They have some benefits, but not often necessary. So that's a full outer join. So again, it's trying to find stuff that's not there, but both on the left-hand side as well as on the right-hand side. So you need to be really sure you need something like that before you start using it. What does the comma do? What's the difference between a comma and just a join b? What's the difference between a join b or a comma b? Anyone? Anyone? See, if you do a join b, usually, no, please, no, you would use either on and then a dot id equals b dot id or using id. And then it matches up between the two tables. So that information, I call that a join condition. It is sitting right next to the place where you join. If you use a comma, you can't use these clauses. You have to stick it where, Indy? Wear clause. Wear clause, which is kind of much later in the query together with other filtering options. That's not very cool, because if you accidentally remove that thing, it will still join. What will it do? It does a cross-join, which is a Cartesian product. You've already answered the next question. Excellent. Thank you very much. No, that's good. So yes, if you're using the commas, try to get away from that and try to make it explicit. And that's why Gary said earlier he never uses comma, because it's just clearer. You can actually see what's going on. With a left or a right join, you can't stuff that up. Now, unfortunately, with all these explicit ones, you can still use the where instead of using on and using and so on. So it's still possible, but that's just a really, really bad habit. With the comma, it is always separate, and that's just annoying. So do get away from that, please. So cross-join is a Cartesian product. Yes? There's no efficiency benefit, because my score sorts it out internally anyway. The point is just in terms of maintenance. If you're doing a join between multiple tables, you have a wear clause that actually filters things. Then there's wear conditions that select your join conditions, and when you're joining, I don't know, five, six tables or ten tables, it just becomes impossible to maintain. In this case, the join condition is right next to the table that you're joining. A join condition can actually contain multiple components. It is an expression. It may contain more than just a dot id equals b dot id. It could have an and clause something else, and in some cases, that's necessary. If I were to ask the question, which client bought product a but not product b, and you'd have to solve that with joins, not subqueries, you would have an and clause somewhere in the join conditions to make that actually work. Actually, you wouldn't be able to do that with a wear clause. So it does make a slight difference, but only in very specific cases. It just don't have time to cover that right now. So what does the cross-join do? It does a Cartesian product. What kind of result would you get out of this if you did a comma b without any wear clause or join condition? What's the result? Okay, you get the idea. Yes. Everything on the left, everything on the right and all possible combinations. It's usually not what you want unless you're trying to set up a tennis competition and everybody has to play everybody. Yeah? So there are valid reasons why you should do, why you could do that. Usually, you don't mean to create a Cartesian product. Now, my idea for this is, first of all, don't use the comma so you don't accidentally create them. Because visually in your career, you'll be able to identify them. And if you actually mean to have one, use the word cross. Because then you've identified to yourself that it's actually supposed to be a Cartesian product. Technically, inside my scroll, the implementation between an inner-join or a regular-join and a cross-join is identical. Cross is one of those fluff words. It doesn't mean anything. Mathematically, there's a slight difference, but it really doesn't matter once you start implementing it. You could debate on whether it's identical, but for implementation purposes and relational databases, it's the same thing. So when you're looking at syntax, let's get that one right. That'll be the last thing we can do, I think. So you have inner-join. You would have a cross-join. You would have a lift. Now, hang on. Yeah, and same for right. Square brackets are because it's entirely optional, and those are fluff words. So if you just specify join, it's automatically an inner-join. Yeah? If you say left or right, then it's always an outer-join. There's no such thing as an inner-join. It doesn't exist. So the word outer is fluff again. So some people have an internal standard to always write outer because they write inner, some don't. I tend to just write left, right, and join and not fuss with that. But the Cartesian product, please do add the cross-in because then you know you didn't actually forget the join condition. You meant it to be a Cartesian product. Does that make sense for everybody? Enforcing it? And removing the comma? That's the first thing you could do, enforce these two. Yeah? So if you use the explicit join conditions, the explicit join syntax, to make sure it has to have an on or using, not in the where clause. If you start enforcing that one, the other ones will probably follow, but that's one that people are probably mostly using anyway, if that makes sense. Yeah? Because once you do a join, a left join, you don't want to stick it in the where clause. OK, so we have on where you actually have a whole expression. We have using where you specify columns where the names are common between the two tables. You can also specify id, comma, and other columns. So it can match on multiple columns. That's entirely doable. So we've dealt with on and using. Straight join in MySQL specifies to the optimizer that you don't want the tables reordered. And that is purely an issue of what you know is optimal in a particular case. Do not stick that in your production system because your data set will change, and that means that the logic that the optimizer will be able to use is then restricted. You might at some point know better than the optimizer, but that's only today. Next month, it might not work. Once you start playing smarter than the optimizer, you will be hurt later. So use it for testing, and usually if something goes wrong there, there's either a missing index or superfluous index. In some cases, there's other things wrong maybe with the query, and in very rare cases these days, there could be something wrong with the optimizer. It might be making the wrong choice. So it's good for testing, and then you report it as a bug or ask someone else to actually help you with that if you can't figure it out, but it's not something you would use in production. Don't try to override the optimizer in production like forcing indexes and so on. That's a really, really bad idea. It hurts people. So last one is natural. What does natural join to? Not Andre, because you know. It causes pain. Yes, absolutely. You get bonus prizes, I don't know what, but something for that. What does it actually do as opposed to what you're on and using do? Yeah, okay. So it matches up. It looks at all the column names in table A and table B and reckons well if there's an ID column, let's match up on that one. Oh, but wait, there's a name column as well. Let's match up on that one. Does that make sense? So let's say you have a country table and countries have a name. And then you have a, I don't know, a capital table and they have a name too or city table, you know. It will match up on all matching column names. That's probably not a really, really good idea. So it's kind of nice magic until you hit the real world. I think this was probably a good idea in the committee when people were discussing this when there were these magical people in the back room designing databases and us, the big wide world, didn't have any access to this kind of stuff so that we could screw things up. This sounds absolutely perfect, yes. If we were all, yeah, so Cod designed this with mathematics in mind. The thing is that the relational ideas are relational databases. The specifications are already a subset of what the set theory model can do. And then you implement it and then you stick SQL on top and it becomes even more borked. It is really a pure substitute for what Cod originally envisaged. It can't be fully implemented in the way that SQL works. It just looks really nasty. But yeah, it seems like a nice idea. It also can work okay if you had a really, really strict elaborate rule on what your column names should be. But we all have ID columns and name columns and that kind of stuff. And that means that it doesn't work in the real world. Now imagine that you actually make it work. Next week someone will add a column. Yes, this is exactly what kind of stuff that happens and it will suddenly match up with something else and suddenly your query doesn't work anymore. But I've only added a column. It doesn't even have special data in it. It doesn't work anymore. So my suggestion is if you ever have an employee or someone near you working produce that, produce a natural join, do two things. Fire them because they're dangerous people. Do not trust them in your neighborhood. Second is please let me know. Okay? This hasn't happened yet. It has been identified in the wild but no culprit has been found. Just old code. People walked into a project and that was there but that's as far as we've got so far. No hedge have rolled yet and I've done this training for, well, almost 10 years now. So that's getting there. So luckily it doesn't, well maybe no one wants to fess up. I don't know. I'm asking other people to dop other people in. So maybe there's more hope. So yeah. I mean who thinks natural joins are still a good idea. Thank you guys. That was the heckler, yes. Yeah, pretty much. Yeah. Anyway, then here's some exercise. We've essentially done those and there's some advanced stuff if you want to play with it later but it's not really necessary. Last questions. Anyone? Do you have this in written form somewhere? Can you repeat the question? Yes. Do I have all this in written form somewhere? No. I do not. I'm afraid. I'm happy to not wipe this out and you can copy it down. Take a photo. No. Because it's something you would, yes, grab it from him. It's something, I would seriously recommend that you write it because the exercise of writing it down will help your brain remember. I'm going to use my whiteboard markers for the color on a piece of paper. No, you will write it, not me. You've got it already. Yeah, so why the question then? Yes, no. Because it's the exercise plus the writing down that makes you remember it. You showing it to your colleagues is not going to do the trick. So, yeah, it serves no purpose.