Well, it's that time of the week again. It's time for Chit Chat Across the Pond, and this is episode number 783 for December 30th, 2023. And I'm your host, Allison Sheridan. This week, Bart Busschots is back with Programming by Stealth 158B because I asked too many questions on the first half. No such thing as too many questions. Well, it did work out for us and I definitely needed another week to get started on the second half. But before we dig in, I want to alert the audience to a significant enhancement to the material we're creating. I use a service called Auphonic which does a lot of things with the audio file when we're done recording, including leveling the audio, adding metadata to it, converting it to an MP3, and FTPing it to my servers for you to be able to download. But they recently added AI-generated transcripts. And we've had this for a while with Programming by Stealth. If you look on the podfeet.com version of the show notes, which of course point back over to Bart's show notes... I know that's kind of confusing, but that's the way we do it. Anyway, if you look at my version, you'll see a link to an unedited, AI-created version of the transcript. And I'm going to keep emphasizing unedited because I'm not going to edit it. It's what you see is what you get from this. Anyway, that's been around for a while and you've probably noticed that already. But recently George, the guy that created Auphonic, has added auto-generated chapters. And it's a two-fold enhancement to the show. First of all, when you're looking at the transcript, you'll see the chapters that it's created at the top and you can click on them to jump to the part of the transcript where we cover a specific topic. Now, if you think about Bart's fabulous tutorial show notes from last time, 158A, he's got the section telling us about the challenge solutions. And it's very, very short. But what we said about that was very, very long. We talked a very long time.
So if you go into the transcripts and jump to that part, you'll get everything Bart explained to me and all of my silly questions where he was having to explain things to me. So there's a lot more content in text that you can go back and reread if you need to, to try to remember how Bart taught me what we learned last week. The other thing is those same chapter marks are automatically added to the podcast file. So when you look in your podcatcher of choice and you want to say, I just want to listen to Bart explain that piece again, you can jump around. And it's not perfect. It's not maybe where we would put chapter marks, but it's auto-generated and we didn't have to do any work. And again, it is not going to be edited. We're not going to fix it. But it really, I think, makes it a lot easier. For example, this week, I wanted to go back and rework the challenges again. And I made it through two of them. And the way I did it was I went directly to the transcript and I found those parts where Bart explained it and what he did last time. So that really, really helped me. I'm excited about it, and it didn't cost any extra, and you get that for free. Yay. Yeah. And can I just say, this is a great example of the power of AI because you have two uses there of the modern machine learning technology. First off, they have trained the machine to understand English and turn it into text. So that's a nice bit of machine learning. But then they've taken one of the modern GPT models and they've used its ability to summarize to create those chapter markers, because they're really good one-sentence summaries of the text that comes after them. Like impressively good one-sentence summaries that they use as the headers. Like that's cool. Yeah, it's not just the words we said at the beginning or anything like that, it does sound better than that. By the way, there's also a long summary and a short summary of the entire episode. Those are a little weird.
The short summary is not bad. The long summary gets kind of weird but you have to get past that. But if you use the chapter marks you jump right past it. So anyway, cool stuff. I thought that was really nifty. That is very cool. And yeah, just as a computer scientist I was fascinated to see these new technologies do cool new things. And this is AI doing cool stuff, which is nice. Cause I was promised as an undergraduate that AI was 30 years away and always would be. And I think I was an undergraduate 30 years ago. So I think they were right. Anyway, so this is a slightly difficult one. Because we're halfway through PBS 158. This is part B. So if you're listening back to back I'm not gonna give you a summary of what you heard two minutes ago, but I should probably give us a summary anyway because otherwise, if you're not listening back to back, you're gonna say, what is he talking about? So. Especially since I'm not listening back to back. Yeah, for the people listening, we've had Christmas between the last two times we recorded. I don't know about you, Allison, but there was a substantial amount of really quite nice red wine involved for me. Some really good liqueur actually. Amaretto is my favorite liqueur. There might have been a few gin and tonics under that bridge. Ah, okay. I don't do the G and T thing but a coffee with a little bit of actual Amaretto. Anyway, yes. So like I said, we've been distracted a bit. So we are now finishing up our part where we look at JQ's ability to query data, and we did a lot of it in installment 157, where we met functions as a concept and then where we met the select function for filtering down our data with nice clear boolean, you know, yes, no, kind of double equals, less than or equal to, very, very simplistic criteria.
And in this installment we are expanding that out a bit to more generic criteria, and to do that we first had to learn how JQ does typing, which is like JSON, which is nice of it. Then we learned how to convert stuff between types and then we learned how to filter stuff by type because there's basically functions to give us only the booleans and so forth. And then we learned about the wonderful alternate operator which allows you to say, if this thing doesn't exist go get that instead, because as we have discovered ourselves a few times by accident, frankly, a lot of JSON data is quite, I would call it dirty data, although I get frowned upon for using that phrase in work. Apparently people are very precious about their data. They don't like it when I call it dirty. But it's my data, it's so dirty. How about messy? Oh, it's not that they are misinterpreting the word, they're saying my data is not dirty. Yeah, they don't like it anyway. Inconsistent. Inconsistent, exactly. And so it's actually a very useful operator because basically when there are two possibilities you can just put one, double slash, the other. Now the fact that they use the double slash symbol, which looks like a comment to my brain, let's just leave that aside. And then why am I not? Oh, my show notes haven't pulled, so I don't see the chapter marker Allison very kindly put to say this is where we chose to split the show, but I think that's the point we chose to split the show. Yeah, I think so. So we're kind of up at ourselves. I was hoping you were gonna tell me where we were. Yeah, we probably should have looked this up ahead of time. Yeah, at more advanced searching. No, in fact, I am certain we drew the line at more advanced searching. Oh, there we go. Yes. Because that was a really logical place to stop because it is. So we have used the select function to apply our very specific boolean yes no criteria for filtering our inputs, right?
So the select function as a reminder it processes whatever was piped into it and it applies some logic and if the logic results in true the entire thing piped into it comes through utterly unchanged and if it evaluates to false the entire thing vanishes. So by entire thing if you're whatever you've selected each time it finds one that selects that is true it squirts through not it stops the whole thing if one is false. Right, but it's the way it's working is in parallel, right? So you need to mentally imagine that it's doing the same thing over and over and over again. So for each one time it does its magic it is a yes no. It's a gate either you pass or you shall not. You said the whole thing but it's each one it's whole thing not all. If there's 10 coming in in parallel then if each one gets its own evaluated if it's true it squirts through if it's false it gets thrown away. Exactly and the effect is that it behaves like a filter or a screen I guess where the big rocks get screened out and the little rocks go through in this case the true rocks get through and everything else gets filtered out. And so you will end up with at most the same amount of outputs as inputs but usually fewer outputs than inputs because you're trying to reduce your data down to the bit you're interested in, right? And so select does that based on Boolean logic which isn't a bad way to do things but there are other the select function may want to do more than just double equals or less than or equal to or greater than or equal to which are the things we learned about last time. What if it wants to apply more advanced concepts that we would want to say in English? And so JQ provides us a whole bunch of functions that we can use inside our select to do more powerful things than just is the surname equal to guess. 
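As a quick refresher on the gate behaviour just described, here is a minimal sketch, assuming jq is installed. The numbers are made-up sample data, not from the show notes. Each element is tested on its own, so a false result only discards that one element.

```shell
# Each number is piped through select independently:
# the ones that pass the test come out unchanged, the rest vanish.
echo '[1, 5, 2, 8]' | jq '.[] | select(. >= 5)'
# → 5
# → 8

# Same filter, with the survivors gathered back into an array (-c prints it compactly).
echo '[1, 5, 2, 8]' | jq -c '[.[] | select(. >= 5)]'
# → [5,8]
```

Four inputs go in and two come out, which matches the "at most as many outputs as inputs" point.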
And so that's really where I want to go today and the culmination of the more advanced is obviously regular expressions because what is more powerful for searching texts than regular expressions? But there is some fun stuff between where we are right now the second and regular expressions which is where we're going to finish the show today. And the first one is very common is the concept of does my input contain something, right? Containment is actually something you very often care about. So if we were dealing with, I mean data validation is an amazing thing, right? So there might be a set of valid values for something and you're getting some JSON in and you're basically asking is this one value one of the set of allowed values I have over here? So that's a containment question. So does it contain one of these things that I'm looking for? Or it's inverse is what I was handed is the set of allowed things a super set of what I was handed or sometimes you want it one way, sometimes you want it the other way. There are the two types of containment which is why there's two functions. They're called contains and inside and they make my brain hurt. So I figure lots of examples are the way to go but one is literally the opposite of the other. So the question is, is the thing I'm looking for inside what I have or is what I have, is it a sub set or a super set? Oh, right, examples. Let's get to examples before I tie myself into knots before I've even started. So we're going to start with the more commonly used of the two which is contains and when you're talking in English that's you say that a lot and the contains function is very powerful and it works by taking whatever the current thing being processed is what it is going to take as its input and the argument is something that it needs to match against basically. So the argument is what we're looking to match against. It's very important that the types match. 
So if you're handing the contains function a string, the argument has to be a string as well. If you're handing it a boolean... Because you can't really do any kind of operations if they're two different kinds of things. Right, what are the different types? Yeah, it doesn't make sense any other way, exactly, and it will throw an error if it's not happy. Is the number seven in a string? No. Yeah, precisely. So you can't even ask it. Well, you can't even put the seven as... no, no, if you make the seven a string, it will happily check if it's inside a string. No, but I mean, you would never run into a situation where you wanted to check is the number seven in this thing and you don't know that it's a string. You shouldn't. Your data shouldn't surprise you like that. When you're getting to the stage where you're processing down to a specific question, you do need to have an understanding of what your data is to ask it a question. It's like, if we didn't know that our laureate, that our Nobel prize is contained in an array called laureates, we'd be in trouble, right? So we know there's an array called laureates that we're searching. So yeah, you shouldn't run into a problem where you don't know what your data is because then you have a much bigger problem. You're not ready to write a query. Okay. So, okay, now I need to make sure I say this carefully and clearly. So, we can check if one item contains another with the contains function. Okay, I didn't say that very clearly there. Okay. Well, maybe it is. So this is a point where JQ is very powerful. The rules are different depending on what it is you're passing as the arguments. So we're going to start with the simplest case, which is the one that leapt to your mind, string containment. If the argument and the input are both strings, it applies the following rule.
When the input being processed and the argument are strings, contains will return true if the input string contains the entire argument string contiguously, otherwise it will return false. I'm going to need an example. Okay. So let us echo into our JQ function as our input: I love waffles. All right, so we're going to echo I love waffles as a string. So notice that we have single quotes in the terminal sense. So echo and then a pair of single quotes says, dear terminal, I'm going to give you a terminal string. And inside the terminal string, we have double quote, I love waffles, double quote, because that is a JSON string now. Right, because what has to arrive at JQ is JSON, not terminal stuff, right? So we pipe that to JQ and our JQ filter is contains, open bracket, the string waffles, close bracket. So if the string waffles is entirely contained within the input, I love waffles, it should return true. And if you pop that into your terminal, lo and behold, I love waffles contains waffles. So that reads perfectly rationally. Hard to explain, but it reads. I love waffles, contains waffles. True. Yes, exactly. It's a filter, okay. Now, it has to be completely contained. So if I say, I do enjoy the odd waffle, and I pipe that to the filter contains waffles, that is not correct because waffles is not entirely contained within I do enjoy the odd waffle, because the S is missing. All of, well, you got to think of that as, you might as well have said peanuts. It's not the same thing. Waffle and waffles are not the same thing. Waffles are way better than waffle. At least twice as good, maybe three times. Right. And the other thing, so the word contiguous is what we computer scientists like, but I don't know if it's sensible to humans. It means all in one piece. It can't be split up. Yeah. So if I say, did you say pan space cake? And I check if that contains pancake, that will be false because pan space cake is not contiguous pancake. Yeah, that makes perfect sense.
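For anyone following along in a terminal, the string containment examples just described can be sketched as below, assuming jq is installed. The exact wording of the pan cake sentence and the "Agent 7" input are illustrative guesses, not copied from the show notes.

```shell
# The outer single quotes are for the shell; the inner double quotes make it a JSON string.
echo '"I love waffles"' | jq 'contains("waffles")'
# → true

# "waffles" is not wholly present (the s is missing), so this is false.
echo '"I do enjoy the odd waffle"' | jq 'contains("waffles")'
# → false

# The match must be contiguous: "pan cake" does not contain "pancake".
echo '"Did you say pan cake?"' | jq 'contains("pancake")'
# → false

# Types must match. A string can be asked about the string "7" ...
echo '"Agent 7"' | jq 'contains("7")'
# → true

# ... but asking a string about the number 7 makes jq exit with an error.
echo '"Agent 7"' | jq 'contains(7)' || echo "jq refused: the types differ"
```

The last command prints jq's own error message on standard error, followed by the fallback echo.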
Okay, good. Why would that match? Yeah. Good. It shouldn't. And it doesn't. Right, okay. This is the foundation for the more complicated stuff. So string containment is the foundation here. So the next thing you can pass the contains function is an array. And what it will do with the array. So let me read my sentence exactly so I don't mess myself up here. When the input and the argument are arrays, contains will return true if every element in the argument array is contained in any element of the input array. Otherwise it will return false. The order does not matter. And it is not looking for equality. It is looking for containment. So it's recursive, right? So let's work through this with examples. So we are going to take as our input the array waffles pancakes apples. So we're gonna echo the JSON for the array waffles pancake apples to the JQ function or command which is going to have the filter contains the array pancakes. The types have to match. So even though I'm only interested in one thing, pancakes, I have to put it in an array because that's how contains insists you work. So let's look at a true or false. It's gonna be true or false, but let's look at a rule. So if every element in the argument is contained in any element of the input. So the argument has one element pancakes. So does anything on the input contain pancakes? Well, yes, the second element in the input contains pancakes. It is in fact, exactly the same as pancakes. So the thing I'm gonna have trouble with there is the syntax. So you've got echo and you've got the square brackets, waffles, pancakes, apples, all in double quotes. Great. That looks like an array to me. But then it says JQ contains and in roundy brackets because we always do roundy brackets with this sort of command. And then it has square bracket pancakes inside that. It's not an array, pancakes isn't an array. Pancakes isn't an element in the array. No. Well, okay, but it has to be an array because it has to be the same type. 
Bing, bing, bing, bing. So if you said JQ contains and just had quote, pancakes, unquote, that would be trying to say is this string in this array? And while that might make sense to us, that's not what JQ insists on. It has to have the same type. Precisely. Okay. Precisely. It looks screwy though. It looks a little weird, but we're gonna see in the later examples why it is the way it is. So in this case, I'm only interested in one thing. And so the question is, does any element in the input contain all the elements in the argument? There's only one element in the argument, pancakes. So yes, we have a match. True. Now, I made a point of saying contains, not equal to. So let us use the same input. So the input is waffles, pancakes, apples. Does it contain the array pancake, without the s? Yes, it does, because there is one element in the input that contains pancake. So you did the opposite example when you did the string containment. You did, I do enjoy the odd waffle, and then did JQ contains waffles. Had you done it the other way around... Right. But if you had done echo, I do enjoy the odd waffles, and then JQ contains the string waffle, that would have been true. That should have returned true. So that kind of a string containment would be true just like the array containment is. In that case. Yeah, and the array containment is true because the string containment is true. So everything in the argument has to be contained in anything in the input. Yeah, that makes sense. I just wanted to make sure that was still making sense in the previous one because we kind of did the opposite one. Got it, okay. Yeah, okay. So that was with one argument. But the reason you're allowed to have an array is because the contains function will happily check multiples for you. Oh, wow. So we can ask whether our input array, which I still haven't changed, it's still waffles, pancakes, apples, does it contain waffles and pancakes? So we pass the argument array waffles comma pancakes.
So now every element in the argument has to be in any element of the input. So waffles is in the first element. Pancakes is in the second element. So yeah, we have two corrects. Therefore, true. So those didn't have to be in the same order, right? Could it be pancakes waffles? Why, look at our next example. Pancakes waffles, true. I actually didn't, I didn't read ahead. Okay. There you go. QED, yes. So the order is irrelevant. The question is, is everything in the argument contained anywhere in the input? Okay. And the other thing is, just like the order doesn't matter, it also doesn't matter if there's a gap, which is kind of sensible, right? So if the array is waffles, pancakes, apples, and I ask if it contains waffles and apples, well, yes it does. The fact that there's pancakes in between doesn't matter, it still contains waffles and apples. Just, you know. Okay, that makes sense, right? So if I then ask it, does waffles pancake apples contain popcorn? No, it does not contain popcorn. That is not at all surprising. But if I- You're saying waffles pancake apples, and then in your examples you're talking about, does it contain waffles? I'm just, in case anybody's hearing that, he means waffles, pancakes, apples every time. Yes, I do. Sorry. Okay. All right, so there's no popcorn in waffles, pancakes, apples, that makes sense. It's not contained in it. And then the other thing is, they all have to be contained. So if we take that same input, waffles, pancakes, apples, and we ask it for popcorn waffles, the answer is false, because while the waffles are there, the popcorn is not. Okay. So it does not contain popcorn waffles. So that is array containment. It is powerful. That's a lot of if-else statements in JavaScript and a for loop. Like that's a lot of faffing about in JavaScript, but that's really quite concise in JQ. Yeah, you called this dense at first, and I wouldn't refer to it as dense.
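The array containment examples worked through above can be typed as follows. This is a sketch assuming jq is installed; the input array is the one described in the conversation, held in a shell variable (a convenience introduced here, not something from the show notes) so it is not retyped each time.

```shell
# The input array used throughout the array examples.
menu='["waffles", "pancakes", "apples"]'

echo "$menu" | jq 'contains(["pancakes"])'             # → true  (exact match of one element)
echo "$menu" | jq 'contains(["pancake"])'              # → true  ("pancakes" contains "pancake")
echo "$menu" | jq 'contains(["waffles", "pancakes"])'  # → true  (every argument element is found)
echo "$menu" | jq 'contains(["pancakes", "waffles"])'  # → true  (order does not matter)
echo "$menu" | jq 'contains(["waffles", "apples"])'    # → true  (gaps do not matter)
echo "$menu" | jq 'contains(["popcorn"])'              # → false (no element contains "popcorn")
echo "$menu" | jq 'contains(["popcorn", "waffles"])'   # → false (all must be found, not just some)
```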
I think concise is a better way to describe it. It's unlike regular expressions. That's dense. It's just everything just smashed together with no Englishy words in between. That's true. Yeah, JQ does contain actual words you recognize, which may or may not be your friend, but it does contain words, you're right. Whereas regular expressions is just symbols. Someone used to say it looked like a noisy modem. The characters have gotten corrupted. So the last type of containment that needs a deep dive is dictionary containment. So if the input and the argument are both dictionaries, what does it do? So I'm gonna read this verbatim again so I don't tie myself in knots. So when the input and the argument are dictionaries, contains will return true if the input dictionary's value for every key in the argument dictionary contains the value in the argument dictionary, and false otherwise. So there's going to be key value pairs in the argument and we've gotta find all of those in the input or we're not happy. Okay, stop. Yeah, yeah. No, no, no, no, because I think, let's see. When the input and arguments are dictionaries, contains will return true if the input dictionary's value for every key in the argument. So the argument is going to contain a dictionary. So it's going to have keys. It could be far fewer keys than the input. So the input could have 500 keys and the argument could have two keys. If both of those keys are in the input and if the values are contained within each other, then we're happy. The values are contained within each other. So if my dictionary has the key A, B, C, and I'm saying does it contain A1, well, then if the input dictionary's, let me say that more, let me say that. Let's do your example. Let's do my example, because I actually did work with that. Let's not try to invent one. Yeah, so in order to do these examples, to stop the commands becoming impossible, we're going to use a JSON file as our input instead of echoing some stuff.
We're going to use the same JSON file for all of our examples. That JSON file contains one top level dictionary which contains three keys. The keys are breakfast, lunch, and dinner. And each key has a value that is an array. The breakfast array is bacon, eggs, toast, waffles, and pancakes. The lunch array is sandwiches, rolls, baps, and wraps. And the dinner array is pizza, pasta, and burgers, which gives you a slight insight into my weekly consumption of food. And it does actually bring up an interesting question. Do Americans eat baps? Do you know what a bap is? I've never heard of a bap. Imagine a roll, but you make it a circle. Do you think whatever material you like, so you know the way you can have like a roll, a sandwich roll, it could be like a brioche roll or soft, or it could be a crunchy baguette. Imagine you take the same dough, but instead of making it long, you just make it a circle and put it in the oven. And then what was a roll becomes a bap because now it's round. That's it. I think we just call a brioche like is round. Oh, see for us, a brioche roll, a brioche is a brioche called as baps or rolls. A roll is long and thin, and a bap is round. So basically, it's the shape. Huh, okay. We apparently really care. What's up there today? We apparently really care what shape our food comes in. I have no idea why. Anyway. I thought about that a lot when you look at enchiladas and tacos and burritos and things, you know, you start going, wait a minute, that's the same ingredients just kind of mixed around and you took one thing out and added it over there. Well, this is why me as a foreigner has terrible trouble with Mexican food because it's like, okay, I understand it's corn and you're baking it, you're frying it, or you heat it on some sort of a hot surface without having a raising agent. And then sometimes you call it a taco and then sometimes you call it a, wah! Burrito. Yeah, I get very confused. Cause I'm pretty sure I call burritos tacos. 
I'm almost certain I do. They're delicious, so who cares? Anyway, we have our breakfast, or sorry, we have our dictionary. So let us look at how contains behaves. So the first thing I'm going to ask is a very simple question. Does our menu.json contain the dictionary breakfast colon array bacon? Okay. Okay, so describe the syntax. So the syntax, so contains takes as an argument a dictionary. So I'm giving it a dictionary with the key breakfast, one key breakfast, and the value is the array bacon. And so the question is, will that return true or false when I apply it to menu.json, which is our dictionary with breakfast, lunch, and dinner? And breakfast does contain one element in the array that's bacon, so that should be true. It should. So the rule says if every key in the argument dictionary, so how many keys are there in the argument dictionary? One key breakfast. Right. Yes. So then we have to see if whatever the value of breakfast is, is that contained in the value of breakfast that exists in the input. So we are now looking for an array containment. Does the array bacon eggs toast waffles pancakes contain the array bacon? Yes, it does. Therefore we can finally say true. I feel like you're saying something that I should really pay attention to and I can tell it's one of those things that's just slipping right through. I don't understand why can we keep focusing on the key. Obviously, if there's no key called breakfast, you've already failed. So why does it have to be, it's like, it has to have the key and that key has to have the value that you can find in the array of the input. Okay, no, you're not missing anything, but that is a very succinct way of saying what the rule says. Yeah. Okay. So if the key is missing completely, it will indeed fail. So you're already out. Yeah, you're already out. Now, the rule is containment, right? So we can also say, does our menu.json contain the dictionary breakfast colon bacon waffle? 
Well, the answer is still true because the breakfast in the menu.json contains bacon exactly as is, but it also contains waffles, and the rule is containment. So waffle, waffles, oh, we have containment, we still get true. Okay, okay. All right. So the rules for dictionary containment encompass the rules for array containment, which encompass the rules for string containment, right? That's why I did them in this order. Yeah. Okay. So now let's start to break some stuff. So if one of the keys in our argument is missing, it's just, no, you're not allowed, right? So the first thing we can say is, sorry, if one of the values in our input is missing. So if we say breakfast bacon waffle popcorn, there is no popcorn in menu.json. So we're immediately out. And the other way it can go wrong is if we are looking for something in the argument that doesn't exist at all in the input. So the final example here is contains the dictionary breakfast colon bacon, comma dessert colon cake. So the argument now has a key dessert. The input has no such key, fail. And that's what you said earlier, right? If the key I'm looking for does not exist, it should just fail. Yes, it does. Okay, so the first example you gave was you asked for an element of the array to be popcorn for the key breakfast. Now, while I personally have had popcorn for breakfast before, it's not on the menu. I'm really getting hungry and I got some popcorn waiting for me right now. But okay, so that's why the first one fails. And then the second one is you gave it breakfast colon bacon, sure, we got that one, dessert colon cake. And even though you spelled it desert, I'm gonna fix that. Well, it doesn't matter. Either way, it ain't in the input array in the menu. Yeah, which is, which is a miss on the menu. That's for sure. Okay. For double reasons, yeah. Gotcha. Okay. So that's the case where the key's just not even there: fail. Exactly. And then the final thing we have is default containment.
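The four dictionary containment examples above can be reproduced with the sketch below, assuming jq is installed. The menu.json written here is a stand-in reconstructed from the spoken description of the file (three keys, each holding an array); the real file in the show notes may be laid out differently, and wrapping cake in an array is an assumption.

```shell
# A stand-in for menu.json, rebuilt from the description in the conversation.
cat > menu.json <<'EOF'
{
  "breakfast": ["bacon", "eggs", "toast", "waffles", "pancakes"],
  "lunch": ["sandwiches", "rolls", "baps", "wraps"],
  "dinner": ["pizza", "pasta", "burgers"]
}
EOF

# One key, one value: the breakfast array contains "bacon".
jq 'contains({breakfast: ["bacon"]})' menu.json
# → true

# Containment, not equality: "waffles" contains "waffle".
jq 'contains({breakfast: ["bacon", "waffle"]})' menu.json
# → true

# A value that is not in the input array makes the whole check false.
jq 'contains({breakfast: ["bacon", "waffle", "popcorn"]})' menu.json
# → false

# A key that does not exist in the input at all also makes it false.
jq 'contains({breakfast: ["bacon"], dessert: ["cake"]})' menu.json
# → false
```

Each dictionary check reduces to an array check per key, and each array check reduces to string checks, which is why the three rules were introduced in that order.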
So we know what to do for strings. We know what to do for arrays and we know what to do for dictionaries. But I said, if the type is the same, there won't be an error. For all other types, it defaults to equality. So if you give it two numbers, it will check if they're equal to each other. Now that may cause a little bit of confusion because if you give it the number 420 and ask if it contains the number 42, it will be false because, number, number, they're not the same. If you ask it if the number 42 contains the number 42, you will get true. If you ask it if the boolean false contains the boolean false, you will get true because it falls back to the equality check. And as perverse as it sounds, this actually works because I checked. If you ask it if the null value contains the null value, the answer is true because null does equal null. Let me ask you a dumb question. So you had echo 420, JQ contains 42, it's false. What if it was echo 342 contains 42? It'd still be wrong? It'd still be wrong. Yeah, because as soon as it's not a string, an array or a dictionary, we fall back to equality. Okay. But you just wrote, you wrote equality check on the does false contain false and null contain null, you wrote equality check, but they're all equality checks. That is true. Yeah, I guess I was just sort of shorthanding, because in the first one I used numbers twice but I didn't do a true and I didn't do anything else. Okay, but I'm just saying if we change that to say 342, it's still false, but for the equality reason. Yes, precisely, precisely. Okay. Now, everything we have learned now, if we swap the input and the argument, that is what inside does. Instead of the big one being the input and the small one being the argument, the small one is the input and the big one is the argument. That is literally the difference between inside and contains: you apply the same rules but you swap where the big one and the little one go.
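The fall-back-to-equality behaviour for numbers, booleans and null, including the 342 question, can be checked with this short sketch, assuming jq is installed:

```shell
# For anything that is not a string, array or dictionary, contains means "is equal to".
echo '420'   | jq 'contains(42)'     # → false (420 is not equal to 42)
echo '342'   | jq 'contains(42)'     # → false (digits are not searched like characters)
echo '42'    | jq 'contains(42)'     # → true
echo 'false' | jq 'contains(false)'  # → true
echo 'null'  | jq 'contains(null)'   # → true
```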
The entire documentation for this function in the official documentation is: essentially an inverse version of contains. That is all I had to go on when writing these show notes. Essentially, an inverse version of contains, and the only way my brain worked is to just mentally swap the input and the argument and then it works perfectly. So the subset is the input and the superset is the argument. So if we echo the string waffles to JQ and we give it the filter inside, I like waffles, we will get true because the input is inside the thing in our argument. Waffles is inside, I like waffles. It does read well. Thanks. So is waffles inside, I like waffles? Yes, it is. Yeah. Okay. And the arrays work the same way as well. So we can say the array with the string waffles, is that inside the array waffles pancakes popcorn? Yes, it is. It also works exactly the same for objects. So, sorry, dictionaries. So is the dictionary breakfast colon pancakes, is that inside breakfast pancakes muesli? Yeah, oh, and snacks popcorn waffles. The answer is true because it is indeed inside, you're right. The simple object breakfast pancakes is inside the more complicated object breakfast and snacks. Yada, yada, yada. And it falls back to equality just like the other one. So 42 is inside 42. False is inside false and null is inside null, and that makes no sense. But null is a weird value. Because I brought it up, I put into the show notes my example of 342 contains 42 is false. I'm gonna add it to this one too to say... Perfect. 42 inside 420 would be false, right? It would, yeah, cause they're not equal. Okay. Right. So that is containment, in both of its ways. So contains and inside are both containment, depending on whether you want the big one on the inside or the outside. And that is very powerful. But the obvious king of the castle is regular expressions. And for us, that boils down to the test function in JQ.
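The inside examples above, with the small thing as the input and the big thing as the argument, can be sketched like this, assuming jq is installed. The values are the ones spoken in the conversation; wrapping them in arrays inside the dictionaries is an assumption about how the show notes write them.

```shell
# Strings: is the input inside the argument?
echo '"waffles"' | jq 'inside("I like waffles")'
# → true

# Arrays: every element of the input must be contained somewhere in the argument.
echo '["waffles"]' | jq 'inside(["waffles", "pancakes", "popcorn"])'
# → true

# Dictionaries: the small input object is inside the larger argument object.
echo '{"breakfast": ["pancakes"]}' \
  | jq 'inside({breakfast: ["pancakes", "muesli"], snacks: ["popcorn", "waffles"]})'
# → true

# Everything else falls back to equality, exactly as with contains.
echo '42' | jq 'inside(42)'    # → true
echo '42' | jq 'inside(420)'   # → false
```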
So the test function takes as its input a string, and its argument is a regular expression. And if the string passes the regular expression, then the entire string goes through, or sorry, then we get a true. And if the string fails the regular expression, we get a false. So it gives us a true or a false. Now, this is where we need to have a little discussion first about how jq does regular expressions. And I have good news and I have bad news, and I'm gonna give you the bad news first so I can give you the good news last. So one of the things I adore about JavaScript is that it has a primitive data type for regular expressions. If you wanna write a regular expression, you put a forward slash, the regular expression, and then a closing forward slash, and that is a regular expression. Just like a number is digits and a string is something that starts with a quote and ends with a quote. So JavaScript can deal with regular expressions. It has them as a native type. That is not true for jq. It's also not true for PHP and lots of other languages, which means you have to write the regular expression as a string, which is okay a lot of the time. But inside a string, the backslash character has a meaning. It says, I'm an escape character. Inside a regular expression, the backslash character also has a meaning. So when you need to use a backslash in your regular expression, you need to double backslash everything, because the string will see the first backslash and go, oh, you're escaping something. And it will deal with that escape. And then what's left as a regular expression is missing the backslash, because it's just been taken away by the string processing. So if you wanna be left with backslash n, you need to have backslash, backslash n. See where we're going here? Yeah, yeah, okay. And I hate languages that make me do that. And I'm sorry to say that jq falls into that category. So I tend to avoid backslashes in my regular expressions. And you can often sneak around them.
Not always, but often. So I will go out of my way to avoid backslashes. Instead of saying backslash d. So you have a hope of reading it later? Right, so instead of saying backslash d for digit, I say open square bracket, zero to nine, close square bracket, because the character class zero to nine is a digit, right? It saves me the double escaping. So I do all those kinds of little tricks because I hate double escaping, because I always get it wrong. How do you do a newline? Yeah, I have to double escape, unfortunately. I haven't found a trick for that one. So I know. Like I said, you can't always avoid it, but I minimize them because I hate them, because they break my brain so bad. So that's the bad news. The good news is that the syntax that jq uses is Perl-compatible regular expressions, AKA PCRE, AKA the way JavaScript does it. The way we're used to. So that is a nice bonus. At least the syntax is as we expect, even if you have to double escape some things. So now that we know how that works, let's go have a closer look at this test function. So the input to the test function must always be a string, and it must have at least one argument, which is your regular expression as a string. But in Perl-compatible regular expressions, you can have these things called flags. So in JavaScript, we used to put the little flag after the last slash. So if we wanted case-insensitive, we'd have slash, a regular expression, slash, i. So the i is actually officially called a flag. There are no slashes around things here. So the way we give a flag is by passing a second argument that is all the flags, which is usually just i, to be perfectly honest, but there exist more if you want to go read the PCRE documentation, but honestly it's just i most of the time. So if you want to be case-insensitive, you pass a second argument that is i, and that's all there is to it. So let us do some examples.
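The double-escaping point above can be sketched like this, assuming jq is on the PATH. The input strings are made up for illustration. The jq program is wrapped in single quotes so the shell passes the backslashes through untouched, and it is jq's own string parsing that turns the doubled backslash into a single one for the regular expression engine (jq's manual names the Oniguruma library for this).

```shell
# Inside the jq string literal, "\\d" becomes the regular expression \d.
echo '"room 42"'   | jq 'test("\\d")'     # true

# The character class [0-9] matches a digit with no backslash at all.
echo '"room 42"'   | jq 'test("[0-9]")'   # true
echo '"no digits"' | jq 'test("[0-9]")'   # false
```

The second form is the trick described in the conversation: the same match, with nothing to double escape.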
So if we want to match an IP address, as an example, we first need to build a regular expression, which is going to be the argument that we pass to the test function. And an IP address can be, it's not a perfect representation of an IP address, but it's a decent one. It is zero to nine, one to three times, followed by, inside brackets, three times, a period followed by zero to nine, one to three times. Okay. That is an accurate representation of an IP address. I 100% believe you. It will allow nonsense IP addresses like 999.999.999.999, but nonetheless, it's one to three digits followed by a period, followed by one to three digits followed by a period, followed by one to three digits followed by a period, followed by one to three digits. So it's not too bad. Okay. So anyway, we'll take that as given, because this is not, you know, Bart teaches regular expressions again, which I would happily do because I love them. But anyway, so if we echo the string, so again, we're doing that thing we did before, echo single quote, and then inside it, we have the JSON string. So double quote, 37.139.7.12, and we pipe that to jq, and we give it the test function with that lovely big regular expression as a string. We will get true, because that is indeed an IP address. So if we want to match for words starting with a w, we have the much simpler echo the string waffles to jq, where we have test. And then our string is caret, or hat symbol, w, because that is the regular expression for starts with w. And that will give us true. That one I follow. Yeah, nice and easy. Probably should have done those in the opposite order, shouldn't I? I'm pretty good at starts with. My ability to do regular expressions goes down the toilet after that. Starts with and ends with, they're very powerful. If you just know those two, you can do a lot. If we do Waffles with a capital W and we run that through test hat w, starts with w, we get false, because they are not the same case as each other.
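The two examples above can be sketched as follows, assuming jq is on the PATH. The IP address pattern is reconstructed from the spoken description (one to three digits, then three groups of a period plus one to three digits) and may differ in detail from the one in the show notes. The period is written as the character class `[.]` so that no backslash is needed.

```shell
# One to three digits, then three repetitions of ".digits".
echo '"37.139.7.12"'     | jq 'test("[0-9]{1,3}([.][0-9]{1,3}){3}")'  # true

# The pattern is loose: impossible addresses pass too.
echo '"999.999.999.999"' | jq 'test("[0-9]{1,3}([.][0-9]{1,3}){3}")'  # true

# The caret anchors the match to the start of the string.
echo '"waffles"' | jq 'test("^w")'  # true
echo '"Waffles"' | jq 'test("^w")'  # false, matching is case-sensitive by default
```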
Waffles with the capital W does not start with the lowercase w. But if we want to be case-insensitive, we can repeat those same two commands, but this time we give a second argument, which is the i flag for case-insensitive. And then we get two trues, because whether we have waffles in uppercase or lowercase, they will match against starts with w. Okay, and we used our semicolon to say we're gonna give you two arguments here. Yeah, as confusing as that is to us. And like you and I were saying offline earlier, that is going to confuse us all forever, but if we keep calling it out, we might remember next week, maybe. So let us take our wonderful new knowledge. I know. Let's take our wonderful new knowledge and let's have another visit to our big JSON file with Nobel Prizes. And let's see if we can't ask ourselves a somewhat arbitrary question that I made up. How many Nobel Prize winners have surnames that start with a vowel? It seems like a very arbitrary thing to want to know, but hey, let's figure it out, right? So starts with a vowel. Okay, well, the character class a-e-i-o-u will match any vowel. And if we want to be lazy and not have to double it up to include uppercase A-E-I-O-U, we'll just use the i flag to say case-insensitive. We know that the caret symbol means starts with. So when we put all that together, our regular expression is just caret, open square bracket, a-e-i-o-u, close square bracket. So that's not too bad. So how do we go from here? Well, we start off with our dot prizes, open square bracket, close square bracket, to explode the top-level prizes key into each of the dictionaries representing each prize. Then we go in there, and we take the laureates array, which may or may not exist. So we stick the question mark on the end of it and we explode that. So two square brackets again. So now, well, our parallel lines have just been split again. So we now have a lot of dictionaries representing each laureate in each prize.
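The case-insensitive flag and the vowel pattern described above can be tried on plain strings before pointing them at the prize data. A small sketch, assuming jq is on the PATH; the surnames are just sample inputs.

```shell
# The second argument carries the flags; note the semicolon between arguments.
echo '"waffles"' | jq 'test("^w"; "i")'   # true
echo '"Waffles"' | jq 'test("^w"; "i")'   # true

# Starts with a vowel, in either case.
echo '"Einstein"' | jq 'test("^[aeiou]"; "i")'  # true
echo '"Curie"'    | jq 'test("^[aeiou]"; "i")'  # false
```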
And in there, we're going to use the select function to screen or filter those down to just the ones that meet our rule. So what is our rule? Well, inside the select, we say dot surname, pipe, test, our regular expression, semicolon, i. So the input to test is going to be dot surname, and that is going to give us a true or a false. Okay. And then we pipe, so at that stage, we have dictionaries left for only the laureates whose surnames start with a vowel. And we would like to print out both their first name and their surname. So we just pipe that one last time to dot first name, comma, dot surname. And that will... That one is too fun. Right, well, it's too fun, but, but, but, but, we immediately hit an error: null (null) cannot be matched, as it is not a string. Now, I've put this in the show notes exactly as it happened to me, because this is an example of how you find dirty data. So I have assumed... Let me, let me, let me stop you real quick there, Bart. I was doing some buddy programming earlier and I got the result I wanted, even though I had this null error, and it was looking for Andrea. And I said, why do I care that I got that null error, because it got me what I wanted? He says, that's only because it got to Andrea before it hit the null error. Yeah. So you don't get to be like, I got what I wanted, I'm fine. You don't know that you missed something that was after that null error. Yeah, you got at least some of what you wanted, but you don't know what you don't know. Yeah, right, right, right. Okay. So we ask for the first name and surname for people whose name starts with a-e-i-o-u, and we get Pierre Agostini and Annie Ernaux, but then we get the error, null cannot be matched as it is not a string. So this then immediately made me go, huh? And I already know that laureates may or may not be there. So it already has a question mark. So then it's like, wait a minute, are there laureates without surnames?
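The failure mode discussed above, where some results appear and then the run dies, can be reproduced on a tiny stand-in. This sketch assumes jq is on the PATH and uses a hypothetical flat array of three laureate dictionaries rather than the real prizes file; the key names `firstname` and `surname` are assumed to match the data set.

```shell
# Hypothetical sample: the middle entry has no surname key at all.
laureates='[
  {"firstname":"Albert","surname":"Einstein"},
  {"firstname":"Intergovernmental Panel on Climate Change"},
  {"firstname":"Annie","surname":"Ernaux"}
]'

# Einstein is printed, then test receives null and jq stops with an error,
# so Ernaux is never reached even though she would have matched.
echo "$laureates" | jq -r '.[]
  | select(.surname | test("^[aeiou]"; "i"))
  | .firstname, .surname'
echo "jq exit status: $?"   # non-zero because of the error
```

Getting some output is therefore no proof that the whole input was processed; the exit status is the reliable signal.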
Is it conceivable that there are laureates without surnames? So I went, well, let us answer that question. So I wrote some jq to answer that question for myself. And so the jq I wrote to answer that question was dot prizes, open square bracket, close square bracket, pipe, dot laureates, open square bracket, close square bracket, question mark. So we now have all the dictionaries for all the laureates in all the prizes as the input to our select statement. Dot surname double equals null is what I put into my select. And lo and behold, I got a whole bunch of entries, and there's an example in the show notes. Why did you use double equals? Why didn't you do contains null? I could have done contains, though, but double equals is like, I just want to know, are they null? And the answer is... I'm just excited to know we could use that. Yeah. So we end up with a whole bunch of entries like the one I have in the show notes, which is id, 818, motivation, for their efforts to build up and disseminate greater knowledge about man-made climate change and to lay the foundations for the measures that are needed to counteract such change. Share, 2, so whoever this is shared it. And it says first name, Intergovernmental Panel on Climate Change. End of dictionary. First name. Who gives a prize to an organization and then uses the first name field? Like, if I was doing that in an Excel sheet, I would either have a separate column, say organization name, especially given that we're using JSON as our data store, or I'd use surname, Intergovernmental Panel on Climate Change. The only way they could have made this dumber is to say first name, Intergovernmental Panel, surname, on Climate Change. That would have been even dumber. But other than that, I don't see how they could have made it any messier. So there really are prizes with no surname. And this gave me a fantastic opportunity to use the alternate operator, because, well, hang on a second.
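The diagnostic query described above can be sketched against a miniature stand-in for the prizes file. Assumptions: jq is on the PATH, the sample data below is a hypothetical and heavily simplified extract, and the key names `prizes`, `laureates`, `firstname` and `surname` match the real data set.

```shell
# Hypothetical three-prize sample; the 1940 entry has no laureates key.
prizes='{"prizes":[
  {"year":"1921","laureates":[{"firstname":"Albert","surname":"Einstein"}]},
  {"year":"2007","laureates":[
    {"firstname":"Intergovernmental Panel on Climate Change"},
    {"firstname":"Al","surname":"Gore"}]},
  {"year":"1940"}
]}'

# Explode the prizes, explode the laureates if present,
# and keep only the laureates whose surname is null (that is, missing).
echo "$prizes" | jq -c '.prizes[] | .laureates[]? | select(.surname == null)'
# {"firstname":"Intergovernmental Panel on Climate Change"}
```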
If there is no surname, I still want to do some searching. I just want to search on something else. I want to search the first name, right? Because there is no surname. So I would have thought you'd want to go, I just want to see the ones that are real people, and they would have a first name and a surname. You could have gone that way, but that wouldn't give me the opportunity to make use of the alternate operator. So my brain immediately went to, well, the Intergovernmental Panel on Climate Change conveniently starts with a vowel. So I really should actually have it popping out as a winner that starts with a vowel, was my thinking. So if we rewrite our thingy, our thingy, our jq command, exactly as we did before, but in the select, so we were saying select dot surname, pipe, test. Well, we want to pipe the surname or the first name to test, and the alternate operator just does it for us. Dot surname, slash slash, dot first name, pipe, test. Hmm, okay. So if we have a surname, it goes, well, we saw that in part A, which for us is the other side of Christmas, therefore ages ago. So that allows us to easily check our vowels against the surname or the first name. But I've also made use of the alternate operator a second time to give us prettier output. So I am outputting the first name and the surname, but if there is no surname, that would give me null. So it would say Intergovernmental Panel on Climate Change, null. That didn't look very nice to me. So I decided to use the alternate operator to output the string none with a star each side of it, to highlight how silly it is that these things have no surname. So your last pipe goes to, in parentheses, dot first name, comma, dot surname or star none star. Yeah. You could have written waffles. I could have written waffles, or the empty string I could have written as well, which is actually the prettiest output. But to make the point, I've made it none with a star around it. Look at how silly this is. Hang on. Hang on. Hang on.
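The reworked command described above, with the alternate operator used twice, can be sketched on the same kind of miniature sample. Assumptions as before: jq on the PATH, hypothetical simplified data, and key names `firstname` and `surname`. The exact spelling of the command in the show notes may differ.

```shell
# Hypothetical three-prize sample; the 1940 entry has no laureates key.
prizes='{"prizes":[
  {"year":"1921","laureates":[{"firstname":"Albert","surname":"Einstein"}]},
  {"year":"2007","laureates":[
    {"firstname":"Intergovernmental Panel on Climate Change"},
    {"firstname":"Al","surname":"Gore"}]},
  {"year":"1940"}
]}'

# First //: test the surname, or the first name when there is no surname.
# Second //: print *none* in place of a missing surname.
echo "$prizes" | jq -r '.prizes[] | .laureates[]?
  | select((.surname // .firstname) | test("^[aeiou]"; "i"))
  | (.firstname, (.surname // "*none*"))'
# Albert
# Einstein
# Intergovernmental Panel on Climate Change
# *none*
```

Al Gore is not selected even though his first name starts with a vowel: he has a surname, so the first name is never consulted. First names and surnames come out on alternating lines, which is why the real output is easy to misread as a single list of names.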
I'm getting a very unexpected set of results when I copy and paste that command. Let me make sure I actually copied and pasted it. Okay. Copy this again. I don't know if I have a terminal open too, but I don't know if I do today. I do not have a terminal open too. Okay. Good. No. I hadn't copied. So when I hit it, it just did the previous one. Okay. Good. So I see Institute of International Law, none. Svante Arrhenius. This is a terrible pronunciation challenge. So you know what else this does though? This does now bring in a bunch of people that wouldn't have been in your original intention, because it's bringing in someone named Eijkman Sigrid, who wouldn't have been chosen in the first one. Hmm. Because his first name starts with a vowel, but his last name doesn't. His surname does not. I'm surprised he came through, because he has a surname. Huh. Well, let's see, Francis W. Aston, Albert Einstein. Albert Einstein. This is a perfect example. Albert Einstein. Well, he comes through. But Einstein would, because of the E. Yeah. Tobias Asser, Wilhelm Ostwald. Let me make sure I'm reading this right. Klas Pontus. Oh, he's got two first names. He's got two first names? Klas Pontus. Two words. Oh, okay. And then Arnoldson, Paul Ehrlich. Okay. I may be reading the, I'm sorry. It's really hard to read this because it's just name, name, name, name, name, name. It's hard to see what it is. I think that I did get those out of order. Okay. Good. Because I was pretty sure I had it right, because I did exactly what you just did when I was doing this first, and I had a panic attack that regular expressions didn't work like I thought they did. And then I checked, and they do work like I thought they did. And this is actually a good example of why I can't show you the pretty way to output this yet, because I haven't told you how to transform JSON.
Because what I would really want to do is actually to create a new string that contains the first name and the surname one after the other as a single output. That's what I would want to do, but that's transforming JSON. And that's the perfect setup for where we're going in episode 159, because that is the final piece of our jigsaw puzzle. So we now know how to pretty print. We now know how to search. So transforming is the last use case, and that's where we're going next. Now, you have two out of three challenges done. So I'm going to give you one extra challenge. So then you have two from last time and two this time. So it all works out perfectly. So as an extra challenge, I would like you to find all the laureates awarded their prize for something to do with quantum physics. I want the first name, surname and motivation for each winner where the motivation contains a word that starts with quantum, in any case. It starts with quantum? It contains a word that starts with quantum. So if the motivation says a whole bunch of highfalutin nonsense followed by quantum physics, then it will contain a word that starts with quantum. Right? Okay. So it's a word. So I have two hints to give you. The first is that PCRE has a special symbol for saying this is a word boundary. So starts with is the start of the string. There is an equivalent for the start of a word, a word boundary. So you need to remind yourself of, or Google, the word boundary symbol in PCRE. And you need to be very aware of the need, maybe, to double escape things, because maybe the symbol for word boundary has a backslash in it. That's just mean, Bart. It is just mean. Yes. So basically you are going to need to use the backslashes, because of the way you check for the start of a word. Okay. All I need to do is have New Year's now. Yeah. It doesn't help really, does it? And I think I'm going to be at CES for a week before I'm going to be back to this. So I bet I'll be perfectly successful. Yeah.
Because when you're at CES you're going to have nothing better to do in your hotel room at night than do some JSON querying with jq, right? You're definitely not going to be at the party. There will be no gin and tonics there. Isn't it in Vegas? It is in Las Vegas. Yeah, okay. When we get off the air I'll tell you about the last time we were in Vegas with our friends for CES and I got to know the old fashioned girl. Oh, as in the cocktail, old fashioned. Yes. I would love to get a really good old fashioned, because I have had old fashioneds but I don't think they were good. I have a feeling I'm missing out on, like, a properly made, true old fashioned with good bitters. There you go. Right, anyway. Sorry, I'm distracted completely now. So we have our optional challenge. And yeah, so the final piece of the puzzle here is to build new JSON based on existing JSON. And that is generally the final step in our pipeline, right? So you have lots and lots of input. You find the bit you want. So you're narrowing down, narrowing down, narrowing down to get the piece of information you want. And the final step is you actually want to present that information in the way you want, which means you need to construct new JSON, which is transforming JSON. So that is the final piece of this puzzle here. And that allows us to put our code in between things, because you can have an API that fetches weather data in all the wrong format and it's all dirty and ick. And you just want a nice clean piece of data that you're going to use inside a terminal command or for something else. You need to build nice new JSON that contains only the keys you want, none of the other garbage, no dirty data, everything perfect. The universe is great. And that is where we are going in two, three weeks, or whenever. So yeah, there we are. That sounds fun. Yes. Very good. I like this episode. Most of this made sense. As soon as you start saying regular expressions, I lose my mind.
But other than that, I've followed along with you. If all you ever do is starts with, you will still be able to achieve a lot. Or if all you ever do is go to Stack Overflow and say, I want a regular expression for a valid email address, and copy and paste, you're still flying. Actually, that might be a really good use of ChatGPT. I bet it would just generate me a regular expression. Yeah, it should be able to learn those patterns. Oh, that's a pattern matching machine learning how to make patterns. I like it. Well, when I talk to you next, Bart, I will talk to you next. Indeed. Oh, that's circular logic too. Right. Anyway, whenever that is, until then, happy computing. If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions. If you go over to lets-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at Podfeet or check out all of the shows we do over at podfeet.com. Thanks for listening and stay subscribed.