Well, it's that time of the week again, it's time for Chit Chat Across the Pond. This is episode number 781 for December 9th, 2023, and I'm your host, Allison Sheridan. This week our guest is Bart Busschots, back with Programming By Stealth, installment 157. How are we doing today, Bart? I am doing fine. 157, isn't that amazing? It really is. You'd think I'd know everything by now. I don't. I might work at this five days a week, every week, and I'm still nowhere near knowing everything. It's impossible. I was joking with my boss that I'm trying to document myself out of existence, but I'm not even keeping up, let alone catching up. I like it, I like it, because you're learning faster than you can document. Yeah, those things change quicker than I can write them down. I do more stuff than I could ever document, but nonetheless, it's still important to document stuff when you can because, you know, you don't want to make yourself the single point of failure, because then you can't get sick and stuff. Or get promoted. Or shut off for a month of annual leave, or get promoted, exactly. Well, speaking of documentation, we have some good show notes. I've actually already been through them. I'm like teacher's pet this week. I know what we're going to talk about. I'll still get lost, but we'll give it a go. So we are on our third, I think, installment of JQ, which is a language for querying the JSON markup language, I guess. It's a way of writing data in a structured format. And so we started off by explaining the big picture and learning how to use it to make our JSON look pretty, which is already valuable because a lot of APIs spit out one giant big long line of glop, which is very difficult to understand, but JQ makes it pretty. And then last time we learned how we can start to, I would say, extract very specific pieces of information. So we sort of made a surgical strike to exactly the point in the data structure we wanted. But it wasn't searching. 
It was like going to an address instead of finding something, because we knew where we had to go in: you know, the key named this, and then the fourth element of the array, and then the key named that. There was no "find me something like this". It was "go exactly here". Right. Right. So that's the difference between sat nav and googling for the best coffee in the neighborhood. Good analogy, right? So today, our aim at the end of the day is to be able to do that kind of querying, to be able to ask. So we've been using as an example a really fun data set I found, which is information about all the Nobel prizes as a JSON file. And so we've been using that in our examples in the previous two installments. I'm going to keep using it this time and next time and the time after and the time after that, as I've been planning. And we want to be able to answer very humane questions by the end of this installment, like who won the 2000 Nobel Prize in medicine? Which prizes were won by people with the surname Curie? We want to query the data for those kinds of things. What I think is interesting about the way JQ works is that what my brain just heard was an if-then-else sort of thing. Like if year equals 2000, or if surname equals Curie, but it doesn't work that way. You're not doing if-then-else statements in this. Basically it does it as a series of what they call filters, right? The language of JQ is filters. So the thing we've been writing is called a filter. And when we used a comma, we could do two filters one after the other. And what we're going to learn in about three minutes is that you can make the filters talk to each other, like in the terminal, where you can chain multiple simple commands. JQ is about taking filters and connecting them together. And you basically plumb your problem as a sequence of these filters. And each filter is nice and simple and easy to understand. 
And the magic comes from how you connect them together, which always reminds me of our good friend, Tim Verpoorten, who used to say that, you know, those little apps in your menu bar did one thing and did it well. Well, a JQ filter does one thing. And if you try to make a JQ filter do lots of things, you will cry, achieve nothing and get very cranky. Because this is what I used to do. And now I have completely embraced the idea that you just have lots of filters, you chain them together, you connect them together in a lot of different ways. And that's where the power comes from. This all comes down to your love of Michael Briggs, doesn't it? Yeah, it does. Totally. To be fair. So the difficult thing about this installment, and why I've kind of been holding it back until our third installment, is that in order to get from anywhere to here, we need to learn three things at once, which means this has to be an episode with three new ideas no matter what I do. And I'm always worried when I have to do three things at once. So we need to learn that we can chain filters together like we do in the terminal with terminal commands. We need to learn that there is something called an operator. So in the JavaScript language, we know we have plus as an operator. It takes whatever's on the left, whatever's on the right, adds them together and makes a new value. So that's how an operator does things, right? Whatever's on my left, whatever's on my right, I do something. JQ also has operators. We're going to meet some more of them in future installments. But the ones we're interested in today are the ones for doing logic-y stuff, less than, equal to, because how are we going to query data if we can't express that we want something the same as, or something greater than, or something less than, right? So they're obviously very important operators. And JQ doesn't have operators for everything because it doesn't make sense to have an operator for everything. 
There aren't enough symbols on the keyboard, and you'd forget them, for a start. So JQ provides the rest of its functionality as functions. We've done enough programming between shell scripts and JavaScript not to be surprised that there are functions. And so JQ does indeed have functions. So we need to learn how to chain lots of filters together, use operators and use functions. And then we can answer those very simple questions I just gave you. So let us dive straight in with filter chaining. So I told you at the very, very start that I was going to keep telling you that whenever you're writing the JQ filter part of the JQ command in the terminal, always single quote your filters, because otherwise it is not a case of when you will get caught out. Or sorry, not a case of if you will get caught out, but when. Because JQ's syntax leans on or copies the shell. So if you put it in single quotes, you're saying to Bash or Zsh, this isn't for you. This is an argument you are to pass completely unaltered to the JQ terminal command and it shall interpret it. And the biggest reason for this is because the symbol that JQ uses to connect one filter to another is the pipe, i.e. exactly the same symbol as in Bash. Why do I get the feeling that you've been caught out more than once, that you have to keep telling us? Oh, while writing these very show notes. What? That error literally doesn't make any... Oh, that's not an error from JQ. That's an error from Bash. OK, why did Bash... Oh, yeah, all the time, all the time. So you will. I keep saying it over and over again. Yeah. So on the terminal we say that standard out becomes standard in. Well, I told you last time that in JQ world, each JQ filter takes one or more inputs, applies itself to each of its inputs, one after the other after the other, and will produce one or more outputs. And it doesn't have to be the same number. 
So when we use the two square brackets to explode an array, we took one input, an array, and that filter spit out lots of outputs. Or when we use the syntax for slicing an array, we took one array in and got, say, the first three elements or the last two elements or whatever. So it can be an N2N or an N2M, I guess, because both numbers can be different. And as we start to use functions and things, the amount of transformation that can happen within a filter becomes ever greater. But at the end of the day, there is going to be an amount of inputs. They're going to be processed in parallel by this filter to produce an amount of outputs and they will become the input to the next filter in the chain. And the data ripples its way through. And each filter transforms the data in some way. Maybe it pulls a small piece out and throws the rest away. As we learn much later, not today, you can do math and stuff with the data. So you could take in a giant big array and spit out four, which might be the average number of Nobel laureates per year or something. That's probably not true, I just made that up, but you can do that kind of thing. So your filters can do anything to transform the data, but it's an amount of data in, I'll do myself once for every input and I will produce an amount of output. And then the next filter does the same and the next filter does the same. On and on and on and on you go. And so that pipe is very much your friend. So we have already learned in the previous installment that we can use the comma symbol to basically, don't think of it in your mind as and, think of it and also. That's sort of a bunch of, I've been trying to think of a piece of English to say that doesn't sound like it's a logic thing because when you say dot year comma dot category, you're saying I want the year and I also want the category. So they're basically they're two filters. It's like a list. You're saying give me this, like a list, exactly. 
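To make the inputs-and-outputs idea concrete, here is a minimal sketch. The sample JSON is made up for illustration rather than taken from the Nobel data set, and `-c` is jq's compact-output flag, used here only to keep the output on one line.

```shell
# Made-up sample inputs, not taken from the Nobel data set.
numbers='[10, 20, 30, 40]'
prize='{"year":"2002","category":"physics"}'

# One input (an array) exploded into four separate outputs.
exploded=$(printf '%s\n' "$numbers" | jq '.[]')

# One input (the same array) sliced into one output: a shorter array.
sliced=$(printf '%s\n' "$numbers" | jq -c '.[0:2]')

# The comma means "and also": one input, two outputs.
both=$(printf '%s\n' "$prize" | jq '.year, .category')

printf '%s\n' "$exploded" "$sliced" "$both"
```

The number of outputs depends on the filter, not on the number of inputs: the explode turns one array into four values, the slice turns it into one array, and the comma list produces one output per item in the list.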
Like I want the year, I want the category, I want the blah. Now they can be anything a filter can be. So as we start to learn that we can call functions, each one in the comma list could become something really complicated, but the comma just means and also. Okay. So the thing is that the commas are less, they don't, the pipe takes precedence. So if you have something pipe, something comma something, the pipe happens first. So in the example in the show notes, we take the prizes and we explode it and then we pipe that into dot year comma dot category. Well, it's not that you get all of the years from all of those explosions and then one dot category comes out. No, no, no. The pipe happens first. So both year and category come out for each and every single prize. Okay. So it finds the first prize, it gives you the year in the category, finds the second prize, year in the category, third prize, year in the category. Exactly. Exactly. So when you run that filter, when you run that JQ command, what you will see is year category, year category. So that will basically give you the listing of all the Nobel prizes that have ever existed, right? The 2002 prize for physics, 2002 prize for medicine, 2002 prize for peace, yada, yada, yada. So they all come in one after the other. And it is important to understand that the pipe happens first and then you have the and also, which is what the comma gives you. Okay. The other thing, as you start to build these things up to be more complicated, you're going to end up wanting to group your filters because you may want to do one big thing and then do some little piping around with lots of things to make one final answer that gets piped to somewhere else. So you may need to decide to group your pipes because you may want, yeah. So you may not want it all to go the one way. 
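The year-and-category listing described above can be sketched like this. The two-prize document is a made-up miniature of the real file, so the values are assumptions for illustration only.

```shell
# Hypothetical miniature of the prize data (made up for this sketch).
json='{"prizes":[{"year":"2002","category":"physics"},{"year":"2002","category":"peace"}]}'

# The pipe outside the single quotes belongs to the shell (printf feeds jq).
# The pipe inside the single quotes belongs to jq (one filter feeds the next),
# which is why the filter must be single quoted.
listing=$(printf '%s\n' "$json" | jq '.prizes[] | .year, .category')

printf '%s\n' "$listing"
```

In operator-precedence terms, jq's comma binds more tightly than its pipe, so the filter groups as `.prizes[] | (.year, .category)`. That is why the whole comma list runs once for each exploded prize, giving year, category, year, category.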
So I did my best to find a way to show you that in the show notes, slightly contrived example, but it's an interesting thing that there were some years, actually, okay, let me, getting lost on my own show notes here. So let's go back to our example. So we take dot prizes, open square bracket, close bracket that explodes the prizes array into lots of separate things. That thing gets piped into dot year comma dot category. So it will come those tuples and everything, which is great. And the roundy brackets are for grouping things together. So if we wanted to make our query above a little bit better, so don't just list the year and the category. We also want the surnames of each recipient of each prize. Well, now we can't just pipe that straight through because the surnames aren't there at the top level next to category and year. They're inside an array of laureates. Oh, that's sub to dot. So there's prizes and then laureates are inside prizes. Exactly, so the prizes contains a dictionary which has a year, which is just a number, a category, which is just a string and an array called laureates. And then in that array called laureates, you have a dictionary which has surname and first name and also why you got the prize, which is called the motivation. So if we want to get the surnames out as well, then we're going to have to explode the laureates as a third thing. So we take our existing query and we put another comma to say, and we're going to do something else. And the something else has to first explode the laureates and then get the surname. If we don't use the parentheses, the pipe will say the way JQ would see it is you want the category, the year, the laureates and then pipe all of that to dot surname, which will fail spectacularly because 2002 is a year, you try to get the dot surname of 2002, it will tell you that's poop, right? You try to get the dot surname of an array, it will say, no, that's not the index in this array. So that's just, that's nonsense. 
So we need to tell JQ, no, no, no, no, no, I want you to explode it out, get the surnames and then take all of that as the third thing. So can I describe now how it's written in text? So he's got JQ, and of course we're gonna be in single quotes because we don't wanna be talking to the shell. And so first he explodes the array dot prizes, and then he pipes that, and the three things he wants to know are dot year, comma, otherwise known as "and also", dot category, comma, "and also", and then the third thing that he wants, he's got in parentheses, which is the array dot laureates exploded, piped to dot surname. So that thing together says, go into the laureates, pull me all the surnames and make that be the third thing. So dot year, dot category and dot surname, and then the file name. And that one third thing becomes a list. So what actually ends up happening is that you see 2002 physics, three or four names, 2002 chemistry, three or four names, 2002 peace, three or four names, and so on. So it's still happening in order, but the third thing becomes many things because there is usually more than one laureate. Now I found, while trying to make this example, the perfect excuse to show you something I told you last time. So last time I said that sometimes something doesn't exist and JQ gets very, very cranky and says, well, I can't go into that array. It doesn't exist. And if you want to make it not give an error, you just put a question mark after it. And when I tried to run that very innocent looking query without that question mark on dot laureates, so dot laureates square brackets question mark, I got an error, which made my head hurt because that implied that there were years where there were no Nobel prizes. There are years where there were no Nobel prizes, called the First World War and the Second World War. They didn't give out prizes. They gave the money to charity. 
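A minimal sketch of the grouping just described, run against a made-up two-prize document. The laureate names and the `firstname` key are placeholders assumed for illustration, not real data.

```shell
# Hypothetical miniature of the prize data; names are invented placeholders.
json='{"prizes":[
  {"year":"2002","category":"physics","laureates":[
    {"firstname":"A","surname":"Alpha"},{"firstname":"B","surname":"Beta"}]},
  {"year":"2002","category":"peace","laureates":[
    {"firstname":"C","surname":"Gamma"}]}
]}'

# With parentheses: the explode-and-get-surname chain is the third item in the list.
grouped=$(printf '%s\n' "$json" |
  jq '.prizes[] | .year, .category, (.laureates[]? | .surname)')

# Without parentheses: the year and category strings are also piped to .surname,
# and jq fails because a string has no keys.
if printf '%s\n' "$json" |
  jq '.prizes[] | .year, .category, .laureates[] | .surname' >/dev/null 2>&1
then ungrouped=ok; else ungrouped=error; fi

printf '%s\n' "$grouped" "$ungrouped"
```

With the parentheses each prize yields its year, its category and then one surname per laureate; without them the command stops with an error as soon as it tries to index the first year string.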
And so what's in the data set is an entry that has no laureates and a note saying, this year the prize money was donated to blah, blah, blah. Oh, that's interesting. What's the motivation? Well, so the motivation is the explanation of which charity they gave it to. Okay. But they called it overall motivation instead of putting it inside the array of laureates. So there is no dot laureates in the years where there are no winners, which I didn't know until I ran a query that blew up. And then I put the question mark in, and then it was well again. So it just proves that the question mark works to stop errors. It just will ignore a year where there are no laureates. They're just not included. Interesting. Okay. The reason I'm pausing is because when I read through these notes the other day, I thought I tested it without the question mark just to see how annoyed it gets. And I thought it came back. It will give you some output. I thought it worked, but it just said null for a bunch of them. I thought that's what I remembered seeing. Okay, that's not what happened to me. What happened to me is I got a few of them, and then the first time it hit a gap it stopped going any further. Oh, okay. And the list is truncated. "Cannot iterate over null." Okay. That's what I would have thought. Yeah. Okay, that makes more sense. Yeah. Because what you've told it is, I want all of the entries inside null, and it's gone and went, there are no entries inside null. That's a nonsense statement. How do I get the list of nothingness? Yeah, it'd be nice if it told you where. It says at NobelPrizes.json colon zero. Well, I bet it was. Yeah, well, you see, the NobelPrizes.json file has the JSON on one giant big line. So it is giving you the line number. The problem is the entire file is one line. Ah, well, that's not very helpful. Also, they gave it to you as computer-centric line zero. Who the hell thinks of line zero? Right, right. 
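The effect of the question mark can be sketched with a made-up document in which one entry has no `laureates` key. The entry, its year and the `overallMotivation` text are invented for illustration.

```shell
# Hypothetical data: the first entry has no "laureates" key at all.
json='{"prizes":[
  {"year":"1900","category":"peace","overallMotivation":"(made-up note: no prize awarded)"},
  {"year":"2002","category":"peace","laureates":[{"surname":"Gamma"}]}
]}'

# Without the question mark jq stops with an error at the entry whose
# .laureates is null, because null cannot be exploded.
if printf '%s\n' "$json" | jq '.prizes[] | .laureates[] | .surname' >/dev/null 2>&1
then strict=ok; else strict=error; fi

# With the question mark that entry is silently skipped.
lenient=$(printf '%s\n' "$json" | jq '.prizes[] | .laureates[]? | .surname')

printf '%s\n' "$strict" "$lenient"
```

The first command exits with an error status, while the second prints only the surname from the entry that does have laureates.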
Okay, so let us move on to our second. That's our first piece of information, which I'm hoping wasn't too bad. We can chain filters together and we can group them with parentheses. So far, so not terrifying, I hope. So second piece of new information, operators inside JQ. This is very much in keeping with how other languages work. The operator goes in the middle, and then you have a value on the left, a value on the right, and the whole lot gets replaced with a new value calculated by the operator. You know, whatever the operator says on the tin is what will result. Okay. And some languages allow you to have things like unary operators that take only the input from their left and don't expect anything from the right, like plus plus, for example. In JavaScript, we can say X plus plus. That's an operator that only has one side. JQ, way more simplistic. Operators have two sides, full stop, end of story. There is no unary stuff, which is interesting, but we'll come to that shortly. Now, this is a moment where I need to sort of... I initially wrote the show notes and left this out, because it was so obvious to me that I didn't even think about it. It's not that I decided not to include it, it's that I just walked right by it and never saw it, but actually I made an assumption that I shouldn't have made. So in everything we have done up to now, we have told JQ the names of things, dot year, dot category, dot laureates, right? They are names within the data structure. We haven't actually told JQ a value. We haven't told it true or waffles, right? A string, right? We have only given it names of things, but if we're gonna do a comparison, is the year greater than 2000? Well, we now have to specify a value: what are we being greater than? So we should look at data types. So when we learned JavaScript, we spent ages learning that there are different types of data and that you write them in different ways and they have different meanings. 
And I just forgot to include that minor fact in these show notes because I'm too used to this, and that's why I have you. So let us start by taking a slight step back and reminding ourselves of the JSON syntax, because JSON is storing purely values and so its syntax is all about how do I express a value? So JSON actually has one, two, three, four, five, six data types. The first data type is very confusing. It is an explicit piece of data to say I mean nothing, and I mean nothing explicitly, and that is null. So there is one value of type null, it is the value null, and it means I'm not here. So it doesn't mean zero. It is a data type. It doesn't mean NaN. It means null. Exactly. It means null. It means nothingness as a thing. It can go in an array. The first element of an array can be null. Well, in 1947, the prize went to null. Yeah, the laureates were null because there are no laureates, therefore null. Yeah. Right. The other data type JSON understands is booleans, and they have the two keywords true and false. A boolean is either true or false, and so you write it in JSON as t-r-u-e or f-a-l-s-e. JSON also understands numbers, and we write those in the way we Westerners write numbers. So digits, perhaps with a period, perhaps with a minus sign. We just write the number. Not quoted, we just write the number. That makes it a number in JSON. Then we have strings, and JSON is very strict on strings. So in JavaScript land, we could use single quotes or double quotes and it didn't make any difference. JSON is way stricter. It's double quotes or nothing. It's just double quotes, no choice. JSON then has a syntax for an array, which is open a square bracket, zero or more values separated by commas, close the square bracket. And any value can be in an array. So you can have null, false, 11, the string something. They can all go in an array, and you can put an array in an array and infinity ensues. 
And the last thing you can have in JSON is a dictionary, which is open curly bracket, the name of the key as a string, colon, the value, which could be a string, it could be a number, it could be an array, it could be null, it could be another object, or sorry, another dictionary. So you can have dictionaries all the way down, and arrays, and you can nest them together ad infinitum, and hey presto, you have a database of Nobel laureates or whatever you'd like, right? Those six atoms do it all. So they are the six data types in JSON. JQ is a querying language for JSON. Thank goodness, the authors of JQ didn't try to reinvent any wheels. If you would like to represent null as a value in JQ, it is n-u-l-l. If you would like a Boolean, it is t-r-u-e or f-a-l-s-e. If you want a number, it's the digits, perhaps with a period, perhaps with a minus sign. And if you want a string, it is double quotes or go home. It is, right? So the rules are the same. Phew, so that now means that we can tell, oh, that's a number, that's a string, that's null, that's a Boolean, because the rules are the same as they are in JSON. So that is important for us to remember. And it's structured and it's clean. Yes, yeah. JSON is a subset of JavaScript that is very strict, right? The J in JSON stands for JavaScript. Okay. It's JavaScript Object Notation, JSON. Oh, okay, didn't know that. There you go. Okay, so the next thing is, in order for me to show you JQ operators in action, I actually am going to take a value, an operator and a value, which means that the input to the JQ command is nothing. I just want to run a filter that has a value, an operator and a value, and I want to see what that evaluates to, which is a perfectly valid thing to do. But JQ kind of assumes that its job is to process data, so it kind of expects to be handed some. 
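As a small sketch of the six JSON data types, the made-up document below uses each of them once and is passed through jq unchanged (`.` is the identity filter and `-c` is jq's compact-output flag). The second command checks the "double quotes or nothing" rule.

```shell
# Made-up document using null, a boolean, a number, a string, an array and a dictionary.
valid='{"nothing":null,"flag":true,"count":11,"name":"something","list":[null,false,11],"nested":{"key":"value"}}'

# Valid JSON passes straight through the identity filter.
roundtrip=$(printf '%s\n' "$valid" | jq -c '.')

# A single-quoted string is not JSON, so jq refuses to parse this input.
if printf '%s\n' "['something']" | jq '.' >/dev/null 2>&1
then single_quotes=accepted; else single_quotes=rejected; fi

printf '%s\n' "$roundtrip" "$single_quotes"
```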
So its default behavior, if you don't give it any data, is to assume you're going to type it on the keyboard, and it sits there and waits for you to type, which is not actually what we want. So you can tell it, no, no, no, I really did mean you not to have any input, with the long flag --null-input (minus minus null minus input), or the much more convenient -n, which in my head I think of as "minus no input", but it's technically null input. So if we want to see the operators in practice, we can say JQ minus N and then we can just run a filter that has two values and an operator, et cetera. So from our point of view today, the most important operators to learn about are the comparison operators, because we're going to compare things to each other, and the comparison operators always return a boolean. So if you say is this equal to this, it will be true or false. If you say is this not equal to this, it will be true or false. Is this less than this, greater than this? It's always going to be a boolean. The only output that comes out of a comparison operator is true or false. And we have all of the operators you would come to expect. We have double equals for "is this equal to?" We have exclamation mark equals, or bang equals depending on which side of the Atlantic you're from, for not equal to. We have a single chevron for less than, the opposite chevron for greater than, and we have chevron equals and the other chevron equals for less than or equal to and greater than or equal to. That's funny, you're calling them chevrons. I think of them as the less than or greater than symbols. I know. So do I, but I couldn't say that, because "the less than symbol for less than" sounds a bit silly. Or the angle brackets, as I also sometimes call them. So if you look in the show notes you will see all of the commands, and I have put in a comment at the end of each terminal command the output it produces. 
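A minimal sketch of running filters with no input. Each filter here is just a literal value written the same way it would be in JSON, and the variable names are only for this example.

```shell
# With -n (or --null-input) jq does not wait for input on the keyboard.
null_value=$(jq -n 'null')
bool_value=$(jq -n 'true')
number_value=$(jq -n '42')
string_value=$(jq -n '"waffles"')   # double quotes inside the single-quoted filter

# The long form of the same flag.
long_flag=$(jq --null-input '42')

printf '%s\n' "$null_value" "$bool_value" "$number_value" "$string_value" "$long_flag"
```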
So if you say JQ minus N and then in single quotes waffles as a string, double equals, waffles as a string, shock and/or horror, JQ will tell you true, waffles do indeed equal waffles. If you say waffles as a string, double equals, pancakes as a string, JQ will quite correctly tell you that they are not the same: false. For not equal to, very sensibly, waffles is not equal to waffles returns false, waffles is not equal to pancakes returns true, because waffles are indeed not pancakes. Less than works the same, greater than, greater than or equal to, they all do exactly what you think. But, but, but, there is a subtlety to double equals. So if you remember all the way back to JavaScript, in installment, I don't know, 14 or something, I actually think it is genuinely that long ago, we learned that in JavaScript we have two forms of checking for equality: we have double equals and triple equals. Yeah, the triple equals was like the strict one, like "is identical to", right? So if you say the number 42 is equal to the string 42, is that the one where double equals would be true but triple equals would be false? Yes, yes, that's it perfectly, in JavaScript. Okay. In JQ you are always in the strict mode. So in JQ, double equals means what triple equals means in JavaScript. So like you were saying, JQ is a very strict language. Well, its equality is strict. So if you say JQ minus N, the digits 4 2, double equals, the digit 2, quite rightly that's false. If you say JQ minus N, the digits 4 2, double equals, 4 2, yeah, true, great. If you say JQ minus N, the digits 4 2, double equals, the string 4 2, false. They are not considered the same. Okay. That's the subtlety. Okay. 
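The comparisons described above can be sketched as follows. The variable names are only for this example; the filters follow the ones read out above.

```shell
# Every comparison operator produces a boolean.
same=$(jq -n '"waffles" == "waffles"')
different=$(jq -n '"waffles" == "pancakes"')
not_equal=$(jq -n '"waffles" != "pancakes"')
less=$(jq -n '1 < 2')
greater_or_equal=$(jq -n '2 >= 2')

# Equality is strict: a number never equals a string, even with the same digits.
number_vs_number=$(jq -n '42 == 42')
number_vs_string=$(jq -n '42 == "42"')

printf '%s\n' "$same" "$different" "$not_equal" "$less" \
  "$greater_or_equal" "$number_vs_number" "$number_vs_string"
```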
And this is important for us because in our data set the years are strings. I actually downloaded it from the Nobel committee's website, so the data set comes from the actual Nobel people, and they aren't very good data scientists, because they have encoded their numbers as strings. But anyway. Well, they did try. They started a long time ago, to be fair. I don't think the JSON data set goes back to 1901. It's just a hunch. But maybe that's how it was typed out or something, on paper, parchment or something. Or the IBM punch card. Actually, they probably existed back then, you know, International Business Machines were around back then. Anyway, the second type of operator I want to share with you today is the obvious companion to the comparison operators: it is the Boolean operators. So basically this is your "and" and your "or" and your "not". Only there is no not operator in JQ, because not is unary, right? Not doesn't take two inputs. That doesn't make sense. You not a single thing. Oh, right. So not is not an operator, but it does exist, which is going to be our sort of transition point to functions. So we have "and" and "or" available to us as our Boolean operators, and they will produce true or false. And this brings us to another thing that every language has to grapple with, and every language gets to make up its own rules. So if you compare a Boolean to a Boolean, then Boolean logic is the only thing in play and it's really obvious what will happen: everything just obeys the truth tables that George (was he George or Robert?) Mr. Boole in Cork, Ireland, invented well over a century ago. They're very fond of Mr.
Boole down in Cork, and they have lots of buildings named after him down there. Very pretty campus. Anyway, Boole gave us the rules: true and true is true, true and false is false, false and true is false. We have these simple rules, so as long as you're dealing with Booleans all the way down, it's all easy. But if you say true and 42, or true and an empty string, the language has to decide how to handle this. Some languages handle this by shouting at you and giving you a giant big error and telling you that the Boolean operators can only handle true and false. Don't you dare give these Boolean operators a string to work on. Most languages are more forgiving, and they have a set of rules that says, if I'm forced to treat something that isn't a Boolean like a Boolean, I will apply this algorithm to figure out what that is: is it true or is it false? And so in JavaScript we spent a lot of time on this, because JavaScript's rules are quite complex but useful. So we use the terms truthy and falsy for things that get converted to true or false without ever explicitly writing them as true or false. We said that the number 42 is truthy because it evaluates to true. And jq has a much, much stricter approach to these things. The rule in jq is so simple it sounds complicated: every single possible value apart from false and null is true. Everything that isn't f-a-l-s-e or n-u-double-l is true. That's very counterintuitive, because that means an empty array is true. JavaScript? No, in JavaScript it was false. We would say if, the name of the array, and it would be false if it was empty. Really useful. An empty dictionary? In JavaScript it was false, which was very useful. jq? No, it's true, because it's not null or false, so it's true. The empty string? True. JavaScript called it false. The number zero? Everywhere in computer science calls the number zero false. Not jq. No, it's not f-a-l-s-e or n-u-double-l, so it's true. So everything that isn't null or false is true, and that is very counterintuitive. Right, right, and
you do lose some utility with that, like you said, right? I argue you do, yeah. Now, there are functions we will learn about next time for checking all sorts of things that will get us around these strange choices. It's clean, though. You've got to admire the simplicity, right? It took me a whole episode to explain how JavaScript does it. It took me like a sentence here. It's so simple it's deceptive, because you just... it can't... what do you mean it's that simple? It can't be. How can zero be true? It's not false or null. I'm gonna pull back the curtain. When I read the show notes about three or four days ago, I was writing to Bart going, this doesn't make any sense, what are you talking about? And he kept saying the same thing over and over and over again. I'm going, yeah, no, that's not my question. And you're like, no, that is your question: false and null, that's it. F-A... but you can't say f-a-l-s-e when you're typing, because it's already that way. And it took us probably six times before I went, oh, you mean exactly... Yeah, yeah, exactly. It's too simple, it can't be right. You fleshed this out quite a bit to make that as clear as possible, so if anybody's reading this and not having Bart describe it and say it in these words, this is really good now. Yeah. And so we have commands in the show notes to prove what I've just said to you. So again we're using our jq -n, so we can say true and true is true, true and false is false, so far so good. True and null: false, yay. True and 42: true. True and zero: true. True and waffles: true. True and false as the string, so the string double quote f-a-l-s-e double quote, that's a string: true. The empty string: true. If you want to test an array or something, we need to get a bit cleverer, so instead of using jq -n, I went back to the old thing we learned a million times when we were doing our various things in shell scripts: echo the string I want and pipe it into the jq command. So I am echoing the JSON syntax for the array false, comma, zero, comma, null, and I am piping that into jq and jq
says true. You know, so I give jq the string true and dot, dot representing the input, so dot is our array, and jq says yep, that's true, because that array is not false and not null, so it's true. Yeah. Also the empty array: true. Right. We can pass it a dictionary, breakfast colon pancakes, dessert colon waffles, which I think is a good day; pipe that to jq and it says true. You know, the empty dictionary: true. Right, everything that's not null or false is true. It's shocking but it's true, if you'll excuse the terrible pun. Right, so our third new thing: functions in jq. So in jq, like in any other language, functions can take multiple arguments, but actually I should step back a sec, because in most languages we're worried about telling the function about its input, but in jq you get input for free, because a jq filter processes something, right? Dot is the input being processed, so every jq function has access to dot without you doing anything. So you don't have to explicitly tell it to work on dot, that's a given. So arguments are only needed, um... This is why we use video: the only reason he knew to say go on is he could see me. Go on. Wait a minute, wait a minute. So you said every function has access to dot. Is that even after a pipe? Would dot still be the original input, or would it be the input from before that pipe? Right, so dot exists within a specific filter, so when you pipe one filter to another, each side of the pipe has a different dot. So that's why dot prizes, I like that, and then dot surname. Right, right, okay, exactly. Okay. Yeah, so the function just automatically is processing your thing, which kind of makes sense, right? You're taking some data, running it through a filter to get some more data, so you don't actually have to explicitly say here's the data I want you to process; it just gets that for free, just by the nature of jq's raison d'être. So a lot of functions don't need arguments. Like if you want to not something, you just
pipe it to the function not, and then it will flip it around. So it will say, are you already a boolean? If you're not a boolean I'll make you a boolean, and then I'll flip you around. So if I pipe it null I'll get true, if I pipe it false I'll get true, and for everything else in the world, the opposite of true is false, so I'll get false. Right, so for a function that doesn't need any more information, you literally just give it its name and that's it, and it will work on the current input and do what it says. So not, you can just pipe to not, that's the entire filter? The entire filter is just the function, or is it? Well no, so you use functions within a filter, right? The filter is like, on the terminal everything is a command. Okay, okay. In a programming language everything is a statement. So in jq everything is a filter. Okay, so this is jq minus n, a single quote, true and true, we're going to pipe it to not, which is a function that is going to be our filter, exactly, single quote. So that function is our filter. True and true piped to not is false, exactly, because it's now just been inverted, right? True and true is true; pipe it to not: false. I got a lot of traction on Mastodon when I posted that my favorite thing about programming is when you have something like this that makes complete sense: jq minus n, true and true, pipe to not, false. And I went, I understand! Exactly. You got a reply from one of the authors, one of the people who contributed to the jq project on gist, which I thought was very nice. That was really, really cool. Yeah, and luckily in my post I put learning from Bart and tagged Bart on it, and I put a link to pbs.bartificer.net, and I think he said something about, oh, something on jq I didn't know about was out there, so he got something out of it, so that was really fun. Yeah, which I guess from his point of view, hey, there's people learning my thing, that's got to be fun too. Yeah, yeah, circle of happiness, exactly. So if you do need arguments, if you do need more
information, like maybe you have a function to add a string onto another string, well then the other string is not going to be your input, so you're going to need an argument. So sometimes you do need arguments, and so you can give a function arguments, and thankfully jq has inherited most of the syntax from every other language we've met: it uses roundy brackets to say here's my argument list. Unfortunately the people who wrote jq painted themselves into a wee bit of a corner, because normally the comma symbol represents this is the next argument, but the comma symbol is already in use: it's how you separate multiple filters from each other. And a filter is a valid argument. A function in jq can take a filter as an argument, which is one of those snake eating its own tail things, but it's the reason jq is tremendously powerful. You can pass a filter as an argument. It's like a callback in JavaScript, right? That's a function as an argument to a function. Well, in jq it's a filter as an argument to a filter. You've completely lost me by the way, because you said that functions were filters, and then you said, and I'm the snake eating my own tail here, I didn't follow that. Okay, so a filter is a thing you want jq to do. Calling a function is a thing jq can do, so a call to a function can be your filter. Yes. Your filter could also be dot name, that's also a filter, right? So you call a function within a filter like you call a JavaScript function within a JavaScript statement, but they're not synonymous; a statement and a function are not the same. Sure. So you could say one plus call a function, and then the answer would actually be one more than you thought, because the filter is one plus the function call. So the function call doesn't have to be the whole filter. Okay, let's keep going and go through your example. Yeah, let's keep going, it'll probably make sense. Some of this stuff won't make sense until next week because I'm saying a lot, but it is important to say that they couldn't use the comma. The point
I'm trying to get to is the comma is taken. They have given the comma a meaning, so to separate our arguments we use the semicolon. That's very different to every other language we've ever met. Semicolon means next argument, not end of statement. Next argument. Very different to anything we've ever met, so that's why I'm making a real point of calling it out: separate your arguments with the semicolon. Okay, so we basically say: name a function all by itself; or name a function, open round bracket, one argument, close round bracket; or name a function, open round bracket, first argument, semicolon, second argument, maybe semicolon third argument, as long as we like, close the roundy bracket. And as we've already talked through, our first example is the not function, which is just pipe something to not and it will just invert it, that's all there is to it. Now, not is already quite useful, but jq has two really powerful boolean-like functions, any and all, and you will end up using these a lot. That seems like an obvious thing, like if I want to make a smart folder in Apple Photos I'm going to use the little drop-down to change any to all. That seems like a classic filter thing to do. It really is. So the any and the all functions, both of them come actually in three flavors. I'm going to teach you two of the flavors now, and one of the flavors we'll come to later. So the first flavor is the most simplistic flavor: no arguments, and the input to any and all has to be an array. The whole point is that they work on many values; if you think about it, that's kind of their job. But there are many values in a dictionary, but they can't do key-value pairs? Okay, maybe I shouldn't be quite so categorical. For today let us pretend that the only thing they can handle is arrays. Meaning they can? They can, but yeah, for now let's just keep it simple: arrays. And so in this most basic form without any arguments, if you give it an array, it's going to convert every value in the array to true or
false based on the rule we learned above: if it isn't false or null, it's true. And if all of them evaluate to true then the all function returns true, otherwise the all function returns false, right? If even one of them is not true, the all function will return false. The any function is its opposite number: if any one of them is true then the any function returns true. So they do what they say on the tin, and we can see that in action with some sample calls. We echo the array false comma false comma false, and then we pipe that to the two filters any comma all, so we're saying run this filter and also this filter, because that way I have half as much typing to do. So if we do that with false false false, we run it through any, we get false; we run it through all, we get false. With false true false, any says yeah, that's true, I got one true, I'm good; but the all function is like, no, two of those are false, no no no. If we send it true true true, then any and all will both be true. That makes sense; you know, they behave like they say on the tin. So that's the zero-argument version, and that is already quite useful, but where we really hit some power, we have a one-argument version, and this is where I get to explain what I mean by you can pass a filter as an argument. So the one-argument version of all expects to be handed a filter. It will apply the filter to each element in the array, and it will make its final decision based on the result of the filter. So you're applying the filter once for everything in the array, and then the output of that filter is what you then any or all. So it's an extra level of indirection. So let's look at an example, because this is way harder to say than to just do an example. Okay, so our example is going to use all with the argument dot greater than or equal to zero, so the argument is the filter dot greater than or equal to zero. The input we're going to send is the array 42 comma 3.1415 comma 11. So what will happen is 42 will be compared to
zero to produce a boolean. Is 42 greater than or equal to zero? True. 3.1415 greater than or equal to zero? True. 11 greater than or equal to zero? True. So we have true true true; all sees true true true and returns true. Okay, let me talk through the syntax here for the people listening. It says echo and then single quotes around our array, 42 comma 3.1415 comma 11, then he pipes it to jq, and then again in single quotes, all, roundy brackets, which tells us these are arguments, dot, which is the input that we just got, which is that array, greater than or equal to zero, close roundy brackets. Now in this case, because the documentation of all tells you this, you can't tell by looking, you can only tell if you read the manual, dot is not going to be the array. Dot is going to be the first element, then the second element, then the third element, because the documentation says that all will iterate over its input. Okay, I guess that makes sense. I was kind of looking for you to say the word explode, because we didn't explicitly explode the array, but it's going to do that iterating. Yeah, exactly. So the documentation for all says that it will iterate, so we don't explode it, all explodes it for us. Okay. And then it gets a list of booleans and then does the all thing, so it's one level of indirection. Right, right. We told it what we wanted it to do, and if they all turn out to be true then we're true, otherwise we're false. That's very powerful for validating a bunch of stuff: I need all of these to be strings or I'm not happy. You know, we haven't met it yet, but there's a function, is string, so you just give that as the argument, all is string, and it will just tell you true or false: they're all strings, those are not. Okay, so before you even take another step, okay, you've got a problem here, we're done. Yeah, yeah, we're done, exactly. So, very powerful. Another nice simple function that's darn useful is length. Length doesn't need any arguments, it will just count. It's actually quite clever, so if you give it a
string it will count the characters, and it does it like a human, which is very pleasing, because a lot of programming languages do it like a computer. And so if you pass it the string passé, a computer might say that is actually six characters, because it's p-a-s-s-e and then the accent, because under the hood the accent is a separate character that gets rendered on top of the e, but jq says no, no, that's five. You can give it an emoji for a stack of pancakes, which is what that emoji is, and it will say that's a length of one. You know, when I read this I immediately stopped and said, I gotta tell Bart that's not right, I know that the more complex ones are more than one digit, more than one length, and that's code points. That's one of the weirdest things: a lot of the earliest emojis are only one, right? Like a smiley face is only a length of one, right, but if you do a smiley face with a skin color that's not yellow it'll be more than one, because the second code point is the skin color. That's because we didn't used to know that people had different colors, or the middle-aged white men who wrote this stuff didn't know that. Yeah, anyway, let's not go there. If you give it an array it will count the elements in the array. If you give it a dictionary it will count the pairs, so if you give it a dictionary with two key-value pairs the length is two, not four. Two, which is more sensible frankly. So everything I've been doing today has been leading up to the one thing I wanted to tell you about, which is the function for searching for things. And in Programming by Stealth, although we've been going for a hundred and fifty-six point seven five episodes, because we're three quarters of the way through this one at least, or more, we have not done databases. But an awful, awful, awful lot of programmers have done databases, and you know that the keyword in the standard querying language is select. That's how you query a database: select something from something where
something. Well, the function in jq is called select, which is very pleasing; I liked it when I saw the word select. Now, select is a function and it needs an argument, which is basically the piece of logic you would like it to apply, so select takes as an argument a filter. Okay. And it does the simplest thing in the world: if the filter returns true, select returns the value unchanged; if the filter returns false, select silently swallows the information. So the effect of piping through select is that anything that matches comes out the other side and anything that doesn't match disappears. So if you start with your dot prizes array and you pipe that into select, whatever matches the condition comes out the other side and continues on to your next filter, where you might say dot name or something, but anything that didn't meet the criteria is annihilated, destroyed, evaporated, however you want to think about it, it disappears. So that would get interesting in the database for the Nobel prizes, for the years during the world wars, because you would be running into some nulls, and therefore those might not be there. Those are false, right, which means they'll be silently swallowed. So that's okay, right? That's what I mean, it can be used as a way to eliminate those. Yes, yeah, it absolutely could, yes. But that is all it does, that's spectacularly simple: if my filter returns true then I will pass the value unchanged; if my filter returns false then select will just silently absorb the value. And so it is the most filter-like filter, right? When you describe a filter, you take some input and you make less of it appear on the other side. Select is the ultimate filter. Right. I'm a little bit confused by you saying it passes the value. So in your example, okay if I read that now? Mm-hmm, absolutely. jq, single quote, dot prizes square brackets, so we're going to explode dot prizes; he's piping it to select, open roundy bracket, dot year equals equals quote two thousand unquote
because they use strings, close single quote. Yeah. So to me that says I'm going to explode prizes and I'm going to select only the years that are equal to two thousand, but it doesn't tell me what it's going to spit out. Is it going to just spit out two thousand, two thousand, two thousand? No. What is that? Okay, good, thank you. Okay, so the value is dot, all of dot, so the entire input comes out if the condition is met. So we're saying the condition is whether or not the year is two thousand, but the thing that came in was the whole prize. Okay. So the thing that comes out is either the whole prize or nothing at all. It should send out the prize where the year was two thousand. Okay. Or nothing at all. Right, so all the prizes come in, and for each individual prize we do a check: is the year two thousand? If it is not, that prize evaporates, so it comes in and then it evaporates. When the year is two thousand, it comes out. So it's another JSON file structured exactly like dot prizes, except all that's inside dot prizes will be the ones from year two thousand? Exactly, only they're coming out as lots of separate ones. So they went in as one array, we exploded the array into separate values, so what will come out will be a dictionary, another dictionary, another dictionary. Okay, okay, so it won't be dot prizes when it gets to the other side because we exploded it. Exactly, yeah, exactly. So there's no going backwards in time, right? If you do something and then you pipe it somewhere else, the somewhere else has no idea where you came from. I'm getting a little bit hung up on what you mean by the value gets returned. Like, what is the value of dot prizes square brackets? Okay, so dot prizes square brackets means explode dot prizes into separate inputs, so everything in that array becomes a single thing, another single thing, another single thing. So the select statement happens once for the first dictionary inside dot prizes, once for the
second dictionary inside dot prizes, because that's the act of exploding, right? So the second filter happens once for everything in dot prizes, because we exploded it. Right, so it's looking for all the ones that match this select, dot year equals equals two thousand, and those are the only ones it's going to pass through, so it's going to be a list of all these dictionary items that are just from the year two thousand. And you're calling that a value; that's just the word that sounds funny. That is the right word, because that's the whole point, but it's a specific piece of data, right? So dot is the thing I'm working on now. Right. So select will pass through the entire thing: you give it a thing, it will either give you back exactly that thing or nothing at all. It doesn't alter the thing, it's either yes or no. But it's not really giving you the thing that came in, it's giving you a subset of the same thing. We have exploded it. What I sent in was all the prizes. No, but you said one by one, sets of dictionaries... oh, okay, okay. So for each one that comes in, it comes out the other side if it matches two thousand; if it doesn't match two thousand, it gets swallowed. Exactly, because remember, they're happening in parallel, right? So we have exploded it, so the middle one happens in parallel. Yeah, okay. Yeah, exactly. So it's a very simple penny to drop: we either pass it or we pass nothing. Right. But it's a very important penny. Right, and so the argument is just the condition, and if the condition is met, the exact thing passes through unchanged; otherwise we lose it, nothing comes out. So when you run it you will see the prizes for two thousand, which is kind of cool. Now we can break things down further, right? So let's say we want just the prizes for medicine. Well, we just chain them together, right? The whole point of these, remember I said, with filters you just chain them together; don't try to do two things at once. So you can say dot prizes square brackets, pipe, select dot year
equals two thousand, pipe, select dot category equals medicine, and that will tell you the two thousand prize for medicine. But of course we can use the and operator, so it is equally valid to say dot prizes, pipe, select dot year double equals two thousand and dot category double equals medicine, and that's all in one giant roundy bracket, because that whole big thing is now the argument getting passed to select. Right, dot year equals and dot category equals is now the thing being passed to select. Exactly. Now, what if we just want to see the laureates and not all of that detail, right? Because the select is passing the whole prize. Well, just stick another pipe on the end, dot laureates, explode those out, and now you're just going to get the objects representing the winners of the prize for medicine for the year two thousand. So you can use your select in the middle of a giant big string of pipes; it's just a filter at that point, and then you can continue to chain and chain and chain, because that's what we're doing here, we're just chaining things together, chaining things together, chaining things together. Now the last place I want to go today is the really fun part. So I told you that there was a third version of any and all which takes three arguments, wait, no, two arguments, yes, two arguments. Right, and so let us say that we now want to answer the actual question I asked at the start of the show: I would like the Nobel prizes won by someone with the surname Curie. So based on what I've just said, a naive first attempt would be: we say dot prizes square brackets to explode the prizes, we pipe that to dot laureates to say that for each prize just give me the laureates, and then we pipe that to... so, correct, yes, sorry, good correction. So we now have a list of names, sorry, a list of dictionaries that contain surname, first name and motivation, and then we run those through a select dot surname double equals Curie, and that is
absolutely going to find just things with the surname Curie. But when you run it, well, actually the first thing that happens is we get an error: cannot iterate over null. Null? Oh, poop. Okay, well, I realize now that I wrote the show notes slightly out of order and I've already given you the answer to this question. The reason is because we need that optional question mark, because sometimes there are no winners. But now I'm going to show you how to find the years with no winners, right? Because last time we worked around the problem by just throwing in the question mark, but now I'm curious: well, when was there no Nobel prize, and why? So to do that we can use select, and this time our criterion is we want to have the prizes where the length of the laureates is zero, in other words where there are no laureates. Because in the years where there are prizes, laureates is an array with a length greater than zero; when there are no laureates, that is null, and the length of null is zero, because there is nothing there. Okay, okay, yeah. So we say dot prizes, explode it, pipe that to select, and then inside select we are actually using our brackets to say dot laureates pipe length double equals zero, so we're getting the length of the laureates and then checking for zero. Let me read this again, I'm running into brain freeze on the parentheses. Okay, jq, open single quote, dot prizes square brackets, so we're exploding prizes, piping that to a select; within the select we've got a filter, dot laureates piped to length, close roundy brackets, so that's going to give us a number. So you've taken the laureates, and gotten the length, and compared it to zero. Well, because we want the length of the array. If we explode it, the length will be how many keys there are in each laureate. We don't want to explode them, right? We want the length of the unexploded thing. So we're going to take laureates, which is just one thing, find out its length, and then if that is equal to zero it will pass something out, it'll
come out. Right, so what will come out? Okay, so dot prizes is what gets exploded into the select, so the answer is it will show us the prize. So what we will then see is the prizes that didn't have winners, and what you will see when you run that command is that there is a new key in those dictionaries, overall motivation, and it will tell you why there were no Nobel laureates in those years. It's not just during World War II: 1924, 1923, 1921, 1919, 1918, 1917, 1916, yeah, 1914. 1914 is the start of World War I, so 1914 to 1918 covers World War I, and then it didn't quite get going again straight away, probably because they had the Spanish flu and lots of stuff. I don't know what happened in 23, my history isn't good enough. Well, it's still gone in 25, then it's there for 26 and 27, but gone again in 28, gone again in 31. I'm not sure what was going on in the world then. 28 is the Wall Street crash? That seems like a strange reason to cancel the Nobel prizes. Yeah, right. 31, 32, 33, yeah. There's a lot more than I thought where it was turned off, but they all say the same thing. Yeah, anyway, so there you go. So I didn't know why my queries were failing, and so I decided to use jq to answer the question: why is this data structure not the shape I think it is? Let me query the data structure to tell me about itself. So now we know. Okay, great. All of that was just so that we'd learn to put the question mark at the end, because that's the answer, to just make it not give an error, but anyway I thought it was worth explaining why sometimes that happens, and it's a good example of how to use length, which is also why I stuck it in the show notes. So when we run our naive query, we get output that tells us each laureate named Curie, so we see Marie Curie, we see Pierre Curie, and we see Marie Curie, and we can see the motivations: in recognition of her services to the advancement of chemistry; in recognition of the extraordinary services they have rendered by their joint researches. But they're just the laureates. Why
are they just the laureates? Well, if you look at what is the current thing being processed as we go through our chain, we start off and we explode the prizes, and at that point in time we have the piece of information we really wanted. But what did we do to it? We exploded just its laureates key, so the only thing that exists now going into that last select is the laureates. Now, we found the right laureates, but we've lost what came before, because we exploded it all away. Okay, right, right. So we have the right condition but we haven't retained the right piece of information. How do we square this circle? It's an extremely common problem: you have a piece of data that contains an array of lots of pieces of data, and you're interested in a condition inside the array. This is where any and all are your friends, because what you really want is a summing up of the values inside that array: I would like the prize where any laureate is a Curie. But still be the prize, not just the laureate, but be the prize. Exactly. So the any function is going to be the key here, because with the any function we can dive in; it will do its work on all of those sub-values and come out with one answer, for the prizes that also have this filter of Curie. Yeah. So in order to do that we need that third variant of any, and there is a matching one for all, because those two functions are symmetrical. Now, I told you that if you give it no arguments it just treats the inputs as booleans, and if you give it one argument it applies that argument to the inputs to make booleans and then it gives us our answer. The two-argument form adds one more layer of indirection: you tell it how to make the array. Wait, we're making an array? We're making a list of inputs; let me use the words list of inputs, because we're not strictly speaking making an array, we are making some inputs. We are specifying what we want to do to those inputs, and then any or all will get applied to all of those answers. So what we're going to do is explode the
laureates, say we want the surname Curie, and any of those is fine. That sounds exactly the same as what we did before, but look at the difference: there's only one pipe. So you haven't told people what it says yet, so let's start. So we've exploded the prizes, so the inputs to that select statement are the entire prize. That select statement happens once for every prize in the data set, but the entire prize is the current thing, and notice there is nothing after that select statement, which means that what's going to fall out the back of that select statement is the prize. Let me go ahead and read it for people. So it's jq, we start, we explode the prizes, and pipe to the select immediately, so we're not going to explode the laureates first. We're going to select, open roundy brackets, any, open roundy brackets again, now explode the laureates, with our question mark so we get rid of the null ones, and then dot surname equals equals Curie. So we've said select any laureates that have this surname Curie. So that's the two arguments, because it's separated by a semicolon. Exactly. And so what's going to squirt out the other side is still going to be prizes, not laureates. Still going to be prizes, that's what's going to squirt out the other side, so it actually answers our question. Exactly, exactly, exactly. So this is why any with the two-argument form is so powerful, because it lets us dive into that laureates array without having to explode it. We didn't explode it, we just went in, did our query, and it's all still intact, nothing was blown up, which for the Nobel prizes, given their dynamite relationship, is kind of funny. Well, we actually do explode it, but we explode it just within any. We explode it, exactly, in a contained filter, and then we take the real prizes. Exactly, exactly, because the full input to select is what select spits out, all or nothing. Very powerful. Yeah, it's a subtlety, but darn important, because this way what we learn is that there were two prizes with Curie winners.
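The difference between the naive query and the two-argument any can be sketched on a tiny stand-in for the data. This is only an illustration under stated assumptions: the sample JSON below is made up to mimic the shape described in the episode (a prizes array whose entries sometimes lack a laureates key), the field names firstname and overallMotivation are assumed, and the variable names sample, naive and prizes are hypothetical.

```shell
# A made-up three-prize sample shaped like the Nobel data set (NOT the real file).
sample='{"prizes":[
  {"year":"1911","category":"chemistry","laureates":[
    {"firstname":"Marie","surname":"Curie"}]},
  {"year":"1903","category":"physics","laureates":[
    {"firstname":"Henri","surname":"Becquerel"},
    {"firstname":"Pierre","surname":"Curie"},
    {"firstname":"Marie","surname":"Curie"}]},
  {"year":"1916","category":"chemistry","overallMotivation":"(no prize awarded)"}
]}'

# Naive attempt: the laureates are exploded before the select, so only
# laureates (not prizes) survive to the end of the chain.
naive=$(printf '%s' "$sample" |
  jq -c '.prizes[] | .laureates[]? | select(.surname == "Curie") | .firstname')

# Two-argument any: the arguments are separated by a semicolon. The laureates
# are only exploded inside any, so select passes the whole prize through.
prizes=$(printf '%s' "$sample" |
  jq -c '.prizes[] | select(any(.laureates[]?; .surname == "Curie")) | [.year, .category]')

echo "$naive"
echo "$prizes"
```

On this sample the first query should print three first names (one per Curie laureate), while the second prints just two year and category pairs, one per prize. The prize with no laureates key is skipped quietly in both cases because of the question mark.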
There were three laureates with the surname Curie, but two prizes. The first one in the list, because it's in reverse chronological order, is from 1911, where Marie Curie won by herself. Not only the first woman to win a Nobel prize, but she won the whole thing, she was the only winner, and it was a prize in chemistry. But back in time, in 1903, she was actually the first woman, in 1903, but she only got a quarter of the prize in 1903, because it was shared between herself and her husband and Henri Becquerel, and Becquerel got half and the Curies split the other half. And they got it in physics. So not only did she win two prizes, one of them entirely by herself, but she got one in chemistry and one in physics. You know, I know it's annoying that we know so little about so many female scientists, but there's a reason we know Madame Curie. There's definitely a reason why we know her. I mean, it is a running joke, name a female scientist and everyone on planet Earth says Marie Curie, but she didn't get there by default, she darn well earned that position. Physics and chemistry. And she only lived to how old? It was like in her 40s or something. Well, yeah, because she was working with radioactivity, right? Like, her tomb is still a darn dangerous place. Her notebook is behind God knows how much lead, and sometime in a few thousand years we might have a look at her notes, and that might become as valuable as Da Vinci's notebooks, but for now they're a health hazard, and you couldn't even photograph them because the radiation would ruin your exposure. That's kind of crazy. It is kind of crazy, right? I'm going to finish this out today with something we have not done in our entire jq series, because we haven't quite had enough meat, but now I can give you a challenge. We have a data set full of Nobel laureates. I have some questions, and I would like you to answer them with your jq skills, not by brute force, because you could just google this, but don't just google this, use your jq skills. So I have three
questions. What prize did friend of the show Dr. Andrea Ghez win? I would like to know the year, the category and the motivation. How many laureates were there for each prize? Now, I would like you to list them, all of them. Okay, not in the context of the first question? No, no, so for every Nobel prize ever. So year, category, how many; year, category, how many. 1901, chemistry, I don't know if it's two, let's pretend it's two; 1902, physics, three, whatever. I don't know the answer, that's why I'm asking the question. But year, category, how many people. And then, which prizes were won outright? Which prizes have exactly one winner? I would like the year, the category, first name, last name, and why, what were they given the prize for. Justification? You mean motivation? I do mean motivation, because I typed it wrong all the way through the show notes and corrected all of them except for one. Okay, I will fix that now. Yes, so there are three perfectly good English questions, and the great thing about jq is that we can answer those kinds of questions out of structured data. It is a query language. I love this part, I love this so much. Excellent. I can also give people a slight look under the sheet here. I have just planned, I have storyboarded, the rest of this series. So as much fun as this is, and as powerful, if we stopped now this would be very powerful for dealing with web service APIs, right? The amount of APIs that return JSON is already huge, so this lets us get JSON for the current weather and figure out information about it, get JSON for someone's IP address and get information about it. This is already spectacularly powerful, but we get to go further. So I have shown you about a twentieth of the functions that exist in jq. So we have seen that we can do things like double equals, great. What about more complex searches, like regular expressions? Oh yeah, jq can do those, so we need to go there next time. And there's lots of other very useful functions, so next time I'm going to ease off on
the new concepts, but I'm going to fill in lots and lots of those functions. So you now know what a function is and what they do, but there's loads of them built in that I'm going to tell you about. So most of next time is going to be learning about all the other functions, and they're cool, and there's many of them. And so that's going to be the next episode. And the episode after that, we're going to move on from pulling information out to transforming the information. You could argue getting the length is already a transformation, but you can do a lot more than just get the length, right? It's a full programming language, so we can do math, we can manipulate, Excel-like. So instead of taking one cell and making another cell, we can take one piece of JSON, do a whole bunch of things to it, and transform it into a different piece of JSON. Can we give me a Nobel Prize? Can we change the data that way? We could, absolutely, yes. That's quite an easy one: find all the Nobel Prizes, make the surname be Sheridan, make the first name be Allison, and spit them out the other side. That's a filter, a perfectly valid filter. We can do that. I want that to be one of the challenges at some point. You can have one of them. Oh, good. One random one. Not in, like, a real science, though. Oh, I want physics, um, in my background. So that's going to be the second future episode. And then our third and final future episode, I'm going to really pull back the covers, because not only can we manipulate data, jq is a full programming language: it has variables, conditionals, and loops. Really? I sort of feel like it's already iterating. I would have thought we didn't need loops. We don't need them for some things. We don't need them to iterate over the things we have been given, but maybe we want to loop over something that we didn't get from the data. Maybe we need to do a loop based on something else. Explode something by 10 times or something? Well, 10 copies of that array that we got from over here. I don't know why I'm
making this up as I go along here, but there are times you want to do more than just iterate over what you've been given. You may want to loop over something of your making. You may want variables; you want to capture things into variables. Calculating a variable based on the data and then spitting out some sort of fancy variable might allow you to do fun things like "give me all prizes with an above-average number of winners". That's actually quite complicated, because you've got to go through all the prizes to figure out the average, and then go back and find all of the ones where you're greater than the average. Right, right. Get the average, remember the average, and now go do it again. Anyway, there's lots of stuff to learn here. So that is going to run us up until, well, until the... Actually, I will have some time off coming up, so maybe, maybe we'll go more than once every two weeks. No? Yeah, we'll see. Sorry, I just discovered my coffee cup is squeaky. Apologies, folks. I'm also out of caffeine. This is bad. But anyway, I didn't mean to squeak at you. So anyway, there are three episodes left, and that is what we will be learning in those three episodes. And I am aware that I'm this organized, and the reason I'm this organized is because I'm this interested. I am having as much fun as Allison is in this series, which is great, because when I started it, I knew just enough jq to be really frustrated, and now I'm just in love. It's such a powerful language. You can tell Bart was excited about this, because I think you sent me the notes, like, on Tuesday or Wednesday, and I don't think that's ever happened. Tuesday. I think it's usually last key typed right before you take off on your bike, and then when you come back, we record. Yes, that is the usual thing on a Saturday: frantically type, hit commit, go cycle, go eat, record with Allison. No, I was done on Tuesday. Yeah, which means I've been working ahead. Anyway, I'm going to call it there, because my dessert is in the oven at this point in time, and we have had what I hope was
a fun and interesting show. So, I had a blast. Good. Well, until next time, happy computing. If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here; I'm just the stooge that listens to him and asks the dumb questions. If you go over to lets-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at podfeet, or check out all of the shows we do over at podfeet.com. Stay subscribed.