Well, it's that time of the week again. It's time for Chit Chat Across the Pond. This is episode number 782 for December 20th, 2023, and I'm your host, Allison Sheridan. This week our guest is Bart Busschots with Programming By Stealth installment 158, where we continue our journey with jq. And in an odd series of events, what you might hear in the background is rain at my house. It is dumping out there. Wow. Do you need some? Is it too dry? I think it rained in February. Okay, maybe March. Maybe the LA River will be a river. Yeah, yeah, it's kind of different for us. As you described it, you said, "We call that Wednesday." Yeah, pretty much. I got quite wet today. But anyway, that's neither here nor there. So all right, where are we going today, Bart? Today is kind of a pause-and-gather-our-thoughts sort of an episode. So I said to you in the big picture that jq has three big jobs in life. It has pretty-printing JSON, which was very easy, and we did that in the first installment; that didn't take us any time at all. And then I said we can use it to pull information out of JSON files, and so we started off by just pulling data by its address: you know, go into the array, find the first element, go in there, grab me the value that matches the key name "waffles" or whatever. So we were specifying a specific address within the data set to go and pull out something. And then last time we met a whole bunch of concepts, and we had to do a lot in one episode. I believe I told you that one of those concepts would be the most difficult thing in the entire series, and then it's all downhill from here, which I hope you find reassuring. But we had to learn about operators. Actually, we had to start by learning about how we express data as literals, which is the same rules we use for JSON.
So basically you have Booleans, you have numbers, you have strings, you have null as a special thing, and then you have arrays and dictionaries, as we call them. Then we went on to look at operators, which do something with, you know, something on the left and something on the right to produce something else. And then we looked at functions, and how if they don't take any arguments you just use the name of the function and you don't have to bother with any brackets; if it takes one argument, you just wrap that one argument in brackets; and then we learned that if you have multiple arguments, you have to separate them with a semicolon. You joked that you're going to get this wrong all the time, and I said, no, I will. I do. And that is true: while I was working on today's show notes, I ended up saying some very naughty words for a while and not understanding why my thing wasn't working, and then I realized I had a comma instead of a semicolon, and that makes a very big difference. [unclear] Oh, yeah, absolutely, because the comma of course is the "and also" operator, not the second-argument operator. They are very different things, second argument versus and also. And then we learned about the amazing select function for letting us filter our data, basically search for things based on straightforward conditions: is it equal to this, is it not equal to this, is it less than this, is it greater than this? And that was where we stopped. Which gets us a lot of comparisons, right?
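For anyone following along at a terminal, the recap above can be tried with a few one-liners. This is only a sketch: it assumes `jq` is installed, and it uses made-up inline numbers rather than the episode's data file.

```shell
# A function with no arguments is just its name: length counts the array.
count=$(jq -n -c '[1, 2, 3] | length')

# One argument goes in brackets: select keeps only values that pass the test.
kept=$(jq -n -c '[1, 2, 3] | .[] | select(. > 1)')

# Several arguments are separated by semicolons: any(what to explode; condition).
found=$(jq -n -c '[1, 2, 3] | any(.[]; . > 2)')

# The comma is the "and also" operator: one input, two outputs.
both=$(jq -n -c '[1, 2, 3] | .[0], .[2]')

printf '%s\n' "$count" "$kept" "$found" "$both"
```

Swapping the semicolon inside `any(...)` for a comma changes what the expression means, which is the mix-up described above.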
You get to find anything where something's equal to something else. But quite obviously, when I ask you to search for something in the general sense, you want to be able to do more than just equal to, not equal to, less than, or greater than. There's clearly a need for more searching ability, and that's the driving force behind today's installment, really: searching better. And that then gets us ready next time to transform data, which is the third big thing that jq can do. So we can print it, query it, transform it, and the transform is the final part of this journey. That's not for now; we're getting ourselves ready for that. Okay, what piece are we going to do today? So today we're finishing off the querying. To some extent I'm slightly trying to mirror last time's structure, so we're going to start by looking at some more things related to type, because there's a couple of functions in jq to do with types which are useful to know. Then we're going to move on to one new operator, which is going to save our bacon later in the show, and many, many times in reality, because the real world is full of very questionably formatted JSON. That is something that I have discovered. By pure accident, our data set of Nobel laureates has some quirks, and that is very in keeping with what I find in the real world with my work hat on; there's a lot of quirky JSON out there. And then we're going to finish up by looking at some more powerful comparisons. So instead of just less than, greater than, or equal to, we are going to look at some other criteria we can give, and we're going to re-meet our best friend in the world, the regular expression, which is of course an amazing way of searching for things: does it match this pattern? Yeah, it sort of smelled like we would have to end up there.
Yeah, of course we did. Of course we did, yes. But of course I started by setting a challenge last time. Because I knew last time was a heavy lift, I had basically intended to make sure that we had three questions which touch on each of the core concepts that I had hoped to get across, and I didn't necessarily expect these to be easy, because jq is, what was the phrase we like to use, dense. jq is as dense as the terminal was, and the terminal's denseness can be quite tricky. So I sent Bart a message before we started talking saying that I was really sad about the challenges, because I really like what I'm learning. I really like data. I really felt like I understood what he taught last week, or two weeks ago, and I spent a ton of time, I mean hours and hours, probably four hours, trying to do the first solution, and I did not succeed, and I did not get close to succeeding, because of some fundamental gaps in what I understood. And I was sad, because I really thought this was going to be fun, and I never even got to the second or third one, never even got to start them, because I couldn't get past the first one. So we've talked a little bit. We've talked me off the ledge, I guess, before we got started, but I was bummed, because I thought I was going to love this. And I did love trying, but towards the end I was just flailing semicolons and commas and square brackets, and I was asking ChatGPT for help, Bart. I tried to read the jq documentation, which is terrible, by the way. It was horrible, not even close to helping. I had to learn a lot of jq before I could use the jq documentation. The jq documentation is not a method of learning; it is a method of looking up what you already know, at best.
Yeah, I can see that. One of the things I've said on the NosillaCast is what I like about ChatGPT is I can describe in human form what I'm trying to do, and then it'll give me the right wording of what I'm supposed to do. But what it answered with was nothing close to what you did. So we're going to pretend I didn't even listen to the entire episode last week when you try to tell me how to solve this first one. Well, I'm going to start off with some very general tips, actually. So the first thing is, when you're building this up, don't try to build everything at once. I didn't. Okay, good. I did itty-bitty little bites. Okay, good, because I sort of look at it as: you start off with too much information, and you're trying to filter it through multiple funnels to get less information at the other side, and you're done when the less contains only what you want and nothing more. So you start off with everything and you end up with only what you want, and in between you may need to do some work. And so you write the jq by, you know, you do one piece and you run it and you see the output, and what you should see is too much. Then you put a pipe, and then you try to solve the next piece of the problem, and build it up that way. We didn't actually talk about how to solve this, but I'm going to start right in. The first challenge Bart gave us was: what prize did friend of the NosillaCast podcast Dr. Andrea Ghez win: the year, category, and motivation. I can do pieces of that, but I can't do the year, category, and motivation all in one fell swoop. And the part I was stuck on was, when we pipe one thing into the next thing into the next thing, we're kind of drilling down into the array laureates. Once laureates gets exploded and I'm down in there digging out the information about, for example, the motivation, I'm good.
I can get that, but I can't get back up out of the array in order to pull out the year and category. I don't know how to do that; that's where I got stuck. And the key is not backing out, because you can't. The key is not going in, so you don't have to. Okay, but don't I gotta dip in? I gotta go in to get the motivation. You've got to look in, but don't go in. So imagine you open the door and you've had a look and you've decided, is this for me? No, it is not. Throw that one away, but don't step in, because once you step in... But I do need something in there. I gotta go in and get the motivation later. The motivation's down in the array. But you want that later, after you've thrown away all the Nobel Prizes that are not Andrea's. Okay, so you find it. Okay, yeah, find the prize, and then when you have the prize, then pull out the three pieces of information I asked for. But get the prize. How do I find the prize without looking in the door? Okay, well, let's do that. Well, no, you look in, don't step in. Look in, don't step in; that's my mental way of thinking about it. So let's start at the start. We have a JSON data file that contains one dictionary with one top-level key-value pair named prizes, and the value of prizes is an array. So that is one unwieldy thing. So the first thing we actually want to do is explode that. So we say dot prizes, open square bracket, close square bracket. I'm with you so far, Bart. Perfect. This is important, because on the other side of the pipe, what do we have? The other side of the pipe is now going to happen many times, once for each element of the prizes array. So each time the next filter gets called, it is working on an entire dictionary that represents a full prize. So prizes zero, prizes one, prizes two, prizes three. Right, it's going to do just one.
Yes, but each time through it's seeing exactly one of those. So each time through, the current value being processed is a full dictionary that contains a year, a category, and a key-value pair called laureates that is itself an array of more dictionaries. And it might contain something else, but I don't remember off the top of my head. Those are objects, not dictionaries? We call them dictionaries, because we've been doing that since our JavaScript objects. Oh, did you say dictionaries? I'm sorry. Okay, I'm with you. Sorry. Objects, dictionaries, same thing, got it. Yes, you know, I've been very careful not to use the word object, because Jill will kill me. Very consistent about that the whole way through. Okay, so we're going through a year, category, and laureates over and over and over again. Exactly, so that is what we now have. After we pass that first pipe, we're now doing the second pipe in the chain once for every dictionary that represents a full prize. So we are now looking at the door of the full prize. It contains our year, our category, and our array of laureates, and we want to throw away every prize that doesn't have at least one laureate with the surname Ghez. Now, our first attempt to do this would be to put another pipe and to say dot laureates, open square bracket, close square bracket, and then we have stepped into that array, and now the current thing being processed is a dictionary with a surname, a first name, a motivation, and an ID, but let's ignore the ID. And we can very easily find the current thing with the surname of Ghez. But what falls out of that pipe is now a laureate, not a Nobel Prize, because we exploded the Nobel Prize. We took the prize and we blew it up. Right, that's what we gotta not do. But how do we not do that? Did you tell us?
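The "stepping through the door" problem described here can be seen on a tiny example. This is a sketch under assumptions: `sample` is a hand-typed, cut-down stand-in for the episode's data file, and the key names (`firstname`, `surname`) are guesses at the real file's spelling.

```shell
# Cut-down, hand-typed sample in the shape described in the episode.
# Key names such as "firstname" are assumptions, not copied from the real file.
sample='{"prizes":[
  {"year":"2020","category":"physics","laureates":[
    {"firstname":"Roger","surname":"Penrose"},
    {"firstname":"Reinhard","surname":"Genzel"},
    {"firstname":"Andrea","surname":"Ghez"}]},
  {"year":"1921","category":"physics","laureates":[
    {"firstname":"Albert","surname":"Einstein"}]}
]}'

# Exploding .prizes: the filter after the pipe runs once per prize.
years=$(printf '%s' "$sample" | jq -r '.prizes[] | .year')

# Stepping through the door: after .laureates[] the current value is a
# laureate, so what comes out no longer carries a year or a category.
stepped_in=$(printf '%s' "$sample" |
  jq -c '.prizes[] | .laureates[] | select(.surname == "Ghez")')

printf '%s\n' "$years" "$stepped_in"
```

The second command finds the right person, but the prize she belonged to has already been blown apart, which is exactly the dead end being discussed.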
I did, because the entire reason that the two-argument any function exists is to avoid having to step through the door. It lets us look through the door, but not step through the door. So inside our second pipe... sorry, inside our second filter, so after the first pipe, between the first pipe and the second pipe, is our second filter. It's a select, open bracket, a whole bunch of stuff, close bracket. Okay, is select a function, by any chance? Yes, it is. Okay, did you tell us select was a function? Yes, I did. You went through the functions, but you only talked about not, and you did the Booleans and stuff. And then I said that select was a function that processes the input, and it takes one argument that is a filter. If that filter returns true, then select passes the value to the next step in the pipe, and if that filter returns false, select silently swallows the current value. So that is how select reduces the number of things we're processing. We start off with once for every single prize, we run it through select, and we say: if this criterion matches, you go on to the next thing in the chain; if this condition fails, you evaporate into the ether, you are destroyed. So obviously at the end of that middle pipe there are far fewer Nobel Prizes left, because every one that doesn't meet the criterion is evaporated into nothingness. Okay, so that is the point of select. So the question is, how do we decide which Nobel Prizes stay and which Nobel Prizes do not stay?
Okay, so we need to write a condition that will be true when we have what we want. Now, we need to check our surname not against one thing but against many things, and that's why we use the any function, because the any function checks one condition against many inputs, and if any of those are true, any returns true. Okay, so if she had won twice, for example, that's why you want any? No. The thing we are searching is a Nobel Prize. It contains one year, one category, many laureates. No, it's got lots of categories. No, a specific Nobel Prize. You have taken the array of Nobel Prizes and broken them into multiple dictionaries, and every time we loop through we are processing a single dictionary, which contains a year, a category, and an array of laureates. Okay, I think I see what you're saying. Right, so we are processing a Nobel Prize. Okay, so there is a Nobel Prize for 2023 in chemistry, there is a Nobel Prize for 2023 in physics. Yes, because when you look at that array in the raw JSON file, you will see it's open curly bracket, some stuff, close curly bracket, comma, open curly bracket, some stuff. So each of those is a dictionary. Right, right. Okay, so the select is happening once for each dictionary, and the question being asked is: does this one dictionary I'm looking at meet a criterion? If the answer is yes, you continue; if the answer is no, you evaporate into the ether. All right. And what you used was any at this point, and why wouldn't you want all? Like, what if she'd won twice? Okay, but any will catch twice. If I give you five playing cards and I say, are any of these black, if two of them are black, the answer is still yes. That's what any means. Right, but I would answer yes. Which is true. Right, any answers yes or no, true or false. So the true or false is what tells the select to send the answer on through the next pipe. Exactly. Okay, right?
So we are interested in whether any of the laureates in the one prize we are looking at have the surname Ghez. Okay. So the any function in its two-argument form: the first thing you tell it is, what do you want me to explode so that I can look at many things? And then, after a semicolon, what condition do I apply to each of the things? So we are saying, take the laureates array and for each one look at the surname: is it equal to Ghez? If any of the surnames are Ghez, the any function returns true. Okay, which means the entire Nobel Prize has gotten a true on the select, and the select then passes the prize, not true. It doesn't pass true, it passes... Exactly, that is the function: select either passes the original thing or nothing. So if the condition works out to true, the entire prize gets to pass. Okay, and you said select has one argument, and the one argument in this case is the any function. Yes. And the any function has two arguments: one is, where do you want me to look? Yeah. And then, what do you want me to look for? Yep. Okay. So in English, if any of the laureates have the surname Ghez, the entire prize gets to continue to the next stage. So even though we did explode laureates... But we exploded it inside the any. Right, so at the very top level... Right, at the top level.
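Here is the "look in, don't step in" version on a small example, as a sketch only. `sample` is a hand-typed stand-in for the episode's data file, and its key names are assumptions.

```shell
# Hand-typed stand-in for the data file; key names are assumptions.
sample='{"prizes":[
  {"year":"2020","category":"physics","laureates":[
    {"firstname":"Roger","surname":"Penrose"},
    {"firstname":"Reinhard","surname":"Genzel"},
    {"firstname":"Andrea","surname":"Ghez"}]},
  {"year":"1921","category":"physics","laureates":[
    {"firstname":"Albert","surname":"Einstein"}]}
]}'

# any(what to explode; condition) looks inside .laureates without changing
# the current value, so select passes the whole prize through or nothing.
kept=$(printf '%s' "$sample" |
  jq -c '.prizes[] | select(any(.laureates[]; .surname == "Ghez"))')

printf '%s\n' "$kept"
```

What survives is the entire 2020 prize dictionary, year and category included, with all three laureates still inside it.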
We are still in... the select function is working on the whole prize. Okay, you did not do an example that covered this, by the way. It's the last thing you say, but you don't have an example of that. I do, actually, because I show it not working, because we can't see what Marie Curie got her prize for, and then we show, oh yeah, and this is actually what Marie Curie got her prize for. We explicitly do it, because otherwise we can't tell why she got her prize. The first time we do it, we just get told Marie Curie, Marie Curie, Pierre Curie, and we don't know why they got their prizes, and the second time we do it we see the two prizes come out. You know, I looked at that example. I've read your show notes at least six times and I still didn't get that from it. This to me looked like we exploded it and drove down inside, because what we returned was stuff that was inside laureates; it wasn't up above laureates. I thought once we exploded it, we were inside it. I'm not sure how I know when I'm not inside it. Right. So at the top level, what is the select working on? What is to the left? The select is the one thing in that second filter in the chain, so it is receiving whatever the filter before it produced. So prizes, open square bracket, close square bracket, is sending a full prize dictionary, so select is working on the prize. Whatever happens inside select is inside select. Whatever is going to fall out of the other end here is based on what select received, so select will either pass exactly what it got in or nothing. What it got in is the full prize, so either the full prize goes through or nothing goes through. Okay, yeah, all the words that you're saying, I'm remembering pieces of that. I also said it was very difficult to keep your head around it, because I'm asking you to imagine what the current value is, and that takes some getting used to. Yeah, okay. So at this point we have successfully exploded prizes.
We're using the select, we've gone through every prize and looked for any laureates with the surname Ghez. But now you're going to pipe it through and you're going to do some more stuff, right? I think it gets easier after this. It does get easier. Before we go further, can you run that without the remaining pipes? Just run dot prizes, pipe, select, close bracket, close bracket; just run that bit of it and then see what we have. I can. Come on, can I select the right window? I believe you can; whether you shall, I don't know. No, not the way it is; I've got to get some quotes and stuff. There's one single quote after the second bracket, and the name of the file. So yes, I have run that, and it came out with the prize, the 2020 physics prize, but it's got Penrose and Genzel in there along with Andrea. Right, but that is the entire prize dictionary, and so now we're left with a much simpler problem. What did I want to know? I wanted to know the year, the category, and the motivation, which was the little curveball I threw in. No, the motivation was the easy part, Bart, because I kept being inside there. I knew what the motivation was. It was the category and the year that was hard, because that was up above. Now that we're not inside there yet... Right, we're not inside there yet, so now the motivation is actually a curveball. If we want to get the year, that's easy: we pipe to dot year, and that will just tell us the year. Then "and also" with a comma. Bing, bing, bing. And also the category. Great, that gives us two out of three. And we know we have an "and also," because I've told you I want the third thing, so we know what the next thing is going to be: comma. But then the question becomes, what comes after the comma? Well, what we need to do now is actually go in, not look in; we need to go in to the laureates, get just the one for Andrea, and then get just the motivation. So we need to use brackets to say everything
I'm about to do here is one little piece; that third thing is that third thing, right? So we wrap it all in brackets, and then we start and we say dot laureates, open square bracket, close square bracket. So we are now actually exploding the laureates from the one prize that is Andrea's prize, because that's all that's left now. Okay, so what's in here now is Penrose, Genzel, and Ghez. Exactly. And then we pipe that to select, and select is now being handed a laureate. It's no longer being handed a prize; it's now being handed a laureate, because we've exploded the laureates. And then it's looking in with the simple one-argument version of select, which means it's going to go through its input and just look for a simple property: dot surname, double equals, Ghez. Right, because we don't have to dig deep into this laureates anymore. We're in the laureates; they only have names and motivations, so there's nothing deep here to go into. So we just say, give me the surname equals Ghez. Ah, but what you've really done is you've selected. So there were three laureates, and only one of those three is going to make it through the gauntlet. So who's first in the array? It's Penrose, Genzel, and then Ghez. Okay, so the first time that select happens, it looks and it goes dot surname, double equals, Ghez: Penrose, false. He evaporates; Penrose is gone. Second time through the array, the second name evaporates, poof. Third time through, Ghez, yes. So at that point, what is the input to the next pipe? Is it that third object
inside the The prize that was physics for 2020 Yes Because dot laureates has exploded into its separate dictionaries for each winner And so we've thrown two of those dictionaries away and so we are now left with the dictionary with first name andrea surname gaz And the motivation is what we really care about So when you say select dot surname double equals gaz With this one prize the 2020 physics you got one prize. It's one element in this giant array of prizes When you say select dot surname gaz, how does it know to take that whole object? I thought it would have just surname gaz at that point. Okay, no So that is the condition not what that is how you want to look not what you want to get back So what you get back is whatever was to the left of that pipe, right? So you're piping something into select. So what is to the left of that pipe? It is you haven't said select any You've said select surname Correct. I want surname gaz. No, all you should have is surname gaz. No That's not saying what you want. That's saying what the condition is Is it so it must be saying any Object or dictionary sorry any dictionary within laureates within this little piece of laureates That has gaz in it returned the entire dictionary Yes, what a what a It is weird, but that is that is the fundamental leap select returns the exact thing it was given all of it or nothing So we are exploding the laureates So the thing select is given is a dictionary with the keys surname first name motivation and id So the only thing that can come out of that select is that dictionary or nothing Whether or not it gets to come out Is what's determined by the condition inside the parentheses. It's only saying whether or not it passes It's not saying what it's not filtering. It's not extracting information. It's gating the information either you all get through Or nothing gets through Hmm All right, it's a condition not a select not a Not a cherry picking. 
It's the condition for letting it all through. Yeah, that's odd. It is odd, and I think that's what's caught you up. I think that we have now hit the nugget of what has tripped you up the entire way. No, I was stuck long before this part. There's a whole other penny still floating. I wasn't even this far in, remember; I couldn't get to this point where I would be confused. I was not yet confused on this. It is the same penny. It may be a penny with multiple sides, so maybe it's not a penny; maybe it's some sort of a die that's got six sides, but they are very, very connected. I guess it's sort of similar, yeah. Okay, so all right, we're done with question number one in 28 minutes, 29 minutes. Bart asked me up front whether maybe this would be a two-parter. I think it will be. Right, but it's important, right? The reason we do the challenge is so that we can have these deep conversations, because it was easy to write that, but actually the fundamental questions are deep. There is a lot going on here, so I don't think it's time misspent. I think it's time very well spent. No, no, no, I mean, whether it was or not, I needed to have this conversation. I hope the next one's easier. I believe it is, actually. So the next question is: how many laureates were there for each prize? And I want the year, the category, and the number of winners. So again, we need to get each prize one by one, so we're going to start our train of work by exploding out the dot prizes dictionary into its little pieces. Sorry, the dot prizes array into its pieces. So, arriving at the next step, well, we want the year, and every prize has a year, so we can just take it: dot year, comma, and also we want the category. Great.
That's easy: dot category, comma, and also now we want how many winners there were. So we now need to actually count the length of the laureates array. The answer we now want as our third "and also" is a length, and I've told you that the length function expects to be passed an array. So we say dot laureates, pipe, length, and we just have to wrap it in brackets so that whole thing becomes our third "and also." Roundy brackets, so we're grouping those together. So I have a couple of problems here. Why is laureates not exploded? We can do dot laureates, pipe, length? Okay, so the length function by definition needs to receive as its input an array. Not a piece of an array; the whole array. So if you explode the array, you do not have an array, you have... Ah, maybe you need to define explode, Bart. Okay, so I thought explode meant open it up and let me look at it. No, open it up means... Okay, so remember what I said was that each filter has one or more inputs and produces one or more outputs, and the number of outputs can be different to the number of inputs. So when you have a filter that explodes an array, the input is one array, and the explode, square bracket, square bracket, means I am giving you many outputs. So the next filter in the chain happens in parallel, not in series, once for everything in the array. But it's not an array anymore, you're telling me, because I can't ask the length of it once it's exploded. So it can't be an array anymore. Which is why I'm saying don't explode it. If you want the length of an array, you do not want pieces of an array; you want an array. Okay. So saying this more quickly in one piece: dot prizes, open and close square bracket, pipe it to dot year, and also dot category, comma, and then open roundy brackets, dot laureates, pipe, length, close roundy brackets. What do you expect dot laureates pipe length to return?
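The difference between handing `length` a whole array and handing it the exploded pieces shows up even on a three-element array of names. A small sketch, assuming `jq` is installed; the inline array is made up for illustration.

```shell
# Whole array in, one number out: the count of elements.
whole=$(jq -n -c '["Penrose", "Genzel", "Ghez"] | length')

# Exploded array in: length now runs once per piece, and for a string it
# reports the number of characters instead.
pieces=$(jq -n -c '["Penrose", "Genzel", "Ghez"] | .[] | length')

printf 'whole array: %s\n' "$whole"
printf 'exploded pieces:\n%s\n' "$pieces"
```

One input produced one answer in the first command and three answers in the second, because the explode turns one array into many separate outputs.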
The number of laureates, all of... The number of laureates in what? Okay, so remember... Year by year, is that what you mean? No, no, go back. What is to the left of the pipe? There's two pipes. Okay, the main top-level pipe: we have dot prizes, explode, pipe. So everything from that pipe until the end of that string is happening once for every element inside the prizes array. There is a dictionary with a year, a category, and laureates at the first element in the array, and there's one in the second element of the array, and there's one in the third element. So everything from dot year to the end happens once for every single prize. The first time through the loop, the year is going to be 2023, the category is whatever's first in the list, and then the array of laureates is... Okay, so how many laureates are there in the first prize? Three. Okay, then laureates would be three. Yeah, it's that array of laureates that's being checked. And then it does it again for the next prize, and again for the next prize, and again for the next prize. Okay, so I'm looking at the bottom of the answer, and it says 1901, literature, one; 1901, peace, two; 1901, physics, one; 1901, medicine, one. So that's one person won the medicine prize in 1901. Yeah, okay. Right, because it's happening once for every prize. So this explanation we're having here is not in the show notes, so I'm wondering what we're going to... This is a rephrase of last time's show notes. The text of this is in last time's show notes; we're saying it in a different way. So maybe I'm misunderstanding you. Well, I'm just saying, when I look at these answers, I'm not going to know why when I go back to these show notes. You say it's in the other show notes, but I'm telling you, Bart,
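The second challenge, as spoken above, can be sketched like this. `sample` is a hand-typed two-prize stand-in for the data file, with assumed key names.

```shell
# Hand-typed stand-in for the data file; key names are assumptions.
sample='{"prizes":[
  {"year":"2020","category":"physics","laureates":[
    {"firstname":"Roger","surname":"Penrose"},
    {"firstname":"Reinhard","surname":"Genzel"},
    {"firstname":"Andrea","surname":"Ghez"}]},
  {"year":"1921","category":"physics","laureates":[
    {"firstname":"Albert","surname":"Einstein"}]}
]}'

# Everything after the first pipe runs once per prize. The brackets make
# ".laureates | length" a single third "and also" item, and .laureates is
# deliberately NOT exploded so that length receives the whole array.
counts=$(printf '%s' "$sample" |
  jq -r '.prizes[] | .year, .category, (.laureates | length)')

printf '%s\n' "$counts"
```

Each prize contributes three lines of output: its year, its category, and its number of winners.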
I read it and read it. Right, I believe you. But I didn't get it at all. So I understand it right at this moment, if I know to look at this moment in time in the audio when I need to understand it again. I don't know how to say it differently to how I said it last time, is the problem I'm having. I wonder if I could put notes to sections or something in there to say, look back at this piece, but maybe that's my problem. All right, what's question number three? Okay, so the last question then is, well, we know, because we've just done it, that sometimes there's three winners and sometimes there's two winners and sometimes there's one winner. So I wonder which prizes were won by one person who was good enough to win it all by themselves. So which prizes were won by just one person, you know, which prizes were given outright? And what we would like to know is the year, the category, the first name, the last name, and the motivation. So you won it outright. Okay, so who were you? When did you get it? What did you get it for? As always, we start by exploding out our prizes, so the next piece of the chain is going to happen once for every single prize. And what we're interested in, to decide whether or not the prize continues to the next step in the process, is: do you have exactly one winner? That is our criterion for continuing. We start off with all the prizes, and we only want the ones that have exactly one winner. So we have a select. What is the condition we want?
Well, we want the laureates piped to length to be exactly one. But you put a question mark on laureates, suggesting don't look at the null ones. But the null ones would pass as not having a length of one. You have to be specific, because otherwise you would get an error, because the length function will receive an input that it doesn't like. It will receive null, and it doesn't like calculating the length of null. Well, but null doesn't meet the criterion of length double equals one, so it fails and it would just skip right over it. No, based on the documentation, the length function insists you either give it genuinely nothing or you give it an array. If you give it null, null is not an array; null is a value. The length function will throw an error if you give it null. So why not... Oh, was it only laureates that are missing? Aren't there years where there's no prize, or did the prize go somewhere, just not to any laureates? The prize exists as a dictionary, but it contains no key-value pair with the key laureates. Okay, so putting the question mark on the prizes wouldn't do you any good. It would not, because the prize is a dictionary, and it contains a year and a category, and then it doesn't contain laureates. Okay, got you. So that's why you have to put the question mark after laureates. I do. I'm going to remember a question I want to ask, but I don't want to interrupt this any more than I already have. So we've got dot prizes, piped; we want to select dot laureates with a question mark, pipe to length, double equals one. So at this point, laureates pipe length... The length function works on the current input, so how do we give it current input?
Okay, we pipe something to it. Okay. So after that select function, what has happened is we started off with all the Nobel Prizes, and now all that is left are the prizes with a laureates array of length one. So we now have the full dictionaries for each of our answers. But I wanted you to get a bit more specific and hand me the year, the category, the surname, sorry, the first name, the surname, and the motivation. So we know we want the year, so dot year, comma, and also, dot category, comma, and also... Now I want the first name. But laureates is an array. But I now know for a fact, because I've proven it, that it is an array of length one. So how do I get the first name from the first laureate? I say laureates, open square bracket, zero, close square bracket, dot firstname. Oh, because the next time through it's a different laureates. It's a different laureate, and it will also be in position zero. Yeah, exactly. What would happen if you just said laureates dot firstname instead of laureates, square bracket, zero, close square bracket, dot firstname? No, because laureates is an array, not a dictionary. So otherwise you have to loop through it or something again, right? But we're looping through a thing. No need to, because we know it has one. Okay. Yeah. We just want the first name from the first element in the array. Okay. And we want the surname from the first element in the array, and also we want the motivation from the first element in the array. That is our answer. That one actually makes more sense than the other two. One of the things that bothers me about all of this is that it appears that you have to understand everything about your data set before you start doing any of this. So when I started this I said I don't know what the data set looks like.
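The outright-winner query described above (explode the prizes, keep only those whose laureates array has length one, then pull the year, category, first name, surname and motivation from element zero) can be modelled in a few lines of Python. This is only a sketch of the logic, not the jq from the show notes: the miniature data set is invented, and the key spellings "firstname", "surname", "motivation" and "overallMotivation" are assumptions about the real file.

```python
# Hypothetical miniature of the Nobel Prizes data; every value is invented
# and the key spellings ("firstname", "surname", ...) are assumptions.
data = {
    "prizes": [
        {"year": "2001", "category": "example-a",
         "laureates": [
             {"firstname": "Ada", "surname": "Solo", "motivation": "for a solo result"},
         ]},
        {"year": "2002", "category": "example-b",
         "laureates": [
             {"firstname": "Bo", "surname": "First", "motivation": "for shared work"},
             {"firstname": "Cy", "surname": "Second", "motivation": "for shared work"},
         ]},
        # A year with no award: the "laureates" key is missing entirely.
        {"year": "1940", "category": "example-c",
         "overallMotivation": "No prize was awarded."},
    ]
}


def outright_winners(dataset):
    """Return one tuple per prize that has exactly one laureate."""
    rows = []
    for prize in dataset["prizes"]:           # explode the prizes
        laureates = prize.get("laureates")    # None when the key is absent
        if laureates is None or len(laureates) != 1:
            continue                          # the select drops this prize
        winner = laureates[0]                 # the only element sits at position zero
        rows.append((prize["year"], prize["category"],
                     winner["firstname"], winner["surname"],
                     winner["motivation"]))
    return rows


rows = outright_winners(data)
for row in rows:
    print(row)
```

One detail worth checking against the jq manual: it documents the length of null as zero, so a prize with no laureates key may simply fail the length test in jq rather than raise the error described above. The Python sketch sidesteps the question by testing for None explicitly.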
I don't know where... What I know, it's prizes, because Bart's been saying it over and over again in his example. But I don't know what the structure is below that, and I knew I could open up the file and maybe pretty print it or something, but I wanted to know: how do I find out what is the structure of the data I have? And I had trouble finding a way to do that. I invented one. I just said jq, dot prizes zero, Nobel Prizes dot json, to show me what the first one is. Yeah. How do you know, like, that way, how do you know that you have to put a question mark in? You only know because it's screwed up, because you got a null. Correct. Yeah, and that is the only way you know, is when your data misbehaves. Either you scroll through all the data, yeah, probably pretty printing it, or you deal with problems when they arise. And usually, so, we have a few times in the show notes, like in the last episode we had an interesting example where we first discovered, and I say we, I, while writing the show notes, first discovered there are not always laureates, because I got an error. So I then wrote a jq query to show me all the prizes where the laureates is null. And that listed out all of those prizes, and that query was in the show notes as an example of how we find out which prizes don't have laureates, and that then showed us the structure, pretty printed. And I was like, okay, so that's what my data looks like, and then I adjusted. I feel like you go in blind to start with, though. I'd still be around in the dark. Yeah, and then pretty print. But I mean, it seems like there ought to be a way to tell me what does it look like. But I didn't want to look at this thousands of lines long JSON file. The whole point of this is, I mean, if it's short enough that I can scroll through it, then I don't need to do any of this nonsense. I can go.
Oh, there's Andrea. Right, picking out element zero is your answer. Okay, that is your answer. And the other thing is, at every point in the pipe you can just stop, let it print out what it currently has, and then go deeper. Right, every time that you're meeting a pipe, just stop there, run it, and see what's coming out at that point in the chain. And then the next time you put a pipe, you're going to filter it down further and filter it down further. So print as you go. Right. Again, back to the problem is I couldn't get into the first part of it. I was stuck at the beginning. Okay. All right. We're ready to start new stuff, Bart. We're definitely going to have to split this; that's a long scroll bar coming up. Okay, um, where are we now? We're 42 minutes in. So 42 minutes in, let's go like 15 minutes and see where we are. Okay, so we spent a lot of time last time talking about data types. Right, so jq is a language for querying JSON. So we discussed the fact that it has inherited, very cleverly, JSON's approach to types. So there are strings, numbers, booleans, arrays, what it calls objects, what we call dictionaries, and the special value null. And one of the things you can do to try to explore these things is there is a function named type, which will tell you the type as a string. So if you pipe something to type, it will give you back one of the following six strings: null, boolean, number, string, array, or object. And so you can use that to probe what something is. Or you could use it inside your queries, say select, whatever, and the condition could be type double equals number, to say basically, unless this thing is a number, don't pass it on to the next element in my chain. So select can be thought of as an if? Yes. If you find this, then pass it along, if that works for you. Yes. Okay, okay. The other way to think of it is as a filter. So it either goes through or it doesn't go through, but either way, yeah, that's perfectly fine. We call it an if, if you like.
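To make the six type names concrete, here is a small Python sketch that mimics what jq's type function is described as returning, plus the "select where type equals number" idea. It is a model of the behaviour, not jq itself; the helper name jq_type and the sample list are made up for illustration.

```python
def jq_type(value):
    """Return the jq-style type name for a value parsed from JSON."""
    if value is None:
        return "null"
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


# Hypothetical sample input, one value of each type.
items = [None, True, 42, "42", [1], {"a": 1}]

type_names = [jq_type(item) for item in items]
print(type_names)

# The "select" idea: pass a value along only when the condition holds.
only_numbers = [item for item in items if jq_type(item) == "number"]
print(only_numbers)
```

Note that the string "42" reports as a string, so it does not survive the number filter.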
Yeah, a conditional pass. Yeah, whatever works for your brain. So that is how do I tell what it is: the type function will tell you what its type is. Great, that's useful. But we also know that all of our equality and stuff is done through strict typing, so the string "42" and the number 42 are not the same, because one of them is a string and one of them is a number. So the double equals operator will say no, you are not the same. So it would actually be darn useful to be able to convert from one to the other. And that is also true when we have operators like less than and greater than. So if you give less than a pair of numbers, it will do a numeric check. But if you give less than a pair of strings, it will do an alphabetic check, and in alphabet land 23 is greater than 2001, because alphabetically it comes after 2001, because 23 is greater than 20. So having numbers as strings and then doing a comparison will cause logic to cease to exist. And we have this in our data set, because our years are all strings. Our entire JSON file of Nobel laureates is full of string numbers for the years. Now, if you're always dealing in four digit numbers, the fact that it's alphabetic doesn't make any difference. But if you try to say what are all the Nobel Prizes after the year four, well, that's going to do very odd things, because it actually thinks none of them are after the year four, even though they're all really a long way after the year four. So we need to be able to convert these things, and thankfully jq provides us two very sensibly named functions, tonumber and tostring. And they will take the input and they will convert it as they describe, or they will throw an error if you give them something they don't like. So if we say 2001 pipe tostring double equals the string "2001", we will get true, because the number has now been converted to a string, and so now that passes the strict type check. So let me remind myself. What you've written says jq minus n and then you've
done this tostring thing. The minus n means don't expect an input. There's nothing coming in from the left-hand side. Yes, there is no input here. For some reason, I think it's short for minus minus null input. Okay, but minus n is easier. Yeah. Okay, so jq minus n, and then you've got inside single quotes, because we single quote all this stuff, uh, open parentheses, 2001 pipe tostring, close parentheses. So that's become one thing now. That's the input. Double equals quote 2001: that would be true, because we turned 2001 into double quote 2001, which is a string. So it's true. Okay. Yeah, so we can say we've turned 2001 into the string "2001", and we've compared it to the string "2001", and they are indeed the same. If we take the number 2001 and we double equals it to, open a bracket, the string "2001" pipe tonumber, and then close that off, we also get true, because now we're comparing a number to a number, and yeah, those are indeed the same. Uh, if we take the string "2001" and we pipe it to tonumber, and we say are you greater than the string "23" piped to tonumber, we now get the much more sane answer that 2001 is indeed greater than 23, because we've converted them both to numbers before we did the comparison, and now we get the expected true. Okay. And if you try to convert something that is not a numeric string with tonumber, jq will get very cranky with you. So if you take the string waffles and you pipe it to tonumber, you get an invalid numeric literal error, because waffles is not a numeric literal. Yeah, it does make sense. The other thing we get out of the box is a bunch of, I call them select-like functions. They will take the input and they will just check its type. They won't check to see whether it's equal to something else.
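The conversions walked through above (2001 to a string, "2001" to a number, the "23" versus "2001" comparison, and the failure on "waffles") can be reproduced with a small Python model. The functions tostring and tonumber below are stand-ins written for this sketch, not jq's implementation, and the exception type and message are choices made here.

```python
import json


def tostring(value):
    """Model of tostring: strings pass through, anything else becomes JSON text."""
    return value if isinstance(value, str) else json.dumps(value)


def tonumber(value):
    """Model of tonumber: numbers pass through, numeric strings are parsed."""
    if isinstance(value, bool):
        raise ValueError(f"cannot convert {value!r} to a number")
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        parsed = json.loads(value)  # raises ValueError for text such as "waffles"
        if isinstance(parsed, (int, float)) and not isinstance(parsed, bool):
            return parsed
    raise ValueError(f"cannot convert {value!r} to a number")


same_as_string = tostring(2001) == "2001"              # string against string
same_as_number = 2001 == tonumber("2001")              # number against number
text_compare = "23" > "2001"                           # character by character
number_compare = tonumber("2001") > tonumber("23")     # numeric comparison

try:
    tonumber("waffles")
    waffles_failed = False
except ValueError:
    waffles_failed = True

print(same_as_string, same_as_number, text_compare, number_compare, waffles_failed)
```

The text comparison says "23" is greater than "2001" because the second characters, "3" and "0", decide it; converting both sides first gives the answer a person would expect.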
They'll just check its type, and if the type is what it's supposed to be, it gets the pass through; otherwise it vanishes into the ether. And these are does-exactly-what-it-says-on-the-tin functions, so they're named for what they filter. So to test these we're going to use a new data file called sample data dot json that contains a massive array of lots of random things. It is an array that contains null, true, minus 1, 0, 11, 3.1415, the string "42", the string waffles, an empty array, the array dogs comma cats, an empty dictionary, and the dictionary apples colon 12, pears colon 3. So that is a nice sampling of all of our different data types in JSON. Just swept some data up off the floor, shook out the vacuum cleaner of data, got it. Yeah. So the question becomes: how do we get only the things that match a certain type easily? Right, we could put it through select, so we could say select, open bracket, dot pipe type double equals string. That would be a big select, and we could do it, right, because we know that the type function will convert our type to a string and then we can do a double equals. But we don't have to; we just get these free functions that do that job as a shortcut. And the first of them is nulls. Whatever you pipe to nulls, if it is null it goes through; if it is not null it vanishes into the ether. So if we say jq, explode our input, so dot, open square bracket, close square bracket, pipe nulls, and we tell it sample data dot json, it will take that array, and once for everything in the array it will run it through the nulls function, and all that will pass is things that are really null. So the entire array becomes one output: null. Everything else has been filtered away. Very boring. What would it, if... Well, okay, so the way you typed it in here, it says hash null. So it's not actually going to say hash. It's just that you did that because that's a comment. So it's gonna say null. What if the second element in that array was null? Would it have said null comma null?
No, because it literally... Well, no, because you notice there's not null comma zero comma blank blank blank blank blank blank. Like the select function, it is literally gone. The stuff that doesn't meet the criteria is gone. No, but I'm saying what if you had two nulls in your array? Oh, it would give, then it would give two outputs, null followed by another null. Okay, that's what I asked. Yeah, okay. Yeah. Now, so nulls is kind of boring. It's like, okay, give me all the nulls. I guess you could count them, right? You could say pipe it to nulls and then count them or so. I don't know, it's a very odd thing to want to do, but hypothetically, right. What's much more interesting is the opposite of nulls, which is values, which is everything that isn't null is a value. So the values filter will throw away the nulls but let everything else pass through. So when we take exactly the same structure, we say explode our input array, which is our big glob of every data type Bart could think of, and we pipe that to values, what comes out is true, minus one, zero, eleven, three point one four one five, "42", waffles, empty array, the array dogs and cats, the empty dictionary, the dictionary apples 12, pears three. The only thing that didn't come through was null, because null is not a value. Well, null came through... oh, oh, okay, I see. So values is one of the select-like functions. Oh, it is the second one. Okay, there it is. Okay. Values just means everything that's not null, got it. Yeah, okay. The next one is booleans. It only lets through true and false. So in our data set that means only true comes out. The other one is numbers. Numbers lets through minus one, zero, eleven, three point one four one five. It does not let through the string "42", because that is a string, not a number.
Well, it makes sense, you say, but it's important to point that out because I'm not sure that's obvious. What does it mean, you know, I mean, it doesn't mean I'm not going to look at quote 42 and see the number 42. Right, exactly. So it's worth pointing out. There is also a function named strings, which unsurprisingly will give us "42" and waffles from our big glob of data. Another very useful one is arrays. It will give us all of the arrays, in other words the empty array and the array dogs, cats. And it does return the empty array. The other one we have is objects, which gives us all the dictionaries. It gives us the empty dictionary and the dictionary apples 12, pears three. And then another interesting one is iterables. It gives us everything that you could loop over, in other words arrays and dictionaries. So basically iterables are things that have more than one thing in them. And the opposite of an iterable is a scalar; a scalar is a single valued thing. So the scalars are null, true, minus one, zero, eleven, three point one four one five, the string "42", and the string waffles. I don't like that, redefining the word scalar. No. Yeah, it's a programmer's definition. I guess it's quite common within programming languages, but it is a programmer's definition of the word, you know. I can see why the physicist, the engineer... It's supposed to mean it has magnitude, not direction. It doesn't have anything to do with... It's like, it's also a one-dimensional vector, which in this case it is. Well, one-dimensional meaning not having direction, a dot. It's the opposite of a vector. Yeah, exactly. A vector has direction and magnitude. Yeah. So, um, what would happen if you put in waffles not in quotes? It would give an error, because that is not valid JSON. Okay, gotcha.
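The select-like type filters described above can be modelled in Python against the same sample array that was read out. This is a sketch of the behaviour as described in the episode, not jq's own code; the FILTERS table and the keep helper are names invented here.

```python
# The sample array as read out in the episode.
sample = [None, True, -1, 0, 11, 3.1415, "42", "waffles",
          [], ["dogs", "cats"], {}, {"apples": 12, "pears": 3}]


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# One predicate per filter name mentioned in the episode.
FILTERS = {
    "nulls": lambda v: v is None,
    "values": lambda v: v is not None,
    "booleans": lambda v: isinstance(v, bool),
    "numbers": is_number,
    "strings": lambda v: isinstance(v, str),
    "arrays": lambda v: isinstance(v, list),
    "objects": lambda v: isinstance(v, dict),
    "iterables": lambda v: isinstance(v, (list, dict)),
    "scalars": lambda v: not isinstance(v, (list, dict)),
}


def keep(name, items):
    """Explode the array and pass along only what the named filter accepts."""
    return [item for item in items if FILTERS[name](item)]


for name in FILTERS:
    print(name, keep(name, sample))
```

Every filter either passes a value along untouched or drops it, which is why an array with two nulls would produce two separate null outputs rather than a combined one.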
All right. Um, okay, so the next thing I said we would have on our menu today is one new operator. And this operator exists for the sole reason that everyone's JSON is dirty. Dirty data is the norm. And it's not always bad dirty data; sometimes it's efficiency. But the alternate operator lets you say do this, and if that doesn't work because it doesn't exist, do this instead. Which is very common to want. So very common examples are, there are data structures, um, this is very common in Active Directory, where you could have one phone number... Everybody might not know what Active Directory is, Bart. Right, okay. I only know because I worked in a corporate Microsoft world. It's a directory of the users on the system, and there is a thing equivalent to that inside your Mac called LDAP. But anyway, it's a giant big dictionary describing every person in an organization, and if a person has one phone number, then LDAP or Active Directory will give you back a string when you query it. If you have two phone numbers, it will give you back an array of strings. It doesn't give you back an array of one string if you have one phone number. It says, oh, it's too much effort.
I'm going to give you one phone number. So that means that you're constantly and continuously dealing with the possibility of this could be a string or it could be an array. And so you always have these two possibilities flip-flopping inside your head. So that's a very common approach that you'll see in lots of places: a multi-valued attribute comes as a string or an array of strings. The other one we have is, you may simply have keys that are optional. Some users have a smart card, some users don't; that entire key could be optional. So it may not exist in our data set. A Nobel Prize where there were no winners has no laureates array. So the laureates array does exist sometimes and doesn't exist other times. But when the laureates array doesn't exist, there's actually a different thing that exists. It's the string overall motivation. So that's actually a really good candidate for the alternate operator, because either you have laureates or you have an overall motivation. So that's a particularly nice example. So the way the operator works, so the operator, ah, you're gonna love this. You're gonna absolutely love this. The operator is slash slash. Which to your brain, because you've grown up in JavaScript world, means comment. It means alternative. It's going to drive you absolutely mad, because you're going to look at people's code on GitHub and you're going to think, oh, it's just a comment, they've commented it out. Nope. Nope. Nope. Nope. Nope. Nope. It means alternative. Well, we know JSON doesn't have comments though. We do know that. Annoyingly. I know. Very annoyingly. So the way it works is that it is an operator.
So it has something on its left and something on its right. So when you have the slash slash, the very first thing that happens is whatever is to the left gets done. And that will produce either null... sorry, that will produce a value, right, whatever's on the left is going to have a value. If that value is null or false, then the null or false are evaporated into non-existence, and whatever is to the right is evaluated, and that answer is the output. So what that means in effect is: try what's on the left; if that gives you anything that's not false or null, then the thing on the right never happens. If the thing on the left gives you false or null, do the thing on the right. So it's basically, it is kind of a cheap person's if else. Right, so the thing on the left or the thing on the right. What's still bothering me is, uh, how do you know that this is happening inside your data? The only way you know is by it not working. Right, and then you can query your data and just look at the data. So explode the dot prizes and look at it, or scroll and scroll and scroll and scroll and scroll and scroll, or put it through a select and put in the criteria. So you're saying, well, I think sometimes we don't have a laureates array. So then you select dot laureates double equals null, and then all that will come out of that pipe is all of the Nobel Prizes where there is a null laureates. And then you can see the data. So you can use jq to explore your data. You should describe what your, uh, your code says there. Because I haven't even gotten to my code yet. So you asked me the question; I was answering your question rather than trying to get onto my example. Okay. So if you don't, right, yeah, so you're running into an error. So then you can use jq to figure out why you're running into an error, by using the select to look for things that are null, or whatever, right.
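The rule just described for slash slash (use the left side unless it is null or false, and only then evaluate the right side) can be sketched in Python as follows. The function name alternate is invented for this illustration, and the sketch covers only the simple case where the left side yields a single value.

```python
def alternate(left_value, right):
    """Model of `left // right` for a single left-hand value.

    `right` is a zero-argument callable so that it is evaluated only
    when the left-hand value is null (None) or false.
    """
    if left_value is None or left_value is False:
        return right()
    return left_value


calls = []


def fallback():
    calls.append("right side ran")  # record every time the right side runs
    return "fallback"


kept = alternate("left", fallback)        # left wins, right never runs
from_null = alternate(None, fallback)     # null falls through to the right
from_false = alternate(False, fallback)   # false falls through to the right
zero_kept = alternate(0, fallback)        # 0 is neither null nor false

print(kept, from_null, from_false, zero_kept, len(calls))
```

In this model only null and false trigger the right-hand side, so a zero or an empty string on the left is kept.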
So you actually can use jq to answer the questions about the data, which I'm not sure is the answer you were hoping for. I think you have, no, because you have to know what you're looking for in order to query for it. I don't know why it failed. Well, it's going to tell you null; it's going to tell you null in the error. So then you're going, okay, what is it? So this failed because it's saying that... the query for null double equals null, yeah, that is a condition that will work. Okay. Yeah, okay. Um, so my example of the alternate operator is, so for every prize, either there are laureates or there's a reason there are no laureates. Which in terms of our actual data structure is: either there is a dot overall motivation, which is the reason there are no laureates, or there's an array of laureates. And so what I would like to print out is the number of winners or the reason there are none. It seems like a valid thing to do: either show me how many winners there were, or why there were no winners. And so to do that we take our dot prizes and we explode it out, and then we pipe it into the alternate operator. To the left, we have dot overall motivation. So if there is an overall motivation, whatever is to the right will never happen. So on the years where there are no prizes, it will just tell us why. Most years do have a prize, so most of the time the thing on the left is going to evaluate to null, which means the thing on the right is going to happen instead, which is, inside brackets, dot laureates pipe length. So give me the number of laureates. And so the output is going to be one, two, three, there was no prize because of World War One, four, or five, one, there was no prize because of World War Two, whatever the motivations are, the overall motivations. Okay. This is interesting, and the reason I'm pausing as I'm listening to you is, it's almost like this, uh, what is this thing called again?
The alternate operator is almost like it undoes this concept of I'm going to take everything and send it through if it meets my criteria, otherwise I'm going to vaporize it. In this case it's going no, no, no, no, no, don't vaporize it, give me something different if you don't meet that filter. Yeah, because sometimes you want to fall back to something else and sometimes you want to evaporate. So if you wanted to evaporate, you just say dot laureates question mark, and then all of the ones that don't have a laureates would evaporate. But what if we actually do want to do something when there are no laureates? All right, and that's what the alternate is for, right? If the answer is I want to do this or this, well, your alternate operator gives you a shorthand for do this or this. Very powerful. Your example makes me want to write more queries, though, and that's a good thing, because, uh, so he said prizes, explode it, pipe it to dot overall motivation, alternatively give me dot laureates pipe to length. And so what we see is three, one, one, one, three, three, three. So these are all the ones where there were laureates. Then it says for contributions to our understanding of the evolution of the universe and Earth's place in the cosmos. Well, okay, but like, who won that? What was it? The next one talks about laser physics. It's like, well, wait a minute, you had something that it was a contribution for, but we don't know who it's for. I need to know more. It makes me want to run the queries more. Well, good, because that's where the and also operator comes in, right? Because you can use that then to break out things more and give you your desired more information. Um, so yeah. Now this is a point where we can continue or not, your choice as the host. Well, it looks like, uh, how much more do you think we have here? We're at an hour. I think I'd like to stop it here. Okay. You want to do one more?
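The winners-or-reason example discussed above can be modelled in Python. The data below is a made-up miniature, the key name "overallMotivation" is an assumption about the real file, and the function is a sketch of the logic rather than the jq from the show notes. One invented prize carries both an overall motivation and laureates, to mirror the surprise described above where a motivation text appears even though the prize had winners.

```python
# Hypothetical miniature data set; all values are invented for illustration.
prizes = [
    {"year": "2003", "laureates": [{"surname": "A"}]},
    {"year": "2002", "laureates": [{"surname": "B"}, {"surname": "C"}, {"surname": "D"}]},
    {"year": "1940", "overallMotivation": "No prize was awarded (invented text)."},
    # A prize that has both a shared citation and laureates.
    {"year": "2001", "overallMotivation": "for an invented shared citation",
     "laureates": [{"surname": "E"}, {"surname": "F"}]},
]


def winners_or_reason(prize):
    """Left side: the overall motivation. Right side: the number of laureates."""
    reason = prize.get("overallMotivation")    # None when the key is absent
    if reason is not None and reason is not False:
        return reason                          # the right side never runs
    # A missing laureates key is treated as an empty list here.
    return len(prize.get("laureates") or [])


results = [winners_or_reason(prize) for prize in prizes]
for result in results:
    print(result)
```

Because the left side wins whenever it exists, the last prize prints its citation and hides the fact that it had two winners, which is the gap a follow-up query with the and also operator would fill.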
If we're not going to do both... So basically we're now jumping into more advanced searching. So we're either going to do our searching, which is actually just two ways of searching, so it's whether or not they contain something, and regular expressions. Well, based on the fact that I've had a little internet glitch here, I'm nervous enough, it's probably the fact that it's sprinkling in Los Angeles, that I lost my internet connection for a minute there, that I think we should probably cut it. You know, we did have a rule before that the challenge solutions were so long that we would make them separate episodes. But we've been barreling along so quickly and having easy examples, I think we didn't do that. So I think we should tease this out right here and hold this one for next time. Well, this is the perfect place to do that then, so we shall do that. Uh, sorry listeners, I have no idea when you'll be hearing the rest of this, because, well, yeah, because it's the silly season. So goodness only knows when we'll make this happen. But, uh, well, the show notes are written, so I can record at any time. Yeah, exactly, exactly. Right. Well, um, until next time, actually, anyone who didn't succeed in the challenges, I guess maybe now's a good time to go back and do them again. Don't cheat.
Don't look at the answer. Try to redo them, um, because it is good practice to get into the habit of understanding quite what is going on when you explode things and pipe things from one filter to another, because that's the key to it all. So anyway, until next time, whenever it is, happy computing. If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions. If you go over to lets-talk.ie you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at podfeet or check out all of the shows we do over at podfeet.com. Thanks for listening and stay subscribed.