Well, it's that time of the week again. It's time for Chit Chat Across the Pond. This is episode number 784 for January 20th, 2024, and I'm your host Allison Sheridan. This week our guest is Bart Busschots, back with another installment of Programming by Stealth. Long time no see, Bart. It has indeed been a while. I was gonna say has it been a year, but we did record earlier in January, didn't we? Did we? Oh boy, I don't even remember. Maybe we didn't. Feels like last year. Wait, before we dig in, I want to tell the audience about something fantastic. One thing I have always, always, always wanted is a way to flip forward and back through the show notes on pbs.bartificer.net from lesson to lesson, because a lot of times I'm looking for something and I don't remember where in a series it was. Was it in 101 or 102 or 103? And so I end up going into 102, going back in the browser, clicking on the next one, going forward, then turning around and going back. Mark from Australia, also known as Dfiant on GitHub, submitted a script ages ago to add this functionality, and just recently Helma from the Netherlands has implemented it. So now, when you go to a specific installment, all the way down at the bottom you'll find links to go forward or back to the installments by name. It's not just a forward-back arrow; you see the name of the installment, plus a button to go back to the index directly. So this is fantastic. This will be a huge time-saver. Even if it's only for me, I thank Dfiant and Helma from the bottom of my heart. Absolutely. One of the nicest things about the way we do this is that the community get to contribute, and these are two fantastic community contributions. So thank you, thank you, thank you. Absolutely. I don't know how far back it was, but it was a long time ago. Oh, February 18th, 2022 is how long he's been waiting for us to implement it. So luckily, Helma got on it, because I didn't know how to do it.
Yeah, I think Helma added the improvement that you can rerun the script without it adding more links. So basically, now we can run it as many times as we like and it won't add extra links to the bottom of every episode. Okay. Well, and I think what he did was he wrote it for just the initial ones, like up to 101 or wherever he was, and we didn't have a way to automatically have it do the next one. Yeah, basically, he had a script and Helma made it better, and now, because two people worked together, we have something amazing. So thank you. All right, we've got a long one here today, so we should probably dig in. We should. So, maybe to remind myself as much as anything else, just to set the scene: we have been saying that JQ solves three big-picture problems. Pretty printing JSON, which is very easy; we did that in the very first installment, in half of the very first installment even, that was easy. Then there's searching JSON, which has kept us very entertained for quite a few installments, because filtering a large dataset down to smaller pieces is a substantial, you know, there's a lot of power here. So there was a lot for us to learn. Searching allows you to narrow down a dataset you've been handed. But the third piece of the puzzle is to transform the data you have into the form you want it in. So making it look good is useful. Being able to find only the pieces you want is useful. But being able to convert those pieces into the shape you need them to be is very, very, very useful. And that's the final piece that we're now getting stuck into, and it will take us a few installments to get our way through. And mentally, thinking about the different pieces to learn, it falls into two groups. You can build your own data structures from scratch, or you can apply functions to the data, like, you know, multiply by five, that kind of thing.
And we're actually going to do it slightly backwards, which is to say we're going to build new data structures first, because that's actually needed before we can get a lot of value out of the ability to transform data by applying functions to it. I had written all of these show notes the other way around, and then I rewrote them all while spending two hours on the tarmac in Brussels Airport. And I think it works better this way around. And I think, Allison, you helped me find a few places way down the show notes where it sounded like I was saying something for the first time. That's because when I first wrote it, that was way up the top of the show notes. Okay, yeah, you said something like, well, we learned this last time, and I was like, boy, why didn't you tell us the first time you said it that we learned it last time? And that's because it had been rearranged. I got it. That makes sense. It flows very, very well. I have done a pre-read, which I think will help the audience enjoy it more when I'm not asking silly questions. I think I know where you're going. Cool. So before we go forward, we should go slightly back, because I set a challenge at the end of the previous installment. So I guess the first question is, how'd you get on? My dog ate my homework. Your dog ate your homework? Oh, no. I thought it would have been your grandchild, but okay. Yes, yes, that was it. It was Teddy. Teddy chewed up my homework. That's what it was. Well, okay. By the way, our challenge was to find all the laureates awarded their prize for something to do with quantum physics. What we wanted for these laureates was their first name, their surname, and why they were awarded the prize. And really, our condition is simply: does the motivation contain any word that starts with quantum? Whether it's uppercase or lowercase we don't care, which is my way of saying, remember those regular expression things? Let's go play with those.
When you say starts with, is that so you could get quantum mechanics if it was one word? But it isn't. Correct, or quantum chromodynamics. Just in case. Yeah, physicists tend to prefix things with quantum, or tend to stick things after quantum; we're not sure quite which way to put that. And it also gives us an excuse to talk a little bit more about regular expressions. I think the example in the previous installment was to look for Nobel laureates whose name begins with a vowel, which was slightly arbitrary. Now we're not searching the name; we're searching the reason they were given their prize. But the basic structure is going to be rather similar. So we're going to want to explode the prizes, and then inside those exploded prizes we want to explode the laureates. And then we need to do a select, with some sort of logic to implement our contains-a-word-starting-with-quantum condition. So the question then is, what is our test? Well, we learned last time that the test function applies a regular expression to its input. So you send a string into the test function with a pipe, and its first argument is a regular expression as a string. And if you need to specify some flags, like maybe i for case insensitive, then you put those as an optional second argument. And I've just noticed I used a comma instead of a semicolon to separate my arguments in my example, because I do that all the time, just like I promised you I would. So I've just fixed that. So the question is, what is our regular expression? Well, I mentioned as a hint that in Perl-compatible regular expressions, PCRE, there is an escape sequence for a word boundary, which is the start or end of a word. A word boundary could be the start of the string, it could be after a space, it could be after a dash, I think. Basically, it's an intelligent version of "this is the start of a word" that comports with what we humans think.
And it's also the end of a word, so if it's before a period, that will be a word boundary as well. It's backslash b, lowercase, for boundary, because backslash w was taken. So, word boundary. And the other thing I kept on saying last time was that the regular expression is a string. So if we want a backslash b in the regular expression, the backslash needs to be escaped to be in the string. So it's actually backslash backslash b. So what we're actually testing for is backslash backslash b quantum, then a semicolon, and then we want one flag, which is i for case insensitive. And that will give us our 23 prizes, the oldest of which belongs to a certain Werner Heisenberg, you might have heard of him. He has a wee bit of a principle named after him. And the most recent is for quantum dots, which are making our modern televisions so nice to look at. Oh, that's fun to know. I've got to say that test looks kind of funny: test, open roundy bracket, quote, backslash backslash b quantum, close quote, semicolon. Yeah, it is. I like it. It is odd syntax, isn't it? But yeah, the backslash b means a word boundary. So yeah, we promised you dense. Yeah, that's regular expressions for you. They're dense. And JQ is dense. So double density here. Oh, like the floppy disks. So I'm not sure I could write this myself, but it's very readable. It says jq, dot prizes, square brackets, so it explodes the prizes. Then you explode the laureates with a question mark to make sure you, what, skip the empty ones? Is that what that does? Skip the ones that don't have laureates. Right. Remember, we had prizes where they didn't give them to anyone. Yeah. And then you pipe that to the select, where we're looking at dot motivation, piping it to test: does it contain a word starting with quantum, and I want it case insensitive. Yeah, that's readable. So there we go. Excellent. Right. Well, that is our homework done. So now we are going to build three things today.
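For anyone who wants to try the homework query without the full dataset, here is a minimal sketch. It assumes `jq` is installed and on your path, and it runs against a tiny hand-made stand-in for NobelPrizes.json (the variable names `sample` and `quantum` are just for illustration). The shape, a top-level `prizes` array whose entries may or may not have a `laureates` array with `firstname`, `surname` and `motivation` keys, is assumed from the discussion, and the 1940 entry with no laureates stands in for the years no prize was given. The closing `{firstname, surname, motivation}` is jq's shorthand for building a dictionary from those three keys.

```shell
# Hand-made stand-in for NobelPrizes.json (assumed shape, trimmed to the keys used).
sample='{"prizes":[
  {"year":"2020","category":"physics","laureates":[
    {"firstname":"Andrea","surname":"Ghez",
     "motivation":"\"for the discovery of a supermassive compact object at the centre of our galaxy\""}]},
  {"year":"1940","category":"physics"},
  {"year":"1932","category":"physics","laureates":[
    {"firstname":"Werner","surname":"Heisenberg",
     "motivation":"\"for the creation of quantum mechanics, the application of which has, inter alia, led to the discovery of the allotropic forms of hydrogen\""}]}
]}'

# Explode the prizes, then the laureates ([]? quietly skips prizes with no laureates),
# keep motivations containing a word that starts with "quantum" in any case
# (the regex \bquantum needs its backslash doubled inside the jq string),
# and print just three keys, one compact dictionary per line.
quantum=$(printf '%s\n' "$sample" | jq -c '
  .prizes[]
  | .laureates[]?
  | select(.motivation | test("\\bquantum"; "i"))
  | {firstname, surname, motivation}')
echo "$quantum"
```

Only Heisenberg's entry survives the select; Dr. Ghez's motivation has no word starting with quantum, and the 1940 entry never reaches the select at all.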
We're going to build strings, we're going to build arrays, and we're going to build dictionaries, because those are our three. You can't build an integer, right? It's just there. You can't build a Boolean, it's just there. But you can build strings, you can build arrays, and you can build dictionaries. So that's what we're going to do. I want to start with the simplest one, which is strings. So I'm going to ask you to cast your mind back to JavaScript, some time ago in this series, before we got distracted by Git and Bash, which were fun distractions, but, you know, it's been a while. In JavaScript, we could make strings using something called a template string, where you would have a special character to say, I'm about to stick something into this string, and then the name of a variable. When the string was made, wherever you had that special character, the value of the variable got shoved into the string, and then you could continue on. We called them template strings. I think of them as like concatenation in Excel: I want to take this thing from this cell, put, say, a slash after it, take this thing from that cell, then another slash. So you're flipping back and forth between an equation pointing to something and some text, a string that you're throwing in there. Yeah, exactly. So you're basically shoving values of variables into placeholders. Right. Yes, that's probably the nicest way to say it. And the fancy-pants computer science way of saying it is string interpolation. That is what string interpolation is: sticking things into those placeholders. And that is what JQ allows us to do. We know we can make a string by opening a quote, then some letters, then closing the quote. If we want to stick a value into that string, we need an escape sequence which tells JQ, this is a placeholder. And JQ's choice of placeholder is unique.
I don't think I've ever seen a language use quite the same one. It's backslash, open roundy bracket, then you put in any JQ filter you like that's going to make a value, and then you close the roundy bracket. JQ will take whatever is between those roundy brackets and run it, and the answer gets shoved into the string at that point, into that placeholder. So backslash, a pair of roundy brackets, and in goes any valid JQ filter you like. Yeah, that is an odd way of doing it. But yeah, it does work. And a lot of the time, the filter is really simple: it's dot and the name of a key. It's usually just the simplest of filters, which basically says go into the dictionary that we're currently in and grab me whatever. Almost always that's what you're doing. So as an example, let us play some more with our Nobel Prizes in NobelPrizes.json, sitting in the installment zip. We can use our existing experience to find the prize for a friend of the NosillaCast, Dr. Andrea Ghez, and we're going to use string interpolation to print out her entry in a nicer way than just a dictionary. So we're going to do our usual: explode the prizes, explode the laureates, do a select, in this case dot surname double equals Ghez. And then we're going to pipe that into our new string interpolation thingy. So we're going to open a string, and immediately we're going to put in a placeholder. So backslash, open roundy bracket, dot firstname, close roundy bracket, space, backslash, open roundy bracket, dot surname, close roundy bracket, space, was awarded her prize for, space, backslash, open roundy bracket, dot motivation, close roundy bracket, and close our string. So we have three placeholders there: first name, surname, and motivation. And when you run that you get "Andrea Ghez was awarded her prize for the discovery of a supermassive compact object at the center of our galaxy", which is what she did.
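Here is a minimal sketch of that string interpolation, again assuming `jq` is installed and using a hand-made, trimmed stand-in for the relevant part of NobelPrizes.json (the `sample` and `line` variable names are illustrative). The `-r` flag is an assumption about how you run it: it prints the resulting string raw rather than as a quoted JSON string.

```shell
# Hand-made stand-in for one NobelPrizes.json entry (assumed shape, trimmed).
sample='{"prizes":[{"year":"2020","category":"physics","laureates":[
  {"id":"989","firstname":"Reinhard","surname":"Genzel",
   "motivation":"\"for the discovery of a supermassive compact object at the centre of our galaxy\""},
  {"id":"990","firstname":"Andrea","surname":"Ghez",
   "motivation":"\"for the discovery of a supermassive compact object at the centre of our galaxy\""}]}]}'

# Each \(...) placeholder runs a jq filter against the current laureate
# and splices the result into the string; -r prints the string without JSON quoting.
line=$(printf '%s\n' "$sample" | jq -r '
  .prizes[]
  | .laureates[]?
  | select(.surname == "Ghez")
  | "\(.firstname) \(.surname) was awarded her prize for \(.motivation)"')
echo "$line"
```

The stray quotation marks that show up around the motivation in the output are the ones stored in the raw data itself.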
Okay, one comment, one question. One thing I do like about this better than concatenation in Excel: there, you always have to put the text part inside quotes, but in this case the whole thing is in quotes, so you're fine. Yes. But how did "for the discovery of a supermassive compact object at the center of our galaxy" end up in quotes? Because if you look inside the raw JSON file, the motivations are all quoted. They are already wrapped in quotes in the raw data, which is very annoying. Well, in this case, it will become annoying later. Yeah, it's annoying. It's annoying with numbers, or dates especially. So, okay, yeah, that's great. Yeah. So that is string interpolation in action. Okay. So actually in the show notes, I went into way more detail. I broke it right down, didn't I? Hang on. Sorry, I'm losing track of my own show notes here. Yeah. I don't know why I broke that down further in the show notes, because that all made perfect sense when you said it in English. Yeah, I've got to make sure that's not a leftover piece. We're looking at breaking the example down into a few filters. Yeah. I think you just broke it down more clearly, but I think we're getting the hang of it. I think we're okay with that. Okay. So believe it or not, that's kind of all there is to it. Basically: backslash, open your roundy bracket, whatever it is you want to calculate, close your roundy bracket. That's it. That's string interpolation. So that's not too bad. Then we move on to building ourselves some arrays. So we've already seen... I do think we might have something misarranged, because you start talking about producing a dictionary. Or was that still the rest of that explanation? No, that's still the explanation. That's showing where we're getting the first name, surname, and motivation from. Now, what you can see is that motivation has quote, backslash quote, for the discovery of, and so on, backslash quote, quote.
So there's where your two quotes are coming from. There. Okay. This is perfect. I think you described it perfectly in audio, and now we have all of the context in the show notes, so we don't need to go through it again. Got it. Okay, now I'm caught up. So now that we've done interpolating strings, we get to build arrays. Yes. And I've already kind of been subtly showing you this. In some of our examples, you'll have seen me make an array of simple values by saying square bracket, you know, 42 comma 11, close square bracket. That's JSON syntax, so you didn't call it out to me as a weird thing to see inside JQ, because JQ is all about processing JSON. And you'd be right not to call it out, because it is very sensible. But you can do more. The thing that goes between the square brackets doesn't have to be a string, a number, or a Boolean. It can be any JQ filter, and then all of the answers become individual elements in the array. So if the thing you stick inside the square brackets explodes something that has 50 elements, then your array will have 50 elements. If you explode something and run it through a select, and only five of them make it through the select, then your array will have just those five that survive the select. Okay. So that is stupendously powerful, because you have been asking me a lot, if I explode something, how do I put it back together? And I have told you, yeah, avoid exploding it. Well, the way you put it back together is you just put square brackets before and after the filter that explodes it, and that will reassemble it. Then when you pipe it on, you're back to having one array. Okay. So if you need to explode something, just wrap it in square brackets, and in the next pipe it's an array again. You're back to it being one thing instead of however many pieces you made by exploding it. That's very handy, and I think I know what you mean. Yeah, it's very powerful, though. And so we can use this to build any array we like.
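A minimal sketch of the explode-then-reassemble idea, reusing the 42 and 11 from the example above with an extra 7 added for illustration (assumes `jq` is installed; `loose` and `packed` are illustrative names). The only difference between the two commands is the pair of square brackets around the filter.

```shell
# Without the outer square brackets: two separate results, 42 and 11.
loose=$(printf '%s\n' '[42, 11, 7]' | jq -c '.[] | select(. > 10)')
# With them: everything that survives the select is collected into one array.
packed=$(printf '%s\n' '[42, 11, 7]' | jq -c '[.[] | select(. > 10)]')
echo "$loose"
echo "$packed"
```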
Basically, anything can go in there. So what do I have as my example in the show notes? Let's go back to our Nobel Prizes again. We know that we can explode the top-level array of prizes: dot prizes, open square bracket, close square bracket. And we know that we can filter those exploded values with our select function. So we can get every prize after 2020 by saying dot prizes, open square bracket, close square bracket, and piping it to select. Inside the select, we have dot year, piped to tonumber, and then greater than 2020. And that gives us all of those prizes. And just a reminder: tonumber we talked about last time. It converts a string to a number, which is needed because our dataset stores years as strings, for reasons that make no sense. If we didn't convert it to a number, then when we tried to compare it to 2020, it would not be compared numerically, as we learned in installment 157. Sometimes that's okay, and sometimes it's really not. So it's a very bad idea to not make your numbers be numbers before doing math. So tonumber, and then we can with great confidence do a greater than 2020. So you put a link in the show notes to where you explained tonumber in installment 157, and I'm glad you did; that's the reason I noticed that one. Actually, that'll be two installments ago, because we're in 159. Yeah. So tonumber was one ago, and the explanation of greater than sometimes being alphabetic and sometimes being numeric is two ago. Okay. I read this before I saw that you'd reminded us that you told us about it. I was like, what is that tonumber thing? And searching pbs.bartificer.net for tonumber, it doesn't find it. And yet it's in the show notes. It's there, how many times? You are 100% right, but it doesn't find it. So I don't know, search seems to be getting worse these days. That's interesting. Yeah. That's very interesting. Not good. Okay.
But now your show notes will tell us where to go find the explanation. So that's good. Now, that is a simple enough query. But if we run it, what does JQ give us? It doesn't give us an array. It gives us lots and lots of separate outputs. If you look at that output, you have an open curly bracket to start a dictionary, some values, a close curly bracket, and no comma. There was no leading open square bracket at the top. So you have a dictionary, new line, a dictionary, new line, a dictionary. You have lots and lots of separate dictionaries. So you don't have one dictionary and you don't have an array of dictionaries. You have a pile of dictionaries nonsensically attached to each other. Correct. Because from JQ's point of view, it didn't give you one answer; it gave you lots of answers. Okay. But what if you want an array? Maybe you need to process this with something else, or maybe you need to send this to a whole other web service or something that needs true JSON. It can't handle these pieces of JSON. It wants an array. Well, the great thing is we can just take exactly the query we had before, stick a square bracket at the very front and a square bracket at the very end, and now all of those answers get combined into one array. And that one array is going to be the output of our call to JQ. So now if you run it again, you'll see that the very first thing in the output is an open square bracket. You still see all of your dictionaries, but now they're all indented, because they're inside the array, and there's a nice comma between the dictionaries, because they're all members of the array. And the very, very last thing is a closing square bracket. So now your script is outputting one single array, not many, many disconnected dictionaries. Well, I was afraid you were going to tell me I had to do string interpolation and throw those commas in between.
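A minimal sketch of the "prizes after 2020" query, first as a stream of separate dictionaries and then wrapped in square brackets. It assumes `jq` is installed and uses a hand-made three-prize stand-in with the years stored as strings, as in NobelPrizes.json (`sample`, `stream` and `array` are illustrative names). The roundy brackets around `.year | tonumber` are only there to make the grouping obvious.

```shell
# Hand-made stand-in: years stored as strings, as in NobelPrizes.json.
sample='{"prizes":[
  {"year":"2022","category":"physics"},
  {"year":"2021","category":"physics"},
  {"year":"2020","category":"physics"}]}'

# A stream: each matching prize is a separate result (one per line with -c).
stream=$(printf '%s\n' "$sample" | jq -c '.prizes[] | select((.year | tonumber) > 2020)')
# The same filter wrapped in square brackets: one JSON array.
array=$(printf '%s\n' "$sample" | jq -c '[.prizes[] | select((.year | tonumber) > 2020)]')
echo "$stream"
echo "$array"
```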
But just throwing in the square brackets says, okay, let's just make an array of these dictionaries. Cool. Yeah. Exactly. So that's how you build arrays. You just put square brackets: I want this to be an array, square brackets, ta-da, you have an array. Now, something that I'm going to slip in here, because I wasn't really sure where this fit in the show, and it's because it both builds an array and builds a string. Given that we've just built strings and we've just built arrays, let's do it here. It is very common to need to go between strings and arrays. We've seen this in JavaScript, where we literally have functions called split and join. With split, you give it a separator, or a regular expression, and it takes a string and makes an array. So if you split on comma space, it will take a string with commas and spaces, and for every piece between those separators, you get an element of the array, element of the array, element of the array, and the commas and the spaces evaporate. They're just separators, right? They're gone. But you have an array. And join does the opposite. You tell it what you want to use to connect them together, and it takes an array and produces the first element, that separator, the second element, that separator again, and so on. So it's injecting a little string to join everything into one giant big string from your array. And the really nice thing is that in JQ, the two functions that do exactly the same thing are called split and join. Yay. So split requires a string as its input, which is completely sensible, because it takes a string and splits it apart. Of course you have to give it a string as an input. Its argument is a string telling it what characters it should split on. So if you give it as an input the string one comma two comma three, and you pipe that to split with the single argument, the string comma, you will get back an array of one, two, three, because it will have split it on the comma. Hang on. Hold up. Hold up. Hold up.
We started with a string, and I don't remember you saying it was going to turn it into an array, but I guess it just does. That's split's job, right? Split's job is to take a string; you tell it what to split on, and you get an array. One value becomes many. And join is the inverse. You give it an array, and it smushes the elements together to make one big string, and you tell it what to use as the glue, for want of a better description. So if we do the exact opposite, we give the actual JSON for an array into JQ and we say join it with a comma, then we get back the string one comma two comma three. So join goes array to string and split goes string to array. They're inverses of each other. It's very clean. It is very clean. There's a tiny amount of uncleanliness, but it's not massively unclean. If you give split one argument, it's going to say, okay, that's a string. I am not going to treat this as a regular expression. It's a plain string; I'm looking for exactly a comma. But some people are sloppy. Some people put comma space, comma space space, comma no space. You know, we humans do these kinds of things. So you may often want to split on a regular expression. To do that, you use two arguments. The second argument will be interpreted as flags for the regular expression. But even if you don't need any flags, you still have to give the second argument, because otherwise JQ doesn't know you mean a regular expression. So two arguments means this is a regular expression, even if you have no need of a flag. Okay, I'm lost at how we're telling it we have a regular expression. By simply adding a semicolon and a second argument. Once you give a second argument, the first one becomes a regular expression. That's the rule. One argument, I'm a string; two arguments, I'm a regular expression. Okay. And can you explain the regular expression? You wrote split, open roundy bracket, quote, comma, and then square brackets with a space in them and a question mark. That means one or more spaces?
No, it means zero or one spaces. Question mark is the zero-or-one operator in regular expressions. Okay. Arguably, plus would have been more powerful, which is one or more; sorry, star would have been more powerful, which is zero or more, and which would have allowed for even sloppier humans. But we don't like them. Yeah, actually, I wish I'd done that with a star now. But anyway, it is a valid regular expression. That is the key point. The key point is that a second argument means I'm a regular expression, not a plain old string. That is the takeaway, which is why I popped it in bold. And this second argument, to repeat it one more time for clarity, is simply quote, quote. In this case, it's quote, quote, because we don't need a flag. If we needed it to be case insensitive, we could have put i in there: quote, i, quote. Yeah. Okay. But in this case, all we're using that second argument for is to tell JQ, hey, that's a regular expression to your left. Okay. Precisely, precisely. It's a little inelegant, like I say, a little bit sloppy, and I will forget this, just so you know. That's why I put it in bold. Because I will too, and then I'll be scrolling through the show notes, and I always look for the bits I put in bold, because that's like a message to me: you will forget this. Yeah, or, you know, both of us will forget this. So, so far we're not doing too badly. We have string interpolation to build strings, we build arrays with our square brackets, and now we've figured out that we can go between arrays and strings with split and join in a very JavaScript-like way. And it's not just JavaScript; almost every language has split and join. So now let's move on to building our dictionaries. And again, JQ is a language for processing JSON, so of course, JQ steals most of its syntax from JSON.
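A minimal sketch of split and join in both directions, plus the two-argument regular-expression form of split (assumes `jq` is installed; `parts`, `joined` and `sloppy` are illustrative names). Following Bart's afterthought, the regular expression here uses a star, `,[ ]*`, rather than the question mark from the show notes, so it also copes with more than one space after a comma. The empty string `""` is the flags argument, there purely to signal "the first argument is a regular expression".

```shell
# One argument: split on exactly that string.
parts=$(printf '%s\n' '"1,2,3"' | jq -c 'split(",")')
# join goes the other way, gluing the elements together with the given string.
joined=$(printf '%s\n' '["1","2","3"]' | jq -r 'join(",")')
# Two arguments: the first is now a regular expression and the second holds
# its flags ("" = none). ",[ ]*" is a comma followed by zero or more spaces.
sloppy=$(printf '%s\n' '"1,2, 3,  4"' | jq -c 'split(",[ ]*"; "")')
echo "$parts"
echo "$joined"
echo "$sloppy"
```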
So if we want to make a dictionary, we open a curly bracket, we give a key name of our choosing, we put a colon, and then we put a value of our choosing, then a comma, and do it again as often as we like. Again, just like with arrays, that value can be the result of running a filter. So we can have a JQ statement as the value in our dictionary. You may have seen me do it with just simple names and values, but it could be a name and a piece of JQ, and then that will get calculated, and the result will be what goes into our dictionary. So let's do another worked example. Again, we're going back to our Nobel Prizes dataset, because I'm very fond of that dataset, even if it is messy. So that dataset, actually, is good because it's messy. That's actually, yeah, actually, that's a really good point. It's helpful that it's representative of that sort of thing. Reality, yeah, that happens a lot. So the data about Andrea Ghez's Nobel Prize is all in there, but it's in a shape I don't like. The shape of data that I think there should be about Dr. Ghez's Nobel Prize is very simple. I want a year, that's a number. I want something called prize, which is physics in Andrea's case. I want the name, that's her full name, and I want the citation for why she got the prize, without any of these sloppy extra quotation marks. So basically, I would like us to build the dictionary that I think it should have been in the first place. So let's work our way up to that. The first thing to do: we already know that we can explode the prizes and pipe that to select. If we use any on dot laureates, with surname double equals Ghez, we will end up with the dictionary for Andrea's prize without it being changed by us. So what we get back is the full dictionary for the entire prize that Dr. Ghez was one of two winners, one of three winners on. Oh, Roger Penrose. How did I not notice that before? Wow. Okay.
So we see that it has a year, 2020, a category, physics, and then the laureates array is in there, with one entry for Roger Penrose, one for a Reinhard Genzel, cool name, and one for Andrea. So of the things I want, year and prize are right there for the taking, right? I can already see that if I say open curly bracket, year colon dot year, comma, prize colon dot category, I've got two out of the four with very little work. So if we take that existing JQ that gives us the full prize dictionary and then pipe that to our new filter, open curly bracket, year colon dot year, comma, prize colon dot category, then we get this dictionary out: year 2020, prize physics, which is two out of four. Well, no, it isn't. It's 1.5 out of four, because 2020 is a string. But hey, we're making progress. Okay. So how do we fix the year? Well, that's an easy one. Instead of saying year colon dot year, we just say year colon... tonumber? Ding ding ding. Okay, good. So we just pipe dot year to tonumber, and then we go comma, prize colon dot category. Great. Now we're halfway: year 2020, prize physics. Now we need to bring in our knowledge of string interpolation to build Andrea's name. But the problem is we can't just get her name immediately from the top-level dictionary for the whole prize, because her name is inside her entry in the laureates array. So we've got to dig in deeper. We learned in previous installments that if we take dot laureates, explode it, and then pipe that to select, dot surname double equals Ghez, then we get the one dictionary that is actually Andrea's dictionary. I've put the full command in the show notes so that you can see that dictionary yourself. You can see it's id 990, firstname Andrea, surname Ghez, motivation with the silly extra quotes, share 4. So we can see straight away that the two pieces of info we want are firstname and surname.
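A minimal sketch of the first two of the four keys, assuming `jq` is installed and using a hand-made, trimmed stand-in for the 2020 physics prize (the `sample` and `two` names are illustrative). Using the builtin `any(generator; condition)` to find the prize that has a laureate named Ghez is one reasonable way to write that select; whether the show notes phrase it exactly this way is an assumption.

```shell
# Hand-made stand-in for the 2020 physics prize (assumed shape, trimmed).
sample='{"prizes":[{"year":"2020","category":"physics","laureates":[
  {"id":"988","firstname":"Roger","surname":"Penrose"},
  {"id":"989","firstname":"Reinhard","surname":"Genzel"},
  {"id":"990","firstname":"Andrea","surname":"Ghez"}]}]}'

# Find the prize with a laureate named Ghez, then build a new dictionary:
# year converted from a string to a number, prize copied from .category.
two=$(printf '%s\n' "$sample" | jq -c '
  .prizes[]
  | select(any(.laureates[]?; .surname == "Ghez"))
  | {year: (.year | tonumber), prize: .category}')
echo "$two"
```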
So string interpolation tells us we can get those with quote, backslash, roundy bracket, dot firstname, close it, space, backslash, roundy bracket, dot surname, close it, close the quote. We're now sticking that string interpolation, as well as all of the explodey stuff, into name. So our previous example ended with dot category, close the curly bracket. Now we're saying comma, name colon, make me a key called name, and then all of that logic goes in there before we close our curly bracket. With nothing around it? So name colon, space, dot laureates, square brackets, pipe, blah, blah, blah. There's no quote, no roundy, square, or any other kind of bracket around all that; it just splats right in there. It splats right in there because it ends with either a comma for the next dictionary entry, or the curly bracket for dictionary done. So until you meet either a comma for the next key or the curly bracket for we're out of here, you can just keep adding filters. So yeah, there it is: name colon dot laureates, open square bracket, close square bracket, pipe select, pipe our string interpolation. So now we're on three out of four. We're doing pretty well here, right? We have our year, prize and name. The final step is we want the motivation. And the logic is almost the same as for the name, but instead of doing the whole string interpolation, we can just pipe it to dot motivation. Hey, that's not looking too bad. But now we run into the problem with those annoying extra quotation marks. And this gives me an excuse to give you a preview of what we're going to be doing in the next installment, which is learning about all the functions for manipulating our data. One of the things you very often have to do is strip things off, because a lot of APIs prefix or postfix answers with things like debug colon space, blah, blah, blah, or a timestamp; all sorts of things get prefixed and postfixed onto values. And in this case, quotation marks.
And so if you want to remove something from the front, it's called left trimming, because you're trimming from the left of the string. And if you want to trim something from the right, it's called right trimming. And that tells you why these functions have really odd names: ltrimstr, left trim string, and rtrimstr, right trim string. That's how you remember them. That's what they stand for. And when you give it the argument... I'm singing the same song over and over again, but in Excel, it's LEFT and RIGHT. Oh, okay. And you tell it the number of characters you want. You don't tell it which characters, but you tell it how many of them. So I want the left three or the right four. So the nice thing with the way it works here in JQ is that you tell it which characters you want to remove. And it won't care if they're not present. Nice. So if there is a quote, take it off. And if there isn't, don't give me an error, don't get cranky, just carry on, which is nice. I like that. So we basically pipe our motivation to... there's a typo there, there is no dot. You pipe the motivation straight to the function ltrimstr. And we have to give it the string quote, which means it's quote, backslash quote, quote, because you have to escape the bloody things inside our strings. And that takes care of half of our problem. Then we pipe it to rtrimstr, and we do the same thing again. So we take away the quote from the right. And after all of that, our motivation is nice and clean. And so if we copy and paste that final key into the dictionary we're constructing, then we finally get our four out of four: year, prize, name, citation, without any messy extra square brackets or extra quotation marks. I do like predicting where I'm going to get stuck: when I look back at ltrimstr open roundy bracket quote, backslash quote, quote, I'm going to think we were removing the backslash. That's what it looks like to me, but backslash means to escape.
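Putting all four keys together might look like the sketch below. It assumes jq 1.6 or later, uses a hypothetical, hand-trimmed stand-in for the prize data (the motivation text carries the same extra quotation marks as the real file), and names the fourth key citation to match the four keys listed above. Note the `"\""` argument: the backslash escapes the quote so that it can live inside a string.

```shell
# Hypothetical, hand-trimmed stand-in for the 2020 physics prize; the
# motivation value deliberately includes its own extra quotation marks.
prize=$(cat <<'EOF'
{"year": "2020", "category": "physics", "laureates": [
  {"firstname": "Andrea", "surname": "Ghez",
   "motivation": "\"for the discovery of a supermassive compact object at the centre of our galaxy\""}
]}
EOF
)

# Each value runs until the next comma or the closing curly bracket,
# so whole pipelines can sit there without any extra brackets.
printf '%s\n' "$prize" | jq '{
  year: .year | tonumber,
  prize: .category,
  name: .laureates[] | select(.surname == "Ghez") | "\(.firstname) \(.surname)",
  citation: .laureates[] | select(.surname == "Ghez") | .motivation | ltrimstr("\"") | rtrimstr("\"")
}'

# ltrimstr and rtrimstr quietly do nothing when there is nothing to trim.
jq -n '"no quotes here" | ltrimstr("\"") | rtrimstr("\"")'
# "no quotes here"
```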
So I'm escaping the fact that I'm looking for a quote, but the whole thing has to be a string, so it's inside of quotes. Yeah, I know. I hate, yeah. Prediction number seven of what I'm going to get wrong later. I hate having to backslash things; it always breaks my head. So we have now actually done the vast majority of what I want to talk about today, but I do want to teach you one more cool thing, for two reasons. A, because it involves regular expressions, which I love, and B, because it is very much related to building dictionaries. So we built a dictionary explicitly. We said, I want a key called year and I want you to go fetch the value from here. I want a key called name; I want you to go build the value out of these two pieces. But another way that you very often end up with a dictionary is that you have a giant big string which contains multiple pieces of information you care about. So the string could be a timestamp, in which case it contains a year, a month, a day, a number of hours, a number of minutes, a number of seconds, and if it's ISO 8601, even a number of milliseconds, right? Or I mean, it could be any structured piece of data that contains multiple things. So you can write a regular expression that matches a date. But one of the things you can do with regular expressions is so-called capture groups, where you put roundy brackets inside your regular expression that basically say: this little sub-piece of the pattern, this is a capture group, I want you to remember it separately. And in the bad old days, in all the regular expressions that we've come across so far in Taming the Terminal and here, they get numbered. They become the first capture group, the second capture group, and so on. Okay. I haven't figured out what a capture group is yet. It's a piece of a regular expression. So if you have... let's not do dates, because then American and European gets messed up. Let's do time. Okay. So the pattern for a time is one or two digits followed by a colon followed by two digits. Okay.
So you can write all of that as a regular expression. So I would write that as open square bracket, zero to nine, close square bracket, open curly bracket, one comma two, close my curly bracket. So that means one or two digits. Then a colon, then open square bracket, zero to nine, close square bracket, open curly, two, close curly: exactly two digits. Right? So that's my full regular expression, and that matches the whole time. Okay. Digits colon digits. The hour is a sub-pattern within my pattern. And the minutes are a sub-pattern within my pattern. When you wrap them up to remember them separately, they're called capture groups. So the name for a pattern within a pattern is a capture group. Okay. All right. In the bad old days, in everything I've ever taught you so far and anything we've ever done together, those capture groups are made by saying open roundy bracket, whatever you want, close roundy bracket. And we don't get to name them. The first roundy bracket is capture group one. The second roundy bracket is capture group two, whether we like it or not. And that's really brittle, because if you change your pattern and you say, oh, I need to capture a third thing, well, if that is between your first two, all of your code is now wrong, because what was two has become three. And when you're debugging your code, you're seeing one, two; they're meaningless magic numbers. A fantastic thing that was added to Perl Compatible Regular Expressions relatively recently is named capture groups. So instead of them becoming one, two, you say at the point in time you create them, I shall name you blah. And then in your code, you can refer to them by name. In JQ, what happens is you take a string, you pipe it to the function called capture, you give it the regular expression with the named capture groups, and it will make a dictionary for you. So it'll give you all of the answers. And the keys in the dictionary will be the names you chose for your capture groups. And the values will be the parts of the string that those groups matched.
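To see the numbered, bad-old-days style in action, here is a small sketch assuming jq 1.6 or later. It uses jq's test function to check the time pattern and its match function, which reports unnamed capture groups purely by position; the time 9:05 is just an illustrative input.

```shell
# Does the whole string look like a time? test gives a simple true or false.
jq -n '"9:05" | test("^[0-9]{1,2}:[0-9]{2}$")'
# true

# Two unnamed capture groups: the hours and the minutes come back only as
# "the first group" and "the second group", in order.
jq -nc '"9:05" | [match("([0-9]{1,2}):([0-9]{2})").captures[].string]'
# ["9","05"]
```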
Okay, that lets you pull data from a string. So you basically say, this string represents a time, I want the hours and the minutes. And I want them as a dictionary with the key hours and the key minutes with the two relevant values. And you can do all of that with a single regular expression. Okay. I love it. So powerful. So let's look at an example. So to make a named capture group, you open your roundy bracket as you normally would to make any capture group in any regular expression, in any context. And straight after the roundy bracket, you say question mark, open angle bracket, your name, close angle bracket. So the question mark and angle brackets are like a label. You're basically saying, I dub thee whatever. Then you carry on your regular expression. And when you're done with that sub-piece of the pattern, you close the roundy bracket, so we stop capturing. So that gives us one named capture group. And we would lather, rinse, repeat for as many capture groups as we would like in our dictionary. So if we want a regular expression to capture a time, it's going to be open roundy bracket, question mark, inside angle brackets hours. So we're basically saying everything until the closing roundy bracket is going to be the pattern for an hour, which is zero to nine, one comma two: one or two zero-to-nines. We then close our capture group, so we are no longer capturing the hours. Then a colon, because that is part of the big-picture pattern, so part of the regular expression. Open another capture group, question mark, angle bracket minutes, close angle bracket, and then the pattern zero to nine, two, I want two of those, close that capture group, colon, another capture group, seconds, zero to nine, two. So all of that together is the regular expression with three named capture groups. Okay. That makes perfect sense, but I don't know what we use it for yet. Okay. So now let us imagine we have a string, nine, colon, zero, zero, colon, zero, zero.
And it doesn't matter where that came from. We're going to pipe that into JQ. And we're going to say shove that into the capture function with that horrible, big regular expression I read out to you. And what will come out of that JQ statement is a dictionary: hours nine, minutes zero zero, seconds zero zero. And they're strings. They're strings? They're strings, yes. We could pipe those... We could absolutely run the string... To tonumber. To tonumber. We could absolutely do that, yes. But yes. So regular expressions work on strings. So what comes out are key-value pairs; they have to be strings, don't they? Or no, they don't in a dictionary. They don't have to be. Correct. But the regular expression makes strings, because the regular expression is a string matching machine, right? It takes a string and finds the pieces within that string. So those pieces are strings. So yeah, you end up with a dictionary of strings. And if you need to do more, you can then process that dictionary. You could pipe that to another JQ command to convert it with tonumber or whatever you need to do. But basically the capture function is string plus regular expression to dictionary. Okay. So all this we've just learned about, the question mark angle bracket hours close angle bracket to make it a named capture group, that's all regular expressions? That's nothing to do with JQ. Yes. We're just using it with JQ now. And we're using it in anger. Yes, exactly. So the named capture groups are PCRE, Perl Compatible Regular Expressions. The capture function is JQ. Oh. So the capture function takes as its input a string, and it has one argument, a regular expression. Okay. Which all ends up in quotes. Yes, because the capture function's one argument is a string that is a regular expression. Okay. Count your quote marks on this. It looks right, but wow. Well, it is right, because I ran it, because otherwise it wouldn't have been right.
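Here is that example as a runnable sketch, assuming jq 1.6 or later. One hedge on the naming: jq's regular expressions are provided by the Oniguruma library rather than PCRE itself, but the question-mark, angle-bracket syntax for named groups works the same way there.

```shell
# Each named group becomes a key; the text it matched becomes the value,
# always as a string.
echo '"9:00:00"' | jq -c 'capture("(?<hours>[0-9]{1,2}):(?<minutes>[0-9]{2}):(?<seconds>[0-9]{2})")'
# {"hours":"9","minutes":"00","seconds":"00"}

# If numbers are needed, map_values(tonumber) converts every value in the
# dictionary, so each key ends up holding a number instead of a string.
echo '"9:00:00"' | jq -c 'capture("(?<hours>[0-9]{1,2}):(?<minutes>[0-9]{2}):(?<seconds>[0-9]{2})") | map_values(tonumber)'
```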
The secret, by the way, to my show notes generally not being terrible is that I have a terminal open all the time and I'm constantly copying and pasting and laughing at myself for the amount of silly typos I make while trying to write JQ. And mind you, the same is true in Bash and everything else, but it's extra true in JQ, I have to say. So at this stage we've learned a lot, but there's one more piece that I think is very much related here, and I think this is the perfect time to throw it in. It is very normal with JQ that you're starting off with a data set in JSON. It may have come from calling curl on some sort of web API, it may have come from a file, but you have some JSON, you're processing it, and you're sending it somewhere, right? JQ is a terminal command, so it's that Unix philosophy of do one thing and do it well, but it's going to be in a chain, right? Curl to fetch it from the web, JQ to process it, and then send it to a file or send it to something else. And the something else may be quite picky about the format. So if that something else is a CSV file, you need to write CSV. Now, you could absolutely do that with string interpolation. You could find all of the rules for CSV formatting and implement them manually. And that would mean you'd need to escape your quotation marks in very weird and wonderful ways. You could do it. You wouldn't enjoy it, but you could do it. And similarly, with a bit of jiggery-pokery, you could make it produce JSON format, or you could make it produce plain text. I'm going to clarify real quickly, just in case; I don't think we've said it. CSV is comma-separated values. It's a standard input format for spreadsheets. Indeed. And yes, we're going to get to CSV shortly. The other thing that it can do, which is a related format that's not so popular these days but was once very popular, is TSV. Do you remember TSV, tab-separated?
I've never heard of TSV, but I know you can do tab-separated. Yeah. Yeah. Excel will happily ingest both. Excel will give both of those as an option, CSV and TSV. So another thing that you often end up doing with the output from JQ is building a URL with it. So we know that at the end of a URL you can put a question mark, and then you can start giving data to it. But that data needs to be in query... query strings. Thank you. Yes. Yes. So at the end of the URL, you can stick on query strings, and they have to be encoded, where every special character becomes a percent sign and then two hexadecimal digits. Again, you could write a whole bunch of JQ syntax. You'd have to use a substitute command to manually fix all of those characters. But that would be painful. Another thing you very often need to do is encode stuff in good old base 64 encoding. The number of APIs that want base 64 encoded data is large. There are a lot of things that talk JSON and there are a lot of things that talk base 64. I've never even heard of that. The SMTP protocol, for example. If you ever need to send attachments from a script, you won't like it, but you'll get to know base 64, because that's how the email protocol works. Anyway, what I'm getting around to saying is that there are many, many real-world reasons where you want to take the output from JQ and either replace all the special characters using some sort of common scheme, or take an entire piece of data, be it a dictionary or an array, and format it in a well-known data format. And instead of you doing all the heavy lifting, JQ can do it all for you. You basically just tell it, I would like this in that format, please. So the syntax for doing that is the at operator followed by the name of a format that JQ knows about. So you say at CSV and you will get CSV-formatted data; at URI and you will get that percent 20 carry-on, weird stuff.
The other one that's really powerful is HTML, where you're supposed to say ampersand, some sort of silly abbreviation, semicolon. So the HTML for an actual ampersand is ampersand amp semicolon. Let's give an easier one that people would have seen: ampersand 20, I think it is, is a space. That's percent 20. That's percent 20. Yeah. So what you see a lot in HTML is ampersand quot semicolon, which is a quotation mark. So the at symbol can handle this for us, and I have a table below of everything it can do. And the way it works is either you just make it a whole filter, so in your stream of pipes you just say pipe and then you give it the name of a format, and it will take all of its input and do whatever you say to it. You just stick it in the pipeline. That's cool. The other thing you can do is, when you're doing string interpolation, you can say: every substitution you make into this string, I want you to apply this escaping mechanism to it. And you do that by simply putting the at format in front of your string interpolation. So if you're building a URL, you would put the full URL and then backslash open roundy bracket, however you find the value, close roundy bracket. If at the very, very start of the string you have at URI, say, then the URI escaping gets automatically applied to every single one of your placeholders. Okay. Hold on here. So I'm confused, because you said pipe it to at CSV, but now you're saying at URI would be at the beginning. I'm saying there are two ways of doing it. One of the two ways is that you just apply it as a whole filter, which means it applies to the entire output, right? So if you put it in your chain, you're applying it to everything, right? It's just another part of your chain. But if you're building a string, you don't want the bit of the string that you're explicitly typing to get messed up. You only want it to apply to the inserts, right? You have a string with placeholders.
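A minimal sketch of the two styles, assuming jq 1.6 or later. @html is used here simply because HTML entities were just being discussed, and the string "Fish & Chips" is a made-up input.

```shell
# Style 1, a whole filter: everything in the input gets escaped.
jq -nr '"Fish & Chips" | @html'
# Fish &amp; Chips

# Style 2, a prefix on a string: only the interpolated value gets escaped,
# so the <p> tags typed into the string itself are left untouched.
jq -nr '"Fish & Chips" | @html "<p>\(.)</p>"'
# <p>Fish &amp; Chips</p>
```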
I have two worked examples to explain both uses. Okay. So I may be over-explaining this to the point where I'm confusing you instead of helping. I'm getting close. You're getting close. Okay. Good. So there are two completely different ways of using this one operator, I guess, is the takeaway for now. And I'm going to demonstrate both of them to you. So the first thing we're going to do is some CSV. Hold on. You have something in bold you didn't tell us. Yeah, I'm about to say it in about two sentences. Okay. In English, it works better this way. Okay. In text, it worked better the other way. So we have from a previous installment menu.json, which was an array of dictionaries for pancakes, waffles, and a few other things, which contained the name of whatever it is on our menu, the price of whatever it was on our menu, and how many of them we had in stock in our imagined restaurant. So that was menu.json. And so it's just an array of dictionaries. Let us imagine that we need our menu in CSV format; that is not unreasonable. And we can use the at CSV filter to take our array of inputs and turn them into what we need. Now, the at CSV filter does things one line at a time. So if we would like it to produce a whole file, we need to produce multiple outputs from a JQ command. The other thing to bear in mind is that JSON is JQ's native language. JQ likes to give you JSON unless you tell it otherwise. So if you take a valid CSV string and you turn that into JSON, what you get is broken CSV, because it gets wrapped in superfluous quotation marks: in JSON, strings are wrapped in quotation marks, but in CSV, that's wrong. So you have to tell JQ to stop doing that. And as we learned many months ago, the minus minus raw minus output flag tells JQ not to do its JSON thing: just give me the raw output. But we don't have to do all of that typing every time. We're going to say minus r.
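Here is a quick sketch of why minus r matters, assuming jq 1.6 or later. It builds the header row straight from an array literal, so no input file is needed.

```shell
# Without -r, the CSV line comes back wrapped up as a JSON string,
# with every inner quotation mark escaped.
jq -n '["name", "price", "stock"] | @csv'
# "\"name\",\"price\",\"stock\""

# With -r (short for --raw-output), we get the CSV line itself: a header row.
jq -nr '["name", "price", "stock"] | @csv'
# "name","price","stock"
```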
So if you're going to use at CSV, you should also use minus r, because it doesn't make sense to say give me CSV in JSON. Okay. Okay. So that's a little subtlety there. So if we want to make a line of CSV, we take any array we like, we pipe it to at CSV, and we will get one line of CSV. So could we put at CSV at the beginning, before the array? No, because... okay. So in JQ, you have filter pipe filter. So the output of one filter is the input to the next. That makes sense. So at CSV needs an array as its input. Okay. And it will turn that array into fields on one line of CSV. Okay. That makes sense. So if we would like to output, say, a header row at the top of our CSV file, we would say open square bracket, name, comma, price, comma, stock, close square bracket, which makes an array containing three elements: name, price, and stock. And we then send that with the pipe into the next filter in the chain, which is at CSV. So the input is an array; the output will be that array in valid CSV format. That makes sense. Okay. Now, our menu.json contains an array of dictionaries. We need that array of dictionaries to become an array of arrays for use with at CSV. So if we say dot open square bracket close square bracket, we are exploding the menu. Right. So the menu is an array; we're exploding it. So the next thing in the pipe happens once for every single element in our menu. What are we doing for every element in our menu? We're saying make me an array: dot name, comma, dot price, comma, dot stock. And then we're taking that and we're sending it to at CSV. So that will happen three times: once for pancakes, once for waffles, once for whatever else I put on the menu. I don't remember what else I put on the menu. Hot dogs. I put hot dogs on the menu for a change. So that means that because we explode it, the middle thing happens three times and the third thing happens three times.
So the end result is three lines of CSV, one for each explosion. Oh. Yeah, there's a lot going on here, isn't there? Yeah. So I was with you up to: we did our JQ to create the array name, price, stock, and pipe it to at CSV. So that gives us our header row. But then a simple comma will keep adding to this array? Okay. So it's going to just print one output, which is one line, right? So that... that. Okay. We aren't building an array here. We're just spitting stuff out on the screen right now. Right. Yeah. So this is just the filter, right? So I haven't wrapped it in the JQ command. This is just the JQ syntax. This is just the filter. No, I'm looking at the JQ syntax. The line, it says JQ. Did I jump ahead? Okay. You did slightly, but only slightly. Okay. Backing up. So backing up, we have two pieces of JQ here. The first one gives us a header row only. The second one gives us one row for every element in our menu. So together, those two things are everything we need: a header row and three data rows. So how do we do all of that together? Before you go on, I was still trying to figure out the name, price, stock one, and was studying that when you explained the second one. It says dot, open square bracket, close square bracket. What's that? Okay. So our menu has an array at its top level. So dot is an array that is our menu. What is our menu? Oh, we aren't in the JQ yet, so we haven't told it what it is we're going to stuff into it, but we're working with our menu.json file. So I guess I should have copied and pasted menu.json into the show notes again, so that you can see what we're working with. Dot, open, close square bracket, without... Okay. So dot means the thing we're processing. Sure. And open and close square bracket means explode it. But we haven't told it... We normally tell it what to explode, like dot laureates, open, close square bracket. We haven't told it what to explode. We're saying dot to explode the whole current thing. So... Okay. Sorry.
I was trying to simplify this by not complicating it with the file as well. Maybe I've made it more complicated. No, I think it's back to me not remembering what it means when you just say dot and you don't tell it... You don't tell it what part of the thing. I'm used to us exploding part of what the input file is. So we've had the input file of NobelPrizes.json, but we never explode NobelPrizes.json. We've been exploding dot laureates or dot prizes or... Correct. Because at the very, very top level of that file is a dictionary. Oh, okay. And in this case, this is an array. There we are. Okay. So we're literally exploding the whole thing. I suspect you told us that four installments ago, but it was gone. Okay. So we have dot, open, close square brackets, which is exploding everything that's the input, and then we're going to pipe that to pulling name, price, and stock into a new array, which exists for... We're making an array that exists really briefly, because at CSV needs an array. Okay. Right. Oh, we have a dictionary. This is a dictionary. Right. Now, we have to get the... I think we're getting a delay here, but we've got an array that has a series of dictionaries in it, name, price, stock for each one, but we need each of those to be an array in order to be the input to the at CSV filter. Exactly. Okay. And we have to get the order right, because otherwise our data's a mess. So that's why we're explicitly saying open square bracket, dot name, comma, dot price, comma, dot stock, so that you can't go wrong, right? We're taking the dictionary and pulling the pieces out in order, which is the same order as our header row, which is very good of us, because otherwise we're talking rubbish. So we have two small pieces of JQ that do the two things we want: print me a header, print me every data row.
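To make the data-row half concrete, here is a sketch assuming jq 1.6 or later. The menu below is a hypothetical stand-in for menu.json: the three items match the ones mentioned in the episode, but the prices and stock levels are made up.

```shell
# Hypothetical stand-in for menu.json (made-up prices and stock levels).
menu='[
  {"name": "hot dogs", "price": 5.5, "stock": 143},
  {"name": "pancakes", "price": 3.5, "stock": 43},
  {"name": "waffles", "price": 7.5, "stock": 14}
]'

# The top level is an array, so .[] explodes the input itself. Each
# dictionary then becomes a short-lived array in header-row order.
printf '%s\n' "$menu" | jq -c '.[] | [.name, .price, .stock]'
# ["hot dogs",5.5,143]
# ["pancakes",3.5,43]
# ["waffles",7.5,14]

# Sending each of those arrays on to @csv gives one data row apiece.
printf '%s\n' "$menu" | jq -r '.[] | [.name, .price, .stock] | @csv'
# "hot dogs",5.5,143
# "pancakes",3.5,43
# "waffles",7.5,14
```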
So if we combine all of that together into one JQ command, we say JQ space minus r, because we want the raw CSV, and then we have, as our one giant big filter, the entire first thing I gave you, comma, the entire second thing. And I've had to wrap them in brackets, because otherwise the pipes don't know where to go. Roundy brackets. So we have brackets... the first thing, to print the header row, comma... I want you to be specific. Roundy brackets. Yes, exactly. We're grouping together: do this thing, comma, do this other thing. So the first thing is make our header row. And the second thing is make all of our data rows. Okay. And then we just use the good old-fashioned terminal arrow, the greater-than sign, menu.csv. And now our JSON file has magically become a very pretty CSV file. You can open it in Excel, but you can also just print it in the terminal. It's just name, comma, price, comma, stock; hot dogs, 5.9, 143; pancakes, 3.10, 43; waffles, 7.5, 14. It is valid CSV and it works in Excel, as it should. Interesting. Cool. So that is using our formatting as an entire filter. Right. We pipe it to at CSV. So the second thing I said was that you can also use this to control what happens inside string interpolation. And the canonical example here is building a URL. Right. And I think you have some plugins and stuff on your Mac that let you build custom URLs that you tie to key thingies. Didn't you do a thing once where we had special keys to search Google or to search something else? Oh, yeah. Keyword search does that. Yeah. So we can use this same concept in JQ. So, well, I'm going to tell you: if you want to give someone a URL to a Google search, let me Google that for you, then it's HTTPS colon slash slash www.google.com forward slash search, question mark, q equals, and then you put your search query. So if you want to search for pancakes, it would be q equals pancakes. If you want to search for waffles, q equals waffles.
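The combined command might look like this sketch, assuming jq 1.6 or later and a hypothetical menu.json with made-up prices and stock levels. Without the roundy brackets, the pipes would bind across the comma and the filter would mean something quite different.

```shell
# Hypothetical stand-in for menu.json (made-up prices and stock levels).
cat > menu.json <<'EOF'
[
  {"name": "hot dogs", "price": 5.5, "stock": 143},
  {"name": "pancakes", "price": 3.5, "stock": 43},
  {"name": "waffles", "price": 7.5, "stock": 14}
]
EOF

# First group: the header row. Second group: one data row per menu item.
# -r keeps jq from wrapping each CSV line in JSON quotation marks.
jq -r '(["name", "price", "stock"] | @csv), (.[] | [.name, .price, .stock] | @csv)' menu.json > menu.csv

cat menu.csv
# "name","price","stock"
# "hot dogs",5.5,143
# "pancakes",3.5,43
# "waffles",7.5,14
```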
If you need to search for something which contains a space or a comma or frankly any special character, you have to encode that special character using the URI encoding scheme, which is percent 20 for a space. I don't remember the other ones, but there is one for everything. So we're going to use JQ to build a search URL. What are we going to search for? We are going to search for the winner of the first ever Nobel Prize. So we're going to use our Nobel Prize data set to find the thing we want to shove into the Google query. So we don't know who won the first Nobel Prize; we just know we want to Google them. Wait, so we're going to query NobelPrizes.json and make that be the input to a Google search without ever seeing what the thing we're searching for is? Precisely. In other words, we want to turn the answer to a question from our data set into a working Google link. Which is a realistic thing to want to do, right? You look something up on a web API, you get back some JSON, you pull out the piece you want, and you turn that into a Google link. In this case, the first ever Nobel Prize winner is my arbitrarily chosen thing to Google for. So this gives me an excuse to remind us all, myself included, that one of the cool things JQ does is let us index arrays backwards, counting from the end. This data set has the prizes in reverse chronological order, so the most recent prizes are at the top of the file and the first ever prize is at the bottom of the file. So if we say dot prizes minus one, we get the first ever Nobel Prize. I remember you telling us that. Yeah, it's so cool. I love languages that let you do that. So if we say dot prizes open square bracket minus one close square bracket, dot laureates, zero, we get the first winner of the first Nobel Prize. If we then pipe that to string interpolation, backslash roundy bracket dot firstname, close roundy bracket, space, backslash roundy bracket dot surname, close roundy bracket...
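A minimal sketch of that lookup, assuming jq 1.6 or later. The two-prize data below is a hypothetical, heavily trimmed stand-in for NobelPrizes.json, ordered newest first like the real file. It also shows where the quotation marks around the answer come from: jq adds them as JSON output unless you pass minus r.

```shell
# Hypothetical, heavily trimmed stand-in for NobelPrizes.json:
# newest prize first, oldest prize last.
nobel='{"prizes": [
  {"year": "2020", "category": "physics",
   "laureates": [{"firstname": "Roger", "surname": "Penrose"}]},
  {"year": "1901", "category": "medicine",
   "laureates": [{"firstname": "Emil", "surname": "von Behring"}]}
]}'

# Index -1 counts back from the end, so .prizes[-1] is the oldest prize.
printf '%s\n' "$nobel" | jq -r '.prizes[-1].laureates[0] | "\(.firstname) \(.surname)"'
# Emil von Behring

# Without -r, jq wraps the very same answer in JSON quotation marks.
printf '%s\n' "$nobel" | jq '.prizes[-1].laureates[0] | "\(.firstname) \(.surname)"'
# "Emil von Behring"
```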
We now have a way of looking up the name of the first ever Nobel Prize winner, which was Emil von Behring. And we have the JQ command here to do that. Okay. Now we want to turn that answer into a working URL. And so we can do that by having as a JQ query at URI, quote, https colon slash slash www.google.com forward slash search question mark q equals, then we start our string interpolation, backslash roundy bracket, all the logic I just gave you, close our roundy bracket, close our string. I'm just laughing at how many quotes there are at the end. Okay. So that is going to find... and then you still feed it NobelPrizes.json at the end. We do, exactly. Yes. So it's going to take Emil von Behring and shove that answer into the q equals, but it's not going to just shove it in. It's going to apply the URI encoding before it shoves it in, because we put at URI before we opened our string. Two questions. Why doesn't Emil von Behring end up in quotes? Because there are no quotes in the data set. But our standalone JQ did return it inside quotes. Okay. Those quotes were just wrapped on by JQ. They're not in the data. If we had used JQ minus r, they wouldn't have been there. So those quotes weren't there until the very, very, very end, when they fell off the end of the command. Those quotes... I wish I'd put a minus r in there to not confuse you. Oh, so if we ran that same JQ command, not stuffing it into the URI query and all that, you're saying it would not have had quotes? Those quotes were added at the very, very, very end by the JQ command. I'm going to go back and do that: just the JQ with a minus r. You're saying that I wouldn't get it in quotes. Correct. Okay. Okay. Got it. Okay. So, second question. Why is at URI at the front, not at the end, like at CSV? I don't get it. Okay. Right.
So in this case, we are saying explicitly: we are doing string interpolation, and I want you to apply at URI to the string interpolation. So the first time, the at whatever was the entire filter: pipe, at whatever, end of story, which means apply me to everything. Here we were saying at... well, it wasn't at the end. It was the entire filter, right? The pipe symbol says start a new filter. So the entire filter was at CSV. Okay. Here we are saying at CSV space, open quote. At URI. Sorry. You were... okay. At anything, right? At thing. Okay. So that means apply this encoding to the string interpolation. So when you just say at and you don't give it any more information, then it applies it to everything. If you say at followed by a string, it only applies it to the interpolation. At on its own: apply to everything. At followed by a string: apply to the interpolation. That's the rule. Not sinking in one tiny little bit. Okay. If the at... okay. Now I don't know how to say it. Could at CSV be at the beginning? Is there an example where at CSV could be at the beginning, or is it because it's at the beginning? So you've got jq minus r, single quote, at URI, and then all the stuff we're going to do, all our query strings and the URL and all that, is after at URI. In the other example, all the other stuff was to the left, and then we piped it to at CSV. So at CSV was at the end. Once at the beginning, once at the end of the command. Okay. So in one case, we are saying take the input and run all of the input through the at CSV... Yes. ...command. Agreed. I get that one. So the entire filter is just at CSV, and its input is whatever came before. It could be the entire file, or in this case we're saying, give me this array and then send that to at CSV. So you could actually say jq minus r at CSV, name of file, and it would take the entire file and run it through at CSV. Okay. So at CSV is a filter all by itself. We just have it at the end of a pipeline. Okay.
Why do you put this at URI at the beginning? Why not pipe it to at URI at the end? Would that be the same thing? If we... no. Good. This is the perfect way to ask me the question, because then I can tell you the difference. So if I take https colon slash slash www.google.com and I pipe all of that through at URI, I get https, percent something, percent something, percent something, www.google.com, percent something, search, percent something. I don't want to apply it to everything. I only want to apply it to the bits I'm inserting. That's why it's at URI space the string: only the interpolation, only the answer, only the bit that comes after my backslash open roundy bracket, the bit I calculate, gets converted. The rest of the string is left alone. So the answer is https colon slash slash www.google.com forward slash search, question mark, q equals, and now all the weird stuff happens: Emil percent 20 von percent 20 Behring. It's only applied to the string interpolation. Okay. I believe you. Think of it this way. If you just put the at and nothing else, it applies to all of its input. If you want to be specific, you put the at in front of a string. Okay. I think we should keep going. I don't completely follow it, but I believe you. Well, I guess think of this as a pattern. It's a pattern. So you can copy and paste the same logic into anything you're doing. Right. And it will behave in the same way. I guess what bothers me is that the thing immediately following at URI is not the string interpolation. It's the https www.google.com. You get to the string interpolation later, where you have the backslash open roundy bracket. That's the string interpolation, right? Okay. But the thing straight after it is a string that contains interpolation. Oh, wait a minute. Wait a minute. So that could be at URI quote Bob backslash open roundy bracket dot prizes minus one...
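Here is the difference side by side, as a sketch assuming jq 1.6 or later. To keep it short, the winner's name is supplied directly rather than looked up from NobelPrizes.json, and "Bob" is just the throwaway literal text from the conversation.

```shell
# Piping an entire URL through @uri escapes the parts we typed, too.
jq -nr '"https://www.google.com/search?q=Emil von Behring" | @uri'
# https%3A%2F%2Fwww.google.com%2Fsearch%3Fq%3DEmil%20von%20Behring

# With @uri in front of the string, only the interpolated value is escaped.
jq -nr '"Emil von Behring" | @uri "https://www.google.com/search?q=\(.)"'
# https://www.google.com/search?q=Emil%20von%20Behring

# The literal part of the string can be anything at all.
jq -nr '"Emil von Behring" | @uri "Bob \(.)"'
# Bob Emil%20von%20Behring
```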
Yeah, absolutely. Okay. That's just text. It's literally just text to @uri at that point. Precisely. From a string interpolation perspective. Okay. Yeah. That sort of makes sense. I think that's as far as we're going to get me. Well, they're just patterns, right? If you ever need to do this for real, you just replace the https part with the bit you want and replace the prizes minus one with whatever you want, and it'll work. It's just a pattern. It's a shape, right? And it behaves one way with one shape and another way with another shape. So the last thing I want to get to today is just to tell you what formats are available. In terms of formatting an entire line of text, we have @text, which is just a shortcut for tostring. So if you just need to force plain text out, @text behaves as if you ran it through tostring. @json gives you JSON the way an API expects it. So not pretty, not lots of newline characters with little tabs. It gives you the kind of JSON you get from a URL, without a single wasted character. It's just mushed together as one giant big barf of JSON, right? Great for computers, terrible for humans. But @json will give you the computer-friendly version. @csv is our comma-separated values, and @tsv is our tab-separated values. In terms of encoding, which is how I think of escaping special characters, we have @html, which gives us things like &amp; for an ampersand, and @uri, which gives us the %20 stuff. @base64 will do a base64 encoding of the input, so "this and that" gets base64 encoded to dGhpcyBhbmQgdGhhdA==. You can base64 encode anything, and you get back this weird glop of letters and digits, sometimes with equals signs on the end. Very pretty. There is also @base64d, which decodes base64. So if you take that glop and run it through @base64d, you get back "this and that".
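For anyone who thinks better in a general-purpose language, these are rough Python equivalents of @json, @html, @base64 and @base64d using only the standard library. They are analogues, not jq's own implementation; the `record` dictionary is a made-up example, and the apostrophe entity differs slightly (jq's @html uses `&#39;`, Python's `html.escape` uses `&#x27;`).

```python
import base64
import html
import json

record = {"year": 1907, "prize": "peace", "winners": ["Ernesto Teodoro Moneta", "Louis Renault"]}

# Like @json: compact JSON with no spaces or newlines; ensure_ascii=False keeps
# non-ASCII characters as they are instead of \u-escaping them.
compact = json.dumps(record, separators=(",", ":"), ensure_ascii=False)

# Like @html: escape &, <, >, " and ' as HTML entities.
safe_html = html.escape('Fish & Chips <b>"quoted"</b>')

# Like @base64 and @base64d: encode the UTF-8 bytes, then decode them back.
encoded = base64.b64encode("this and that".encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")

print(compact)
print(safe_html)
print(encoded)  # dGhpcyBhbmQgdGhhdA==
print(decoded)  # this and that
```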
And the other very convenient one, because you often use jq as part of a big terminal command, is @sh, which does shell escaping on a string. So with @sh you can safely send a string of jq output as an argument to another terminal command. That then brings us to an optional challenge. I've already shown you that we can make a custom dictionary to represent Andrea's prize, and that was very pretty. But I would like you to not just make a dictionary. I would like you to take our Nobel prizes JSON file as your input and give me back what I think it should have been in the first place. So not a dictionary with a key called prizes that is an array of lots of dictionaries, each of which contains another array of even more dictionaries for the laureates. I would like that entire file to come back to me as one top-level array containing one dictionary for every prize that was actually awarded. I don't care about the ones with no laureates; I don't want them in my list. Then, for each element in that array, I just want four keys. I want the year as a number. I want which prize it was as a string, so chemistry, physics, whatever. I want the number of winners as a number, called num winners. And then I would like a simple array of strings that is just the names of the winners, called winners. As an example of what a correctly formatted one would look like, the peace prize from 1907 should be year: 1907 as a number, prize: peace as a string, num winners: 2, and then the winners array contains Ernesto Teodoro Moneta and Louis Renault. That is it. A nice, simple representation of that prize. I'm going to give you a warning and a tip, whether that's me being kind or mean. It's easy enough to do this for the prizes where the winners are human beings, because they have a first name and a surname and you can just stick them together. But the prizes where the winner is an organization have no surname.
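jq's @sh wraps a string in single quotes and rewrites any embedded single quote so the shell can't misread it. The Python sketch below reproduces that quoting rule by hand (the helpers `sh` and `sh_words` are hypothetical names, and only strings are handled), then uses `shlex.split` to confirm that a POSIX-style parse recovers the original string as one argument.

```python
import shlex


def sh(s: str) -> str:
    """Quote a string for a POSIX shell in the style of jq's @sh:
    wrap it in single quotes and replace each embedded ' with '\\''."""
    return "'" + s.replace("'", "'\\''") + "'"


def sh_words(items: list) -> str:
    # For a list of strings, quote each one and separate them with spaces.
    return " ".join(sh(item) for item in items)


title = "Bart's notes; rm -rf ~"
quoted = sh(title)
print(quoted)
# A POSIX-style parse gets the original string back as a single word,
# so the ; and rm never reach the shell as commands.
print(shlex.split(quoted))
```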
And I promise you that your first attempt at solving this problem will result in trailing spaces. You can check whether or not you have this problem by looking at the 1904 peace prize. If you have successfully accounted for the lack of surnames, then the winners will be an array with one string, Institute of International Law, with no trailing space. If you haven't, you will probably have space null as your first problem. That's certainly what I got the first time I tried to solve this: Institute of International Law space null. Then I eventually got it to Institute of International Law space, which is still wrong. I promise you, you can get it without the trailing space. We may have trimmed some sort of trailing something at some point today, for example. Okay. So you can get full credit by allowing the space to appear and then trimming it away later. That is a valid solution. But there is a more elegant solution for bonus credit, because it's mildly non-obvious: you can stop it ever happening. You don't have to remove a trailing space you never create. And the key to never creating it is the fact that if you join an array of one element, the joining symbol never appears. So if you have Bob and Dylan and you join them with a space, you get Bob space Dylan. But if you have an array containing just Bob and you join it with a space, you just get Bob. No trailing space. So if you can arrange to have your name as an array, the space can never be a problem. To help you make that true, I'm going to remind you of the existence of the alternative operator, which we both agree is terribly named, because it's forward slash forward slash, which you, me, and half the planet think means this is a comment. No. I'm about to escape a b, the boundary of a word. Yeah, I know, right. And the other thing I'm going to tell you about is a function called empty. What it does is produce absolute nothingness: not null, actual genuine nothingness.
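To show the target shape and the join trick together, here is a Python sketch of the transformation; it is not a jq solution to the challenge. It assumes the prize records have `year`, `category` and an optional `laureates` list whose entries have `firstname` and, for people, `surname`. The output key `numWinners` is an assumed spelling of "num winners", and the three-record sample is hand-made for illustration.

```python
def full_name(laureate: dict) -> str:
    """Build a winner's name without ever creating a trailing space.

    The parts go into a list that is joined with one space. With no surname the
    list has a single element, so the separator never appears (the role the
    tip's // and empty play in jq).
    """
    parts = [laureate["firstname"]]
    surname = laureate.get("surname")
    if surname is not None:  # a missing or null surname contributes nothing at all
        parts.append(surname)
    return " ".join(parts)


def reshape(prizes: list) -> list:
    """Flatten prize records into the four-key shape described in the challenge."""
    result = []
    for prize in prizes:
        laureates = prize.get("laureates") or []
        if not laureates:
            continue  # skip prizes that were not actually awarded
        winners = [full_name(person) for person in laureates]
        result.append({
            "year": int(prize["year"]),  # the year may be stored as a string
            "prize": prize["category"],
            "numWinners": len(winners),
            "winners": winners,
        })
    return result


# Hand-made sample records, shaped like the assumptions above.
sample = [
    {"year": "1907", "category": "peace", "laureates": [
        {"firstname": "Ernesto Teodoro", "surname": "Moneta"},
        {"firstname": "Louis", "surname": "Renault"},
    ]},
    {"year": "1904", "category": "peace", "laureates": [
        {"firstname": "Institute of International Law"},
    ]},
    {"year": "1914", "category": "peace"},  # no laureates, so it is dropped
]

for row in reshape(sample):
    print(row)
```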
So if there is no surname, you don't want null, you want actual nothing. And the way to get actual nothing is empty. We're going to look at empty in the next installment, because the documentation for empty is hilarious. It is genuinely hilarious. Just so you know, my dog already ate my bonus credit. Yeah, that's the bonus. The bonus credit, right? It's for bonus credit. I am going to explain it in the sample solution because I want to tell you about empty. Okay, so you can read the documentation. Yeah, it's copied and pasted into the show notes. Yeah, it's cool. But honestly, if you can build that nice four-key dictionary for every Nobel prize that was actually handed out, that's 100%, full marks, and a really good example of what you want jq for, right? You have a piece of data available to you. It's in the wrong shape. Beat it into the right shape and output it as valid JSON, and then you can use it for something else. So this is the perfect example of why we want jq. Cool. Right. That was fun, but there was a lot of it. Yeah, we looked at it ahead of time and knew it was long, but it held together, I think. It did. So this is all of the construction work, right? You know how to build strings, you know how to build arrays, you know how to build dictionaries. And I've told you that there are functions for transforming. We've seen ltrimstr and rtrimstr. There are loads of them. You can do all sorts of cool things to strings and to arrays and to dictionaries. Next time, we're going to look at all the cool things we can do to strings, arrays, and dictionaries, and how to do math, which is the other thing we'll learn next time. Compared to this, it's a way lighter lift, because we're just going to learn how to make data change shape. Sounds fun. You love it. Yeah, you love it with your Excel head. All of those functions that exist in Excel have equivalents in jq, because they're solving the same problem. We need to manipulate data.
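For the full-credit route, where the space is allowed to appear and is trimmed afterwards, Python's `str.removeprefix` and `str.removesuffix` (Python 3.9 or later) behave much like jq's ltrimstr and rtrimstr: each removes the given text once, and only if it is actually there. The wrapper names below simply borrow jq's names for illustration; this is a Python analogue, not jq.

```python
def ltrimstr(text: str, prefix: str) -> str:
    # Remove the prefix once, and only if the text starts with it.
    return text.removeprefix(prefix)


def rtrimstr(text: str, suffix: str) -> str:
    # Remove the suffix once, and only if the text ends with it.
    return text.removesuffix(suffix)


# The naive build always adds the space, so an organisation with no surname
# ends up with a trailing one; trimming afterwards cleans it up.
firstname, surname = "Institute of International Law", None
naive = firstname + " " + (surname or "")
print(repr(naive))                 # 'Institute of International Law '
print(repr(rtrimstr(naive, " ")))  # 'Institute of International Law'
```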
And so that's what we're going to do next time. Very good. Sounds like fun. I know this was a long one and there's a lot in it. But like I said, it holds together. I think it's a good story, and I'm glad it's in one lesson, in one set of show notes. We briefly debated whether we should split it, but it held together, so I think it was good. We powered through, Bart, we made it. Yay. And of course, the most important thing: until next time, happy computing. If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here. I'm just the stooge who listens to him and asks the dumb questions. If you go over to lets-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at podfeet or check out all of the shows we do over at podfeet.com. Thanks for listening, and stay subscribed.