Well, it's that time of the week again. It's time for Chit Chat Across the Pond. This is episode number 788 for March 2nd, 2024, and I'm your host, Allison Sheridan. This week our guest is Bart Busschots, back with Programming By Stealth number 162, and we're still having fun with jq, right, Bart?

We absolutely are, and it keeps getting slightly longer. The reason I wanted to learn jq is that I needed it for my work life, and I've been using it a lot since coming back from my extended leave, a lot, a lot, and I'm discovering that there are more things to learn. This language keeps getting better, keeps getting more powerful. There was a thing I thought would be like a paragraph, and then I tried to write the paragraph and it became a whole show. So episode 163 has completely changed: this new episode 163 has slipped into the middle of what I had already planned. But I can see where we're going, and I'm having so much fun with this, so I'm hoping you're enjoying it too.

As I keep stressing, I am. But I notice I get rusty really quickly and forget things I knew, so I'd just like to register here that I'll be asking a whole bunch of dumb questions, I mean valuable questions, to refresh the audience's memory as well, right?

Right. Yeah, I mean, that's literally your job, right? You're here to represent the audience, and you're very good at it, as I keep telling you. So just to put us back into the big-picture story: initially we learned just to pretty-print stuff, then we learned how to query stuff, then how to filter stuff, and what we've been doing in recent installments is changing stuff. So not just building a new shape of a dictionary, but actually changing things. Last time we did math, and we did assignment operators, where we were changing the values of different keys in our dictionaries or different values in our arrays, and then we had functions for messing with strings. Well, we're gonna pick that up.
We're gonna pick that up today with the other two obvious types. We have numbers, we have strings, so what's the next most logical thing? Arrays and dictionaries. So we are going to learn about jq's various functions for altering arrays and dictionaries. Not just filtering them or creating them, but actually altering them: give it an array, do something to it, and get a new array, and the same for dictionaries. And that will then set us up for next time, where we're going to spend an entire episode talking about a special type of dictionary that, it turns out, is really powerful and really important. Without learning what we're going to do next time, I would not have been able to make use of the output from the Have I Been Pwned API, which kindly talks JSON. Believe it or not, when you work for an organization with thousands of users, a lot of them get caught up in data breaches.

Shockingly, shockingly. When there's 20 million records leaked, or is it 20,000 this week?

Yeah, anyway, always big numbers. So I had set you a challenge at the end of the last installment, and what I noticed in the two weeks since we last recorded was an awful lot of amazingly cool activity on the GitHub repository for XKPasswd, which is kind of the excuse for learning most of what we've learned in this series. So I guess that's a sign that the series is working. But I wonder, did you have so much fun doing that that you may have forgotten to do your homework?

Well, yeah, so when you reminded me of the homework yesterday... Helma and I, and Mike Price and Dorothy, have started poking our heads into it, and I think Steve Matten did something in there, too. I know he's been working on the show notes for Programming By Stealth, so he's been contributing as well. But yeah, we've been having a real ball.
I've been working on the layout stuff while Helma does the heavy lifting on the JavaScript side of it, and then I went through and worked on the accessibility pieces. It was pretty close, but there were some contrast problems, and those are really hard to see on your own. There are some tools you can employ, though. There's a thing called the WAVE accessibility tool that lets you analyze a website. It'll show you everything that's wrong with it from an accessibility standpoint, and it highlighted the contrast errors that I can't see. It looks fine to me, but I don't have trouble with low contrast. So anyway, it's been a lot of fun. I did take a whack at the homework yesterday and this morning, but I'm kind of glad I stopped, because when I looked at your solution it was so elegant. I was going down a rat hole of things that were not only incorrect but really hard, trying to get there the wrong way. So I've got a lot of questions about what you did.
So I think I want to hear your solution.

Okay, well, I guess we should say what the challenge was before we look at the solution. What we learned last time focused heavily on the assignment operators. They were the biggest new thing we learned, so I focused the challenge on those, and on doing something I like to do, which is correcting bad JSON, because there is so much bad JSON in this world. I really do spend a lot of time correcting other people's JSON, making it be the way I think it should be. And so I asked you to make four changes to the Nobel laureates data set we've been using for most of this series on jq.

The first thing was to make it easier to detect whether or not a prize was issued. At the moment you can tell by two side effects: an unawarded prize will have a field called overall motivation, and it won't have an array of laureates. Either of those two side channels lets you deduce that the prize wasn't given out, but I said, well, no, I actually want a key in the dictionary that says awarded, true or false, because it's much easier to handle if that's there as an explicit key. So the first one was to add a Boolean key named awarded. The second thing I really don't like...

Hang on. So are you going to talk about how you did it?

I was going to say what we're doing first, or do you want to look at them piece by piece? I was going to get to the fourth thing.

I'd actually like to go piece by piece, because I understand the overall flow, and I think people will too. But all you wrote was dot, square bracket...
It's prizes...

Well, let me step back a sec, because we need to jump back to a very important point. The biggest penny that needs to drop for this assignment to be doable at all, without tying yourself in very large knots, is that we need to update the prizes. That means we're going to use an update assignment operator to manipulate the prizes: dot prizes, pipe equals, which is the update assignment operator, and then we're going to make a new value for prizes. So from this point on...

Oh, the dot is for all prizes, or something inside prizes?

Yes. So, no, okay. The top-level dictionary has one key named prizes, and that key is an array of dictionaries, one dictionary for each prize. We need to put a new value into the prizes array, which means we have to explode the array, fix each prize, and then recollect our exploded array and put all of that back where we found it, in dot prizes. So the first thing I do is open a square bracket to say I am going to collect everything I do back into an array. The very first thing is an open square bracket, and the very, very last thing at the bottom of my solution is the closing square bracket, so I'm collecting all the pieces back together and putting them into dot prizes.

Okay, so it's dot prizes, pipe equals, square bracket, gobbledygook, square bracket.

Bing, bing!

All the gobbledygook is going to replace what's in dot prizes, or add to what's in dot prizes?

Replace. It's a new value, because it's an assignment operator.

Not a... what's the other one called?

We are using the update assignment operator so that we have access to the current prizes, which are now in dot. So the current value of prizes is in dot.

So if it's update, how is it replacing everything?
I would think update would be additive.

No, no. Update means you have access to the current value, which you are going to be changing in some way. So we're going to change everything in dot prizes. Zoom out a little: plus equals is an update operator. It's based on the current value, but it produces a whole new value, right? If you increment 3 by 1, 4 is a whole new value, but your starting point is the old value. So it's an update assignment. Our starting point is the original array, and our finishing point is a whole new array.

Okay, all right, I'll buy it. It seems counterintuitive, but okay. So we've got dot prizes, pipe equals; we're doing an update assignment; we're going to create a whole new array that's going to replace everything we had.

Right. What makes it an update assignment is that in everything after that equals sign, the dot is the current value of the prizes array. So when, on the next line, I say dot, open square bracket, close square bracket, I am exploding the currently existing prizes.

Okay, I'm with you so far.

Right. So now we're working on individual prizes, because we've exploded them out, and now we can do the first part of our challenge, which is to add an awarded key.

Okay, here's where I get stuck right away.

Great, good.

So you've exploded the array. We've taken in dot prizes, we said we're going to update it, we've exploded dot, which is now our prizes, and then you say pipe dot awarded equals has laureates. How does dot awarded equals create that key?

Okay, so the equals symbol is a normal assignment operator. Its job is to make or update a key. If there was an awarded key, I would be replacing its value with a whole new value, and if it didn't exist, I'm making it.
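To make the update-versus-plain-assignment distinction concrete, here is a minimal sketch run against a tiny made-up document shaped like the Nobel data (the years, the name, and the motivation text are placeholders, not real records). Inside `.prizes |= [ ... ]` the dot is the current prizes array, `.[]` explodes it, and the outer square brackets collect the results back into the new array; `.awarded = has("laureates")` creates the key on each prize because it does not exist yet.

```shell
# Made-up sample: one prize with laureates, one without (placeholder data)
PRIZES='{"prizes":[{"year":"2001","laureates":[{"firstname":"Ada"}]},{"year":"2002","overallMotivation":"Not awarded"}]}'

# |= hands the current value of .prizes to the right-hand side as "."
# .[] explodes it, each prize gains an awarded key, [ ... ] re-collects them
echo "$PRIZES" | jq -c '.prizes |= [ .[] | .awarded = has("laureates") ] | .prizes[] | {year, awarded}'
# prints {"year":"2001","awarded":true}
#        {"year":"2002","awarded":false}
```

Since jq's built-in `map(f)` is defined as `[ .[] | f ]`, writing `.prizes |= map(.awarded = has("laureates"))` should be an equivalent, shorter spelling.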
So I am literally creating it.

So it's declaring and assigning all in one. That's cheating! It should say let dot awarded equals. I was sitting there going, well, how is he already talking to dot awarded if it's not there yet? So just dot awarded equals says if you've got one, update it, and if you don't, just make it.

Right, because effectively everything exists with the value of null. Think of it that way, right? Every possible variable.

So, like, philosophically, in life everything exists with the value of null.

Good, probably very deep. I'm sure some philosopher somewhere can tell us that's very deep and worth an entire thesis.

And maybe a PhD. I'm going to use that in my Mastodon post to say, think deeply about this, because we did. Okay, so now I'm with you. We're adding this key. We already have everything that's in dot prizes; we haven't lost that stuff, because we haven't done any selects or any nonsense. So we've added dot awarded equals has laureates, and has laureates is a Boolean, apparently.

It is a Boolean, and I don't know why I wrapped it in an extra set of brackets. Oh, I know why: I did it a long way and then I made it a short way, so that extra set of brackets is entirely superfluous.

Okay, extra brackets don't bother me. I'm okay with them.

They mildly bother me, but yeah.

Okay, I'm okay with you being bothered. So now we've got a key. It looks at whether there are laureates: if there are, it sets dot awarded to true, and if not, to false.

Correct, because if you read the docs on has, it returns a Boolean. That makes sense, because you'd normally use it in a condition of some sort, so it spits out a Boolean. So that's one out of four. The second thing I said was that I really don't like it when important keys may or may not be present. In my mind, there should always be an array to hold the laureates, even if you don't have any laureates.
It's an empty array, not no array, because no array just makes you put in question marks and all these kinds of silly things everywhere. Just make it an empty array. It's two bytes of data, an open square bracket and a closed square bracket; it's not going to break the bank.

All right, but Bart, everything exists with the value of null!

I'm going to keep bringing this up. Okay, so the way you do that is you say dot laureates, and then you use the... is that the ternary operator?

No, no.

Is it the ternary, or the or one?

No, that is one we learned about last time. That is the conditional assignment, which means if laureates does exist, do nothing, and if laureates doesn't exist, give it the value of whatever's on the right. So that sets laureates to the empty array where they don't exist, and where they do exist they get left completely alone. If you take those two slashes out, you delete all the laureates.

Okay. You know, I went back and reread and reread and reread what slash slash equals is supposed to mean, and I don't get that from the description in the show notes from last time. What you said makes perfect sense, and I'm sure you said it last time, but it doesn't jump out at me, so that's why I didn't catch it. All right, so now we've said if there are no laureates, put an empty array there, and if there are, just leave them alone.

Correct. So that's two out of four. The third and fourth things are inside our laureates, because they need a bit of cleaning up too. We've cleaned up the top-level dictionaries for the prizes, but the laureates are also a bit of a mess. We can use the same pattern for the laureates that we used for the prizes: we say dot laureates, pipe equals, open a square bracket, explode the laureates, work on them, and put them back together. So we're now going to say that inside our prizes.
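Here is a small sketch of the difference the two slashes make, again on made-up placeholder data where the second prize has no laureates key at all. With `//=` an existing laureates array is kept and a missing one becomes `[]`; with plain `=` every prize's laureates are overwritten. (Strictly, `//=` also replaces a value that is `false`, not only one that is missing or null.)

```shell
# Made-up sample: the second prize has no laureates key
PRIZES='{"prizes":[{"year":"2001","laureates":[{"firstname":"Ada"}]},{"year":"2002"}]}'

# //= keeps an existing laureates array and fills in [] where it is missing
echo "$PRIZES" | jq -c '.prizes |= [ .[] | .laureates //= [] ] | [.prizes[].laureates | length]'
# prints [1,0]

# Plain = overwrites every prize's laureates, deleting the real ones
echo "$PRIZES" | jq -c '.prizes |= [ .[] | .laureates = [] ] | [.prizes[].laureates | length]'
# prints [0,0]
```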
We're now going to explode out the laureates with this update assignment, so that we can put them back together and stick a new laureates array back where we found it. Again, we're exploding the laureates, so now we can work on each laureate one by one. The first thing I asked you to do there is add another Boolean key, named organization, to tell us whether or not this prize was awarded to an organization. Because again, the only way we can tell at the moment is by side effect: do they have a surname? If not, they're an organization, which is very messy. It only makes sense to a human reading the code. So we're simply going to say dot organization equals has surname, piped to not.

Yeah, what? That was my next question. Piped to not what?

If it has a surname, it's not an organization. That's the rule: if it has a surname, it's a human. So to make it be the opposite, we not it.

So what does it mean, dot organization equals has surname pipe to not?

Right. So if the laureate has a surname, has surname is true, but that's the opposite of it being an organization, because an organization has no surname. So we need to invert the true to a false to get the right answer.

So has surname pipe to not means doesn't have a surname?

No, it means if you have a surname, the answer is false. So dot organization gets assigned the value false if you have a surname, because has surname will return true if there is a surname.

So are you setting organization to the surname?

No, I'm saying... okay. So has surname returns a Boolean.
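A minimal sketch of the organization flag on two made-up laureates, one with a surname and one without. One detail worth knowing: in jq the pipe binds more loosely than `=`, so the `has("surname") | not` part needs its own parentheses. Without them, the `not` is applied to the whole updated laureate instead of to the Boolean.

```shell
# Made-up laureates: a person (has a surname) and an organization (does not)
LAUREATES='[{"firstname":"Ada","surname":"Lovelace"},{"firstname":"Example Org"}]'

# has("surname") gives true/false; not flips it, so no surname => organization
echo "$LAUREATES" | jq -c '[ .[] | .organization = (has("surname") | not) ] | map(.organization)'
# prints [false,true]

# Without parentheses this parses as (.organization = has("surname")) | not,
# so each laureate object is replaced by false
echo "$LAUREATES" | jq -c '[ .[] | .organization = has("surname") | not ]'
# prints [false,false]
```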
So you now have a Boolean piped to not, and the outcome is going to be another Boolean.

And I said has surname piped to not means doesn't have a surname, and you said no, so I'm still lost.

No, it means the opposite. The way you said it sounded like you were saying something different, but yes: if you don't have a surname, then the value will be true.

Right. So if it doesn't have a surname, has surname piped to not... if it doesn't have a surname, then organization equals... false. Shoot. Okay, I'll go along with you. I'll say it's true. I buy it: organization equals true. What does that do for us? Did we just create a key called organization?

Yes. The task was to make a new key named organization, holding a Boolean that indicates whether the laureate is an organization or not. So we want organization to be a new key with the value true or false depending on whether the winner was or wasn't an organization, and the way we can tell is whether they do or don't have a surname. If they don't have a surname, they are an organization, which is why we need that flip.

Yeah, yeah. So, okay.

And the last thing, the fourth challenge, was to make a new key named display name, so that we would always be able to get the name to print easily. If they're a human, they have a first name and a surname, so we want first name, space, surname. If they're an organization, they only have a first name, so we only want to print their first name. Instead of having to put all of that logic in every time we want to print a laureate, let's just make a new key named display name that has that value ready for us to use. The way I chose to make the value was to make a new array containing the first name and then either the surname or nothingness, depending on whether the surname exists, and then to join those with a space.

Okay.

It's just that I find it the easiest way to make the space be there or not. And then we now have the
last one.

Yeah, then we now have a nice name. So when you run that, what you get is our very, very familiar Nobel prizes, but each one now has a new awarded field, each will always have a laureates array even if it's empty, every laureate will have a new field named organization, and every laureate will have a display name. If you were to save that to a file, you could call it clean Nobel prizes and use that file as your input for everything else.

Gotcha, gotcha. Okay. Should we just save it that way and start using that one?

I'm actually quite tempted to, although what I ended up doing in the examples later is playing with a whole new data set, because I needed different kinds of data. But I'm not giving up on the Nobel prizes. They have been a lot of fun, we're not quite done with them, and they will show up in future.

I think the messiness made it more useful. If it had been a clean data set, it wouldn't have been as helpful.

Right, because you got to learn on realistic data, and the world is full of bad JSON. This I know. And ironically, my new data set also contains bad JSON: string numbers. It's a database with stock prices, and the prices are strings.

Stock prices, of course they are. Getting real good at pipe tonumber, aren't you?

Very good at pipe tonumber. Yeah, everywhere, all over the place. Right, so, some new stuff for our brains. Let us start with arrays. We've done numbers. We've done strings.
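Putting the four changes together, a solution along the lines described above might look like the sketch below. It is not necessarily identical to the code in the show notes: the sample data is made up, and `displayName` is an assumed spelling for the display-name key. The order matters: `awarded` has to be computed before `//=` fills in the empty laureates arrays, or every prize would look awarded.

```shell
# Made-up sample shaped like the Nobel prizes data (not real records)
NOBEL='{"prizes":[
  {"year":"2001","laureates":[{"firstname":"Ada","surname":"Lovelace"},{"firstname":"Example Org"}]},
  {"year":"2002","overallMotivation":"Not awarded"}
]}'

# The four fixes; displayName is an assumed key name
CLEAN='
.prizes |= [ .[]
  | .awarded = has("laureates")                 # 1: explicit awarded flag
  | .laureates //= []                           # 2: always a laureates array
  | .laureates |= [ .[]
      | .organization = (has("surname") | not)  # 3: no surname => organization
      | .displayName = ([.firstname, .surname // empty] | join(" "))  # 4
    ]
]'

echo "$NOBEL" | jq -c "$CLEAN"
```

Redirecting that output to a file would give the cleaned-up data set mentioned here.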
So arrays seem like the next thing to do, and the simplest kind of transformation you're likely to want to do to an array is to reorder it.

The first thing that jumps to my mind is I wish the Nobel prizes were in the opposite order. I'd like them oldest to newest instead of newest to oldest.

And the function to do that is simply reverse. The reverse function takes an array as its input and gives you an array as its output, and what it does is turn it back to front. So if you send it the array 1, 2, 3, you get back the array 3, 2, 1. It does exactly what it says on the tin, as the British would say.

I like that one. I follow it. And you used dash nc in your example, jq dash nc. The n is no input?

Yes, because we're making the array ourselves and piping it to the reverse function. And the c is for compact output. Instead of printing the output on five lines, an opening square bracket, 3 comma, 2 comma, 1, and a closing square bracket, it prints it on one line, because it seemed very wasteful of show-notes screen real estate to have just those three numbers splurged across five lines.

The other thing you're likely to want to do, other than reversing, is sorting, and the function to sort is called sort. It takes an array as input and gives you an array as output, and to be honest, it pretty much does what you expect it to do. If you give it some numbers, it'll sort them numerically. If you give it some strings, it'll sort them lexically, or alphabetically. And if you give it arrays or dictionaries, it will follow an algorithm that is in the official jq documentation and that is so complex I have decided to label it out of scope for this series. The other interesting thing is this: if you give it all numbers, it's obvious what it's going to do, and if you give it all strings, it's obvious what it's going to do.
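The `reverse` example as a runnable command, plus a sketch of how the same function could flip the prizes array in place with an update assignment (the two-prize document here is made up).

```shell
# -n: don't read any input (the array is written in the filter); -c: one-line output
jq -nc '[1, 2, 3] | reverse'
# prints [3,2,1]

# Flipping a made-up prizes array in place with an update assignment
echo '{"prizes":[{"year":"2002"},{"year":"2001"}]}' | jq -c '.prizes |= reverse | [.prizes[].year]'
# prints ["2001","2002"]
```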
What happens when you give it a mixture? Well, the first thing it will do is sort them by type, and then it will sort within each type, and it has a rule for the order of the types: nulls come first, followed by Booleans (false before true), followed by numbers, followed by strings, followed by arrays, followed by dictionaries. So if we give it the array 3, 1, 4, we get back 1, 3, 4, which is just a numeric sort. Great. If we give it popcorn, waffles, pancakes, we get back pancakes, popcorn, waffles, which is an alphabetic sort. And if we give it 42, true, 11, waffles, false, and pancakes, we get false, true, 11, 42, pancakes, waffles: the Booleans, then the numbers, then the strings.

I guess they had to pick an order, right?

Exactly, and they wrote it down in the documentation, and that's the important thing, right? It does what it says in the table. It doesn't need to make sense in this case; it just needs to be written down.

Okay, we're good.

And most of the time you sort things of the same type, so most of the time the behavior you get is what you expect. And if you're wondering how to do a reverse sort, the answer is you pipe it to reverse when you're done. There's no argument that says go backwards.
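The sort examples discussed here, written out as runnable commands. The mixed-type result follows jq's documented type order (null, false, true, numbers, strings, arrays, objects), and the last line is the reverse-sort idiom: sort, then pipe to `reverse`.

```shell
jq -nc '[3, 1, 4] | sort'                                      # [1,3,4]
jq -nc '["popcorn", "waffles", "pancakes"] | sort'             # ["pancakes","popcorn","waffles"]

# Mixed types are grouped by type first: null, false, true, numbers, strings, arrays, objects
jq -nc '[42, true, 11, "waffles", false, "pancakes"] | sort'   # [false,true,11,42,"pancakes","waffles"]

# There is no "descending" option; sort, then reverse
jq -nc '[3, 1, 4] | sort | reverse'                            # [4,3,1]
```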
You just pipe it to reverse.

Sort of the terminal version of these things, right? Everything does one thing and does it well.

And so if you want to reverse it, well, we have a function for that. Let's not clutter up the sort function with an option for going forward or backward; just use reverse. Now, there are rules for sorting dictionaries, but actually, if you have an array of dictionaries, they're probably some sort of record. It's probably an array describing many items on a menu, or an array describing many Nobel prizes: an array of related pieces of data. So you're probably going to want to sort them by some specific value; you might want to sort the Nobel prizes by year. There is a whole separate function to allow you to do that. It's called sort underscore by, and it takes one argument, which is a filter. That filter can be as complicated as you like, or it can be the name of a key. So we could sort our menu in menu dot json by price by saying sort underscore by, open paren, dot price, close paren, and it will sort it by the price. If you do that, you will see that pancakes are the cheapest item on my menu, followed by hot dogs, followed by waffles. Most of the time when you do a sort by, you just give it a key, but it could be any filter, so you can sort by a calculated amount. Which is where I turned to my new data set. I have discovered that there is a free JSON API where you can get stock information. You have to jump through a very small hoop of registering for a free API key, so it's not zero effort.
You just register for an API key. They need your email address and they will spam you a little, but that is it. That is the sum total of the cost, and you can filter the spam into your bin.

My spam filter just takes anything I really want and puts it in spam, and everything I don't want it puts right into my inbox, but that's a whole other story.

Sure. Yeah, that's awesome. I'm loving it. So this is from a place called Alpha Vantage, dot co, and they have nice documentation describing all of the different things their APIs can do. One of their APIs is called Company Overview, and it gives you this big giant glob of JSON for every company on the American stock exchanges, the UK stock exchange, and a whole bunch of other stock exchanges, with more statistics than I understand. It gives me everything I expect: the name, the ticker symbol, a description, the latest earnings report, and something called the PE ratio, which is the very, very edge of my understanding, thanks to the wonderful NosillaCastaway Linda.

That's the price-to-earnings ratio.

Yes, which is a way of checking if a company is overvalued, because if they make lots of money but their stock price is low, they'll have a high PE, which means they're good value for money. And that is the sum total of my knowledge of stocks and shares; we've bottomed out there. But anyway, this data set is great fun. It gives you back the information for one company at a time, though, and I wanted more. So there's a fun little aside in the show notes: I wrote myself a shell script to fetch the stock information for a bunch of tech companies and then use jq to assemble that into a new JSON file consisting of one array containing the dictionaries for each of my companies. So you will find a file named, uh, what did I call it, tech stocks dot json, or something sensible like that.

Yeah. Well, how did you tell it which stocks were the ones you wanted?
In my script there's a Bash array which has the values AAPL, MSFT, and IBM, which is how it knew to go...

Oh, it's just those.

Yes, they're the ones I want, because I didn't want too big of a data set.

Got it. Okay.

So the little script basically uses the curl command to fetch those three dictionaries, then uses jq to build an array out of them and saves it to the file you see, which is tech stocks dot json. So that's the file we'll be using. But if you want to know how Bart got that file, it's in the shell script, and since we did shell scripting in this series, I thought, why hide my work? Why not share my work?

I mean, joy.

Anyway, all of that gets us back to this sort by thing. What if we want to do some sort of weirdo stock calculation? I was going to calculate the PE ratio, and then I discovered it's already in the data set. So, and this may be terrible advice, I am not giving you stock trading advice here, but imagine that in my infinite wisdom I have discovered a new way to pick which stock to buy. Imagine I think the best stocks are the ones with the highest product of PE ratio and dividend per share. That seems like it might be sensible, because a high PE ratio means it's a stock that's undervalued, and a high dividend per share means they pay out a lot for every share you have. So a stock that's both undervalued and pays well sounds to this idiot like it might be a good investment.

Right. Again, we're not advising.

We're not advising. I just needed something to calculate that was not already in the data set. I don't even know what it's called. We'll call it the Bart ratio, if you like. Or should we call it the Linda ratio, because Linda told me everything I know about stocks? She may hate me having it called after her, because this may be a silly idea.
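A sketch of both kinds of `sort_by`: by a plain key, and by a value computed on the fly. The menu items, prices, tickers, and figures below are all made up, and the `PERatio` and `DividendPerShare` key names are assumptions modelled on Alpha Vantage's Company Overview output, where numbers arrive as strings (hence the `tonumber`).

```shell
# Made-up menu; names and prices are placeholders
MENU='[{"name":"Waffles","price":7},{"name":"Pancakes","price":5},{"name":"Hotdogs","price":6}]'
echo "$MENU" | jq -c 'sort_by(.price) | map(.name)'
# prints ["Pancakes","Hotdogs","Waffles"]

# Made-up stock records with string-encoded numbers (key names assumed)
STOCKS='[
  {"Symbol":"AAA","PERatio":"30","DividendPerShare":"5"},
  {"Symbol":"BBB","PERatio":"20","DividendPerShare":"1"},
  {"Symbol":"CCC","PERatio":"25","DividendPerShare":"2"}
]'

# Sort by PE ratio times dividend per share; the product exists only while sorting
echo "$STOCKS" | jq -c 'sort_by((.PERatio | tonumber) * (.DividendPerShare | tonumber)) | map(.Symbol)'
# prints ["BBB","CCC","AAA"]   (products: 20, 50, 150)
```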
But anyway, it doesn't matter. The point is, our data set contains the PE ratio and it contains the dividend per share, but it does not contain this new product of them I've just invented. So how do we sort our stocks based on that? Well, the answer is we use a filter that multiplies those two values together in sort underscore by. Now, before we do that, we need to actually figure out what the answer should be. So I have a jq command in the show notes which will print out the name of the company, the stock ticker, the PE ratio, the dividend per share, and their product. For Apple, whose stock ticker is AAPL, the product of the two was 26.54. For IBM it's 153.1-ish, and for Microsoft it's 107.5-ish. That tells us that the smallest Bart index, as we shall call it, is Apple's, followed by Microsoft's, and the biggest is IBM's. So if our sort by is correct, it should give us back AAPL, MSFT, IBM.

Because it's smallest to largest.

Right, smallest first. I could have piped it to reverse to get it the other way around, but I didn't want to clutter my example, and it just about fit on one screen as it was. So: sort by, open a bracket to give it a filter, and then the filter we give it is: open another bracket, dot PE ratio, pipe tonumber, close bracket, star for multiply, open a bracket, dot dividend per share, pipe tonumber, close the bracket. So we now have two numbers being multiplied, and then we close the argument to sort by. So sort by is going to be by this ratio
I've invented. And so if we run all of that through an actual jq command, and then pipe that to another filter that just replaces the giant big dictionaries with only their stock symbols, we get back AAPL, MSFT, IBM. So it's sorting them based on this thing that doesn't exist. It's just a number I mathematically made in the filter. It existed while sorting and then vanished into the ether once the sorting was done.

So you can sort on anything, right? You could sort on some sort of weird average.

You can sort on anything you can express as a filter.

That's pretty cool.

Very cool, very powerful, and quite useful in the real world. Okay, so now let us move on to adding and removing things from our arrays. We can reverse them and we can sort them in very powerful ways, and the other thing we can do is add things and take them away. One of the things we learned in the previous installment was that jq is one of those languages where operators can be overloaded, which is the jargon term; in other words, an operator can do different things for different inputs. When we use a plus with numbers we get addition, and when we use the same plus with strings we get concatenation. You can also use plus and minus with arrays. You can have one array plus another array, and what it will do is concatenate the arrays. So the array 1, 2 plus the array 3, 4 gives us the array 1, 2, 3, 4.

Nope, I don't like it. I don't like it at all, Bart. That should be 4, 6.

No, it should... Actually, if you wanted to get 4, 6, you would use the add function, which allows you to add arrays together, but anyway, let's not go there.

Um, I forgot that plus doesn't always mean add.

When you add arrays, you concatenate them; when you add strings, you concatenate them. What would you guess minusing two arrays might do?
Well, I didn't read ahead, so not much of a guess.

The array to the left of the minus is considered the input, and the array to the right of the minus is things you don't like. Anything in the left array that exists in the right array gets removed from the output.

So if you put something in the right array that doesn't exist in the input array, does it just ignore it?

It just ignores it. The example I have is 1, 2, 3, 4 minus 4, 5. 4 is in the input, 5 isn't, and the answer I get is 1, 2, 3. So the 4 got stripped out and the 5 got ignored: there already is no 5 there, so there's nothing to do, and no, it doesn't shout at you. So you can basically say there are these three or four values I need to veto, and whether or not they're present, I can stick them in that array, and they will be pulled out if they are there and nothing happens if they're not.

So maybe you have an array of words, and there are some words you think are naughty. You might pick 10 random words and say, take out the naughty ones. You could always give the same array of naughty words, and they'd only get sucked out if they were there. I don't know why that example is in my head.

May or may not be related to a feature request for XKPasswd?

Exactly where my brain just went. Yeah, somebody was using XKPasswd in a school setting and wanted to exclude certain words that were, I don't know, trigger words or something. Some of the words they suggested kind of made my head tilt; the word woman was one of the words they didn't want in it. But anyway, right away I thought about that: you could just write out what you don't want and subtract it from the array.

Yeah, exactly right.
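The array arithmetic discussed here, as runnable one-liners. On the 4, 6 question: applied to an array of arrays, `add` concatenates them just as `+` does, so an element-wise sum needs the positions paired up first; `transpose | map(add)` is one way to do that (an approach not covered in the episode). The word list at the end is a hypothetical stand-in for the XKPasswd request.

```shell
jq -nc '[1, 2] + [3, 4]'                            # [1,2,3,4]  (+ concatenates)
jq -nc '[[1, 2], [3, 4]] | add'                     # [1,2,3,4]  (add concatenates too)
jq -nc '[[1, 2], [3, 4]] | transpose | map(add)'    # [4,6]      (element-wise sum)

# - drops every left-hand element found on the right; extras like 5 are ignored
jq -nc '[1, 2, 3, 4] - [4, 5]'                      # [1,2,3]

# Hypothetical veto list applied to a word list
jq -nc '["correct", "horse", "battery", "staple"] - ["horse", "zebra"]'
# prints ["correct","battery","staple"]
```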
Yeah, so those are two useful things you can do with the plus and minus operators on arrays. Next up for manipulating arrays is deduplication. When you're processing a lot of arrays, particularly if you're combining lots of data sets, you will very often end up with duplicates, and you're very often not interested in the duplicates. An example, and you'll know why it might be in my head, is finding out every data breach an organization has been involved in. Well, if you get every breach your users were involved in and stick them all into one big array, you will have your answer, but it will be massively duplicated, because 10 people are in this breach and 20 people are in that breach. But you just want a list of all the breaches. So I basically took all of the arrays of every breach everyone was in, shoved them into one giant big array, and then went pipe unique, and I got just my unique values, because that's what the function unique does: it deduplicates. That made complete sense to me, but what I don't understand is why it also sorted. It's just what it does. It's because the easiest way, at a computer science level, to deduplicate is to sort and then remove the duplicates, because otherwise you have to remember everything that's gone past. Yeah, that's very RAM intensive. So a RAM-efficient deduplication sorts first. That's what it does, and that is why it has the side effect of sorting as well as deduplicating. So unique gives you a sorted, deduplicated array, and a lot of the time that's what you want anyway. That's me. Yeah, because I found myself writing pipe unique pipe sort, backspace backspace backspace. Oh, wait a second, I don't have to do the sort, it's already done for me. I do have a question: have you set up a TextExpander snippet yet for pipe to number?
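A Python sketch of the sort-then-deduplicate idea behind unique. The breach names are hypothetical placeholders, and there is one simplification: Python's sorted() only orders values of mutually comparable types, whereas jq defines a sort order across all JSON types.

```python
# Python model (not jq itself) of jq's unique: sort first, then drop repeats.
# jq: jq -n '["Breach-C", "Breach-A", "Breach-C", "Breach-B", "Breach-A"] | unique'
def jq_unique(values):
    """Sort the values, then keep each distinct value once."""
    result = []
    for value in sorted(values):
        # After sorting, duplicates sit next to each other, so we only need
        # to compare against the last value we kept, not remember everything.
        if not result or result[-1] != value:
            result.append(value)
    return result


breaches = ["Breach-C", "Breach-A", "Breach-C", "Breach-B", "Breach-A"]
print(jq_unique(breaches))  # ['Breach-A', 'Breach-B', 'Breach-C']
```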
Probably should; you might as well. Yeah, with brackets around it, and then jump the cursor back into the right spot. Yeah, yeah, actually you could. Okay, so that's unique, and unique has a friend. Unique is great for simple values, like your booleans, strings, and numbers. Unique will work with arrays and stuff, but you may want to be a little pickier with a dictionary. Say you have an array of dictionaries, and you only want one copy for each value of a particular key. unique_by lets you do the same thing as sort_by: you provide a filter that does some calculation, and whatever the result of that calculation is, that's what gets used to deduplicate. Hmm. Okay. 99.9 percent of the time you're just going to use the name of a key, but you could deduplicate based on some weird math. We're going to use our menu for this one because it's a nice simple dictionary. So if we wanted, for arbitrary reasons, to only get the items in our menu where the length of the name was different, we could use unique_by. So first I'm going to print out the name piped to length, which tells us that hotdogs and waffles have seven letters, but pancakes has eight letters. That means if I do a unique on the length of the name, I should lose either hotdogs or waffles and be left with pancakes and one of those two. And so if we do unique_by, open round bracket, .name pipe length, close round bracket, we get back hotdogs and pancakes. So we have lost waffles, because waffles and hotdogs are the same length. I had one quibble with part of your explanation on that. You said, basically, that you don't know which one's going to disappear, but I think you do. I think it's the first one it finds that gets to stay. So since it's going alphabetically, it would find the first one. Well, the documentation says
don't count on it. It could be either. I went back and looked at the documentation, Bart, and it didn't say that. It had a section where they gave an example; it was a numerical example, and it kept the first one numerically. I could have sworn there was a line there that says do not count on the order, or that the order is not guaranteed. How can it be random, though? There's got to be some algorithm that it's following, right? Yes, there's got to be. Um, I'll put it this way: I wouldn't count on it being one or the other. So it says the unique_by(path_exp) function will keep only one element for each value obtained by applying the argument. Think of it as making an array by taking one element out of every group produced by group. And then it gives you three examples, and the last one was unique_by(length), and it did it exactly the same way as the one we did here, where it kept the first one it got to alphabetically. Sorry, the first one as it went through the list. Right, so the first one it got to got to stay. So that says it uses the group_by function to do the work, and if you look at the documentation for group_by, it says we do not promise which one gets kept. I followed this one down the rabbit hole quite far. I was on the tarmac in Brussels Airport in the snow and had time to kill, so I actually went and followed it to the docs for group_by. So group_by is a whole different function, somewhere else up or down in the docs? It's a whole separate function. Okay, and it's got to be knowable. It has to be. But they said they wouldn't make a guarantee.
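Here is a Python model of unique_by(.name | length) on a menu like the one in the episode. The prices are hypothetical, and the items are assumed to be in alphabetical order. This sketch deliberately keeps the first item it meets in each group; given the discussion of the docs above, treat that as the sketch's choice, not a promise about jq.

```python
# Python model (not jq itself) of jq's unique_by(f): group items by the key
# that f computes, keep one item per key, and return them ordered by that key.
# jq: unique_by(.name | length)
def jq_unique_by(items, key):
    kept = {}
    for item in sorted(items, key=key):  # Python's sort is stable
        kept.setdefault(key(item), item)  # this sketch keeps the first one it meets
    return list(kept.values())


menu = [  # hypothetical stand-in for the episode's menu data
    {"name": "hotdogs", "price": 5},
    {"name": "pancakes", "price": 4},
    {"name": "waffles", "price": 3},
]
survivors = jq_unique_by(menu, key=lambda item: len(item["name"]))
print([item["name"] for item in survivors])  # ['hotdogs', 'pancakes']
```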
So I was like, well, if the docs don't make a promise, I'm not going to make a promise, so I'm just going to be agnostic. Okay, I'm just uncomfortable with the concept that a computing platform would say something was indeterminate in that case. That just seems unlikely. That's alright; I'm sure it's the same output for the same input every time, but whether it follows an obvious rule, it may be a little unobvious. There may be some subtleties in there, particularly if you're dealing with complex data structures. Anyway, the point is we can deduplicate simple values with unique, or even complicated values using unique_by. 99.9 percent of the time you'll use unique_by to just give the name of a key, because then you don't care which one survives? You don't care, because they're all the same. Yeah, exactly, exactly. So the last thing on arrays is: if you end up with lots of data sets, say a list of breaches, you're actually probably going to end up with an array of arrays, because for every person in an organization who's been breached you get an array of breach names. So you end up joining all of these, array, array, array, array, array, and then you want to flatten them all into one master array, and there's actually a function for that.
It's called flatten. If you give flatten an array of arrays, it will just give you back all of the values as if all the square brackets vanished into the ether. And even if it's arrays all the way down, it will keep doing that recursively, unless you tell it not to go too far. So if you want to preserve some of your arrays at some depth, you can give it an argument, which is the maximum depth it will go to before it stops flattening. So say we give it an array containing 1, followed by the array 2, 3, followed by the array 4 which contains the array 5, 6, and we flatten it. So we now have an array that contains a number, an array, and another array that contains another array. Doesn't matter; it's arrays all the way down here. It's very hard to say, but you end up with the output 1, 2, 3, 4, 5, 6. It just flattens all of the arrays and you get 1, 2, 3, 4, 5, 6. Now, if we want to see the difference of flattening to different depths: if we say flatten(2) on that horrible data structure, it will go down two levels within our array. That means we go into the first array, 2, 3, and we go into the 5, 6, which is inside the 4 array. So that's two: one square bracket to the 4 and another square bracket to the 5, and we go into both of those. So we still get back 1, 2, 3, 4, 5, 6. But if we give it 1, it doesn't flatten that inner inner array, so we get 1, 2, 3, 4 and then the array 5, 6, because it stopped flattening. I looked at this one over and over again, and I'm sure this sounds like gibberish if you're just listening and not reading along. We're aware of that. But the way I like to think of it, since we always count from zero: if it just comes across a value, that's zero; if it finds an array, that's one deep; if it finds an array inside of an array, that's two deep. So that's how you know: if you gave it 2, it would get both of those; with 1, it would only get the first one. Perfect.
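For those reading, the depth counting can be modelled in a few lines of Python. It is a sketch of flatten's behaviour on the example data from the episode, not jq's actual implementation.

```python
# Python model (not jq itself) of jq's flatten and flatten(depth).
# jq: jq -n '[1, [2, 3], [4, [5, 6]]] | flatten'   or   '... | flatten(1)'
def jq_flatten(value, depth=float("inf")):
    """Pull nested list items up into the outer list, at most `depth` levels down."""
    result = []
    for item in value:
        if isinstance(item, list) and depth > 0:
            # Going into this array uses up one level of depth.
            result.extend(jq_flatten(item, depth - 1))
        else:
            result.append(item)
    return result


data = [1, [2, 3], [4, [5, 6]]]
print(jq_flatten(data))     # [1, 2, 3, 4, 5, 6]
print(jq_flatten(data, 2))  # [1, 2, 3, 4, 5, 6]
print(jq_flatten(data, 1))  # [1, 2, 3, 4, [5, 6]]
```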
That's so much better said than I could have, and it's exactly right. That's probably what it's doing on the inside as well; that's almost certainly how it's counting. You've probably described its inner workings perfectly. We got lucky. So that is our array manipulation. We can reverse them, we can sort them, we can add things, we can remove things, we can unique them, or deduplicate as one would say in English, and we can flatten them, which is quite good. So the last thing, then, is dictionaries. We can add and remove keys from dictionaries, and the plus operator is also overloaded here, which means if you want to add a new key into a dictionary, you can do that by plussing together two dictionaries. On the left you have one dictionary, on the right you give it another dictionary with different keys, and when you use the plus operator they get smushed together. Um, if your dictionary contains dictionaries, and you don't want one to replace the other, you actually want it to go recursively and merge them all the way down to infinity, you use the multiplication operator. So add will do it one level deep, and whoever's on the right wins. Multiply merges all the way down. So... Hmm, I read this one and I made a note: can you elaborate on what you mean by recursive merge? Say, give me an example: if I've got a dictionary that's 1, 2, 3 and a dictionary that's 4, 5, 6... Okay, well, if it was a dictionary it wouldn't be 1, 2, 3. A dictionary could be monday 1, tuesday 2. Okay, right, right, okay. And then we could add to it wednesday 3. That's an easy one, so let's start with the easy case. So monday 1, tuesday 2, and wednesday 3 gets added to it. We get a new dictionary with monday, tuesday, and wednesday, all three of them.
Okay. If the one on the right-hand side also had a key tuesday, which had the value 4, then whatever I say in the show notes wins, because we have two values competing for tuesday, and the show notes say whether it's the one on the left or the one on the right that wins. It's the one on the right: if both define a value, the value on the right wins. Okay, so that would mean that the one on the right wins, so in that case it's going to be the second tuesday, which was 4, that wins. But that's our simple example, and in that case plus and multiplication don't do anything different. But I didn't hear multiplication in that whole explanation; you just did the addition one. Right, because in this case the two are the same, and the reason they're the same is because our values were 1, 2, 3. Our values are not dictionaries; our values are just numbers. What if our values are dictionaries? Right, so imagine our monday dictionary contains, let's say, profit and number of items sold. So for monday we have a profit of 10 and a number of items sold of 2. Now for tuesday we have a profit of 11 and a number of items sold of 3. Now we add to it another tuesday, which has a number of items sold of... Don't do numbers. Just, it has a number of items sold. So if the second thing is a dictionary... Right, so if monday contains another dictionary: if you use plus, then the dictionary on the right wins, everything in the first dictionary gets thrown away, and only the dictionary on the right gets kept. Okay. If you use multiply, it does a merge, not a throw-away-and-replace. So would a merge add the values for the same key? No. If the keys clash you still throw away the one on the left, but if... okay, imagine... I think maybe you need to put an example in for this one, because I'm not getting close on it. I don't think you have an example. I don't, because it made my head hurt a lot. Um, oh, okay.
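A Python model of plus on two dictionaries, using the weekday example. It is a sketch that mimics jq, with the jq filter in the comment.

```python
# Python model (not jq itself) of jq's + between two objects (dictionaries).
# jq: jq -n '{"monday": 1, "tuesday": 2} + {"wednesday": 3}'
def jq_object_plus(left, right):
    """Shallow merge: copy left, then let every key from right overwrite it."""
    merged = dict(left)
    merged.update(right)
    return merged


week = {"monday": 1, "tuesday": 2}
print(jq_object_plus(week, {"wednesday": 3}))
# {'monday': 1, 'tuesday': 2, 'wednesday': 3}
print(jq_object_plus(week, {"tuesday": 4}))
# {'monday': 1, 'tuesday': 4}
```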
Well, that's okay. We can stop with it making our heads hurt, and if we ever need to use it we'll go look it up. But it better not be on the final exam. It is not; I promise you it is not in the homework. Um, no, I can make it work. So you have two really simple dictionaries you're multiplying together. The first dictionary has one key named bob that contains two keys, a and b; it doesn't matter what the values are. And you have a second dictionary that also contains the key bob, but it contains the keys a, b, c. If you add them, then... No. Ah, now I've got it, I have the perfect one. We're sticking with bob because I like bob. Right: our dictionary has a key bob that contains a dictionary with the keys a and b, and our second dictionary has a bob that contains the key c. If I add them, then bob becomes just c, because the one on the right wins, so it doesn't become a, b, c. If I multiply them, it becomes a, b, c. It merges at a deeper level. Huh. But if they both had a c in them, then the one on the right would win, just like with addition? Right, exactly. Sorry, just like with plus. Precisely, yes. So basically, if there are extra keys and you multiply, they get merged together. If you add them, the one on the right wins for the entire dictionary. It doesn't do a deep merge; whatever it is, it replaces it. Yeah, okay. Like I say, I have yet to use the multiply operator between two dictionaries, but I have used the add operator between two dictionaries. On behalf of the class, we hope you never make us use that one, or that you teach it to us again with an example when you do. I didn't spend too much time on it because I think it's a bit of an edge case, but I do want you to know it's possible, because that's kind of the important thing, right?
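The bob example can be modelled in Python to show the difference between plus and multiply. The values 1, 2 and 3 are arbitrary placeholders, since the values don't matter; this mimics jq's documented behaviour rather than being jq itself.

```python
# Python model (not jq itself) of jq's * between two objects: a recursive merge.
# jq: jq -n '{"bob": {"a": 1, "b": 2}} * {"bob": {"c": 3}}'
def jq_object_multiply(left, right):
    """Deep merge: dictionaries found under the same key are merged too.

    Any other clash is settled the same way as +: the value on the right wins.
    """
    merged = dict(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = jq_object_multiply(merged[key], value)
        else:
            merged[key] = value
    return merged


left = {"bob": {"a": 1, "b": 2}}
right = {"bob": {"c": 3}}
print({**left, **right})                # {'bob': {'c': 3}}  (what + gives)
print(jq_object_multiply(left, right))  # {'bob': {'a': 1, 'b': 2, 'c': 3}}
```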
Know what's possible, so you can go look it up in the docs, because if you don't know it's possible you won't think to go looking for it. Now this is where we get to one mild gripe I have. We were able to remove things from an array with the minus operator, but that doesn't work on dictionaries. There is no minus for dictionaries as an operator. If you want to remove a key, you have to use the function del. So you delete it, not minus it? Yeah. So you pipe to del, and you give del the argument of the key you want to make go away. So if we want to remove the stock from every item in our menu, we would explode our menu, pipe it to del(.stock), and then wrap the whole lot in square brackets to re-implode, I don't know, catch our exploded bits back into an array. Fuse them back together. And then we end up with the hotdogs with a price, pancakes with a price, and waffles with a price, but the stock has vanished because we said del(.stock). It's kind of obvious, probably, so why isn't it a minus? It should be a minus. Yeah, it was right there, ready to be used. Yeah, anyway, that's the way they baked it, so that's what it does. So I have a challenge for you for next time, which does not involve the multiplication operator. I would like you to take our Nobel Prizes dataset and give me just a list of all the laureates, but each of them just once, so Marie Curie does not get to be listed multiple times. She only gets to be listed once. And I would like all of the laureates not in the order they happen to appear in the dataset, but in alphabetical order, please. So no duplicates, and alphabetical, please. And that's by surname, I hope? Ah. So if you can get them alphabetical at all, you get full marks. If you'd like a bonus prize... Now, I've done this because I needed to make sure it was possible.
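The explode, del, collect pattern for removing the stock key can be modelled in Python too. The menu prices and stock levels are hypothetical, and this is a sketch of the idea rather than jq itself.

```python
# Python model (not jq itself) of the explode, del, collect pattern.
# jq: [.[] | del(.stock)]
def drop_key(items, key):
    """Return a copy of each dictionary without `key`.

    Like jq's del, a dictionary that lacks the key passes through unchanged.
    """
    return [{k: v for k, v in item.items() if k != key} for item in items]


menu = [  # hypothetical prices and stock levels
    {"name": "hotdogs", "price": 5, "stock": 12},
    {"name": "pancakes", "price": 4, "stock": 0},
    {"name": "waffles", "price": 3, "stock": 7},
]
print(drop_key(menu, "stock"))
# [{'name': 'hotdogs', 'price': 5}, {'name': 'pancakes', 'price': 4}, {'name': 'waffles', 'price': 3}]
```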
So I have the challenge solution done, and if you're naughty you can peek at the PBS 163+ branch in Git, but I know it's possible because I've done it. For bonus credit, I would like you to sort it such that human beings are sorted on surname, but organizations should continue to be sorted on their one and only name, which is their first name. And when you print out the human beings, they should be printed first name, surname, even though they're sorted on surname. So Marie Curie should sort as a C, but she should be printed Marie Curie, so she and Pierre should be next to each other. It can be done, and I'll give you one hint: you're going to need to use sort_by. So do you remember, in the old days when we managed our music files in iTunes, we used to use the ID3 tags? You would remember them from making your podcast episodes. Podcast episodes have ID3 tags, right? I still use them. There is an ID3 tag named sort order. If you wanted to have the Beatles sort as B, you would put Beatles in the sort order field but leave the name as The Beatles, and then they would sort in your list by B but be printed as The Beatles. Huh. That principle is the key to this challenge bonus. Okay, well, I hope I can figure this one out. This doesn't sound as hard, but... I'm often wrong. The challenge solutions are both short, so if you end up with a long answer, you've gone down a rabbit hole, down the wrong path. Yes. Yes.
They are both short, even if the second one sounds complicated. If you take out the spacing and stuff, it is definitely less than 10 lines. Mine is slightly longer because I have comments and things, but in terms of actual raw work, it is less than 10 lines for both of them. Okay. So, right, a little sneak peek of where we're going with this. At this stage we're doing pretty well, because we can now manipulate each of the major data types. We can do math on numbers, we can manipulate strings, we can manipulate arrays, and we can manipulate dictionaries. But we have a little bit more fun to have with dictionaries, because dictionaries are very powerful tools, and there's a particular way of using dictionaries called a lookup table. That is very common in data sets that will be given to you by APIs, and jq has great trouble processing them as they are. But jq allows you to translate them into a form jq likes; then you can process them, and then, if you like, you can reassemble them back into a lookup table at the end. Or you can use data that isn't shaped like a lookup table to make your own lookup table. So it sounds like, how can you have an entire installment about one type of dictionary? But it was going to be section three for today, and it isn't, because we'd be here forever. It got to be more complicated, or at least more interesting. And more interesting, frankly. Yes, and very powerful. And then we are ready to move on to what I thought would be 163. So you will notice in the homework, and in our examples today, you will see me continuously go open square bracket, dot, open square bracket, close square bracket, pipe some stuff (in other words [.[] | some stuff]), because we are exploding and then catching the pieces, right?
We've done it in so many examples today. That looks inefficient, and we're doing it all the time. The people who wrote jq are very fond of making the things you do often simple, so we must be doing it wrong. If our code is constantly full of this complicated structure, we are doing it wrong. There is a way to edit an entire array or an entire dictionary without exploding it. We can go into it and edit it in place, which means it can go into our pipeline, array to array to array to array, and yet we're doing microsurgery inside each piece. All this time you've been making us do it the hard way! Yes, because otherwise two things would happen: you wouldn't appreciate the power of the easy way, and it would make your head explode. Oh, okay. It will not make your head explode now, because we've done all this work, but if I'd started here your head would have exploded and stayed exploded, and this would have been a very short series and there would have been much crankiness. So that's why we're doing it in this order. And then we have a few more little bits of cleaning up, but at that stage, if you wanted to tune out, you could. From there on you're getting the bonus material, the really cool stuff jq can do, which will take you from a jq user to a jq power user. You know the 80/20 rule: 80 percent of the time you only need 20 percent of the features, and after these next two installments you will have that 20 percent. And then we are going to spend a little bit of time doing another 20 percent of the remaining 80 percent. We're still going to leave out the really mad stuff. I've read every single word of the docs and I know what everything does, and I've made a choice of what's worthy of our time, and there are some really fun things for us to do. But if you find this is all too much, you can tune out after these next two, and then you'll have enough jq to do almost everything you're likely to want to do. But you can do it
better if you stay tuned. Right, well, that sounds good, Bart. I think you've also let people in on a little secret: Bart has started working ahead in the show notes. So if you go to GitHub and look for his Programming By Stealth show notes, you can actually be pulling and seeing things as they get edited. There are no guarantees that's what it'll end up as; they always get rewritten. But if you want to see the sausage being made, there'll be a branch with a plus on the end. That's the one for future stuff. At the moment it's 163+; as soon as I start working on 163 it'll become 164+. But any branch that ends in a plus is the super secret sneak peek. Well, but they're all WIP, aren't they? They are, yeah, so they're WIP-something. So right now there's a WIP-162, which is what we're recording right now, and a WIP-163+. Okay, okay, that's what the plus means. All right. And the plus branch has the sample solution to this challenge I just set, because I always do the challenge before I write the show notes, because I am terrified of setting you a challenge that's impossible. Well, that would be a bad thing, because I wouldn't find out till I went to write the show notes two weeks later. You would have two weeks of hell, and I would never hear the end of it. So the show notes are not complete until the plus is gone, the WIP branch is merged in, I've done a typo check, we've put all the links to the audio files in there, and then Steve Mattan has come back and told us all the typos I missed. Yeah, and Steve is going to hear this in about three months' time, because he's about three months behind us, but he's doing a sterling job of cleaning up after us. Oh, I don't know, though, Bart. He's doing like eight a week. I think he's going to catch up and pass us pretty soon. He might be fixing stuff in the plus branch soon. He might. Okay, great. Right, folks, until next time, now that we've all gone wibbly-wobbly timey-wimey, happy computing.

If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here; I'm just the stooge that listens to him and asks the dumb questions. If you go over to lets-talk.ie you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at podfeet or check out all of the shows we do over at podfeet.com. Thanks for listening, and stay subscribed.