Welcome to everyone who's attending today. We're going to talk about Power Pivot, which is part of Excel, and about Excel Dynamic Arrays. There's no real connection between the two; it's just that no one else has talked about Dynamic Arrays, so I've added them in here. So, an overview of the seminar: a little bit about Power Pivot itself (I've already noticed a missing capital P there), and then it's mainly going to be demonstrations. The two main demonstrations for Power Pivot are the loading of a JSON dataset, based on tweets from the Twitter API, and then, to demonstrate how large a dataset you can use, a dataset with 246 million rows in it, which is available from the UK Data Service as study number SN 7591. At the end I'll do a quick run around Excel Dynamic Arrays, which are relatively new; they're probably about a year old now, I should think. There are just a few nice little things you can do with them. Like I say, they're not really related to Power Pivot, but I wanted to add them in. Today we'll probably go up to about an hour and a quarter, and then we'll add a bit of time for questions at the end. So it's a bit longer than the usual broadcasts, but there's quite a lot in the demonstrations to go through. For those of you who aren't coding people, this is a coding-free webinar. Everything is going to be mouse-driven, apart from at the end when we do the Dynamic Arrays, where there'll be a little bit of typing in formulas, but that's the way you normally use Excel anyway. Some background to Power Pivot: I think it was first introduced in Excel 2010, when it was a little add-in to Excel that you had to go out of your way to install. Now most versions of Office 365 have it already built in, so typically there's nothing for you to do to install it. And how do you know you've got it? You'll see Power Pivot in the menu bar.
And this is a proper menu option. You'll also see the terms Power Query and Power View, which are essentially all integrated into Power Pivot now. We'll certainly use the Power Query editor, but we can think of the whole package as Power Pivot for creating data models and manipulating them. So what can we use it for? Powerful data analysis; that's the selling point of Power Pivot. We can create complex data models: not only can we have a single dataset, we can have multiple datasets in the data model, relate them together, and then make use of those relationships when we're doing pivot tables and the like. We can process large amounts of data (that's the point of the second demonstration), much more than a million rows. I don't know if any of you have actually hit the million-row limit in Excel, but if you try to load a dataset with more than a million rows, it just tells you, once it reaches a million rows, that it couldn't load all of the file for you, which isn't very helpful. People tend to know that a million rows is the limit, but it's a rather arbitrary one, because it isn't related to how big the records are: Excel stops at a million rows regardless of how much volume the data actually occupies on disk, so a million rows of data might only be 20 or 30 megabytes. The other thing we can do is import a variety of file formats, which is useful because demonstration number one is a JSON file. What is the data model itself? Well, as I've mentioned, it's a collection of tables with relationships between them, and you can specify how they're related to each other. It's integrated with Excel, but it's also quite separate from Excel, in that it has its own analytical engine and database, and its own set of features and functions which it uses to do the data analysis.
And of course, a lot of those features are manifested in the ribbon, and it's all point-and-click stuff, but behind that, as we'll see, there's a whole array of functions which can be called. You can use them independently if you want to, but we're not going to do that, because this is coding-free. Power Pivot supports files (that's the Excel file) up to two gigabytes in size, and enables you to work with up to four gigabytes of data in memory. That's possibly slightly out of date: with the 64-bit version of Excel, I don't think there are any limits other than the amount of memory you have, though how efficiently it uses that memory is another matter. On the plus side, the data in your data model is stored in a highly compressed form, and I'll show you that when we get round to the large dataset. So, on to the Power Pivot demonstration. If you're not a coder, it's quite possible you don't know what JSON is, and when you first see it, you're going to want to run a mile. It's designed for application-to-application communication, so although it is human-readable, it's not something we would necessarily embrace; it's not as simple to read as a table in a spreadsheet. From the applications' point of view that makes sense, because it means they can represent the data in ways that are efficient for machine-to-machine communication. The problem we have is that we want to extract data from these JSON representations and make them look like the nice little tables we're used to using in Excel and the like. Lots of APIs use JSON to send and receive data, so if you're interfacing with an API on the web, the chances are any data you get back will be in JSON format. The Twitter API also uses JSON for tweets.
And frankly, the JSON the Twitter API uses is probably about as complex as you're ever likely to see. So if we can deal with this, we can deal with anything. And we're going to do it all, effectively, using point and click. Just a little background about JSON. As I said, it's intended for application-to-application communication, but you can read it. It's extensively used by APIs, as I've already mentioned, and there are many tools which make it even more human-readable. JSON Editor, which I'm going to show you in a minute, would be one, and you can get add-ins for your web browser that format JSON to make it more readable. Okay, so this is a little shot of some simple JSON. The way it works is you've got an opening curly bracket here and a closing curly bracket at the end. Then you get this format: what's called a key, in quotes, then a colon, and then the value. And that goes all the way down for the record. A record, or document in JSON terms, effectively means a row in your table. If you tilt your head 90 degrees, you can imagine that down this side here these would be your column names, and here you've got the column values. So these would be your table headings for the columns, and this data here would be the first row of your table. After this, there would be another open and close bracket pair with another set of data. JSON will actually repeat the column names every time, with each new set of values, whereas in a table we typically write all the column names once at the top and just put the values beneath them, all lined up. So JSON is a bit more verbose, but you can see it's quite readable. Then we get on to complex JSON, and this example here is actually taken from the middle of a tweet returned by the Twitter API.
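To make the key-and-value idea concrete, here's a minimal Python sketch (this isn't part of the webinar, which stays point-and-click, and the field values below are invented), showing how one JSON document maps to one table row:

```python
import json

# A hypothetical JSON document shaped like the simple example on the slide:
# each "key": value pair maps to a column heading and a cell value.
doc = '{"created_at": "Sat May 16 08:00:00 +0000 2020", "id_str": "1261593431048704001", "lang": "en"}'

record = json.loads(doc)          # one JSON document == one table row
columns = list(record.keys())     # the keys become the column headings
row = list(record.values())       # the values become the first row of data

print(columns)  # ['created_at', 'id_str', 'lang']
print(row)
```

A second document would repeat the same keys with new values, which is the verbosity the talk mentions.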
So you can see (I haven't got the opening braces there) that these little curly brackets here represent nested structures. A JSON document will start, as we said, with open and close brackets, but within it there can be more open and close brackets, and everything within those is also treated very much like a document in its own right. Now, these curly brackets don't in themselves present too much of a problem, because since they're all nested in one another, we can create a new column name just by using dot notation. Take this example of id string: instead of just saying id string, we can say entities, dot, user mentions, dot, id string, concatenating the three together, and that produces a nice unique column name. Very long-winded, but it makes sure it's unique. And certainly the id string here is a key which is used several times within a tweet, so you have to have some way of distinguishing them. In principle, though, the approach is relatively straightforward: you just concatenate the different levels of nesting together. And we'll see, when we look at Power Pivot, how we're given the option of whether to do that or not. The more complex JSON is when we have square brackets. An opening square bracket represents an array of values, a list of values. The problem with these is that the values in here, the three and the 14, don't have their own names. You would normally refer to them by using an index into the array: the three here would be index zero and the 14 would be index one. And there could be more; we don't know. They haven't got individual names.
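The dot-notation idea can be sketched in a few lines of Python. This is the same flattening Power Query performs when it expands a record column; the tweet fragment and ID values are invented for illustration:

```python
def flatten(record, prefix=""):
    """Flatten nested JSON records into dot-separated column names,
    the same idea Power Query uses when expanding a record column."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):          # nested record: recurse with a prefix
            flat.update(flatten(value, name + "."))
        else:                                # plain value (lists need other handling)
            flat[name] = value
    return flat

# id_str appears twice, but the dot notation keeps the column names unique.
tweet = {
    "id_str": "111",
    "entities": {"user_mentions": {"id_str": "222"}},
}
print(flatten(tweet))
# {'id_str': '111', 'entities.user_mentions.id_str': '222'}
```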
And this presents problems when you're trying to convert the JSON into a nice flat table-like structure. The way this would normally be dealt with is that anything in square brackets that you want to keep (and let's face it, a lot of what we're going to look at from Twitter we're going to throw away), anything which is effectively a list, is best dealt with by creating a new table with those values in it. We're not going to do anything quite that complicated today, but that's the approach you would normally take. You'd end up with two tables, and you'd have to create a relationship between them; that's where relational databases come into play. JSON itself doesn't need to do that, because it can use this square-bracket structure instead. Just a bit on the terms, as I'll probably mention some of these: a table is referred to as a collection in JSON, so we're going to look at a collection of tweets. A row of a table equates to a document in JSON, so each tweet is a document. And a column in a table is referred to as a field in JSON. So what we're actually going to do is take a collection of JSON documents, which have individual fields in them, and create a single table with rows and columns. Essentially, we're going to flatten the JSON file so that we can use it in Excel. Okay, so on to the first demo. For this demo, I'm just going to open Excel; this is standard Excel. You can see here I've got my little Power Pivot option there. I don't need to worry about that at the moment, because the first thing I need to do is import my data file. So on the Data tab, I can get data, from file, from a JSON file. And it's going to invite me to go and find my JSON file, which is somewhere in here. I've actually got a selection of data files with topical names.
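The "split a list into its own table" approach described above can be sketched in plain Python. The tweets and hashtags below are invented; the point is the shared key column that becomes the relationship between the two tables:

```python
# Hypothetical documents: each has a scalar part and a list-valued field.
docs = [
    {"id_str": "1", "text": "first tweet",  "hashtags": ["excel", "powerpivot"]},
    {"id_str": "2", "text": "second tweet", "hashtags": []},
]

# Parent table: one row per document, scalar fields only.
tweets = [{"id_str": d["id_str"], "text": d["text"]} for d in docs]

# Child table: one row per list element, keyed back to the parent.
hashtags = [
    {"id_str": d["id_str"], "hashtag": h}
    for d in docs
    for h in d["hashtags"]
]

# The shared id_str column is the relationship between the two tables.
print(len(tweets), len(hashtags))
```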
Essentially I've named these files after the search term I used when I was looking for the tweets. So I select the first one and just say Import. And then (I beg your pardon) Excel will realize this is a JSON file and put me straight into the Power Query editor, because you can't import a JSON file in the same way you'd import a CSV-type file. Before we get into that, I just want to show you what this JSON actually looks like. This is JSON Editor, an application you can get for free on Windows 10. I've already loaded the coronavirus JSON file into here. What you see on the screen is sort of what the JSON really looks like, except it's not really colored the way this is; the editor is already picking things out to help you read it. And if I click the button there, it will format it to make it more human-readable. It doesn't change the data in any way; it just makes things more readable. So you can see here that all of my 100 tweets are effectively in a list. I can tell that by the square brackets there. And then within there, there are all sorts of structures, and you can see how deeply these things get nested. But right at the top level we've got the nice, easy ones we might want, like when the tweet was created, the id string, and the user who made the tweet. If you haven't used the Twitter API before: you get an id and you get an id string. You should always use the id string, because, as you can see in this case, the id itself has been truncated, and Excel will truncate them as well. So always go for the id string when you're extracting the data. You also get the full text of the tweet, except that it's not actually the full text, because that will probably be truncated too, and we'll see in a minute where we actually get the full text from. But you can see the nested structure of this, and also how much of it you're almost certainly not going to want. So we're not obliged to take everything that you see here.
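The advice to prefer id_str over id can be demonstrated in a few lines of Python. The ID below is invented, but the precision argument holds for any modern 64-bit tweet ID: anything that stores numbers as double-precision floats (Excel cells, JavaScript) can only represent integers exactly up to 2**53:

```python
# A made-up 64-bit-sized tweet ID, well above the 2**53 exact-integer
# limit of a double-precision float.
tweet_id = 1261593431048704001

as_float = float(tweet_id)              # what a float-based cell would hold
print(int(as_float) == tweet_id)        # False: the low digits are lost

as_string = str(tweet_id)               # the id_str field keeps every digit
print(int(as_string) == tweet_id)       # True
```

This is exactly the truncation visible in the JSON editor, and why the id and id_str fields can disagree once a tool has round-tripped the numeric id through a float.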
We can pick and choose what we want to take. So, having started the load process, what the Power Query editor is saying is: well, you seem to have this thing called statuses, which is a list, the one I've just shown you. And then right at the end we've got this record, which is the search metadata. We're not terribly interested in that, because we know all of the tweets are in this list here. So the first thing we're going to do is take this and convert it into a table; there's only one button up there to use, Convert to Table. And now it looks very similar, but this is now in table form, and we've got this list and this record. Again, we're not interested in the record, so we're going to go to the list. With a right mouse click on that, I get this option to drill down. I'm going to take the drill-down option, and now I get what will be 100 records, each one representing a tweet, or a document in JSON terms. It doesn't try to tell me what's in each one, because, as we've just seen, they're very complex structures; it just says, well, it's a record. What do you want to do with the records? The first thing we're going to do with this list of records is convert it into a table. Here it asks me a couple of little questions, about delimiters and how to handle errors, which I can totally ignore, so I click OK. And now I've got a list of 100 records, each one of which is one tweet. Up at the top here (I can't really expand this) there's a little symbol, like two little arrows pointing away from each other; it's a little button. If you click on that, it works very much like the dropdown in an Excel table: it shows you a list of all of the fields inside the record. It will actually say the list isn't complete, although there are quite a few there. So just to make sure, I'm going to say Load more; I've only got 100 records, after all.
And then I've got a complete list of all the top-level keys within the records. One thing: I'm just going to go back to my JSON editor, because there are different forms you can show this in. You can see it here; it's the same data, just presented in a different form. If I click on the statuses there, you can see my records, which run from 0 to 99 in this case. In brackets here, it's telling me how many top-level keys are in each document, and you can see that these numbers are not, in fact, all the same. They vary: 25 there, and I think that's 29. And if I say I want all of them, there are actually 31 listed, by my count. The point being that two consecutive JSON documents don't have to have the same set of keys in them. You have to be aware of that when you're loading JSON, because you may accidentally miss out on some data. But here I've said I want them all. Remember we said that for records (that's the curly brackets) we can create column names by concatenating with dot notation. This little option here, "Use original column name as prefix", is asking whether you want to do that. In this case, because this is the top level, I don't need everything prefixed with Column1, so I'm just going to uncheck it. This is also our first chance to get rid of stuff we're not interested in; we certainly don't want everything. We might go for the created at, the id string (as I said), the full text (we'll see whether that is the full text in a minute), and the entities. Then further down we have this retweeted status, and I'm going to check that as well. I click OK, and it loads for us only the things we asked for. The created at, the id string and the full text come in as simple values, strings in all three cases. The entities and retweeted status come in as records, and in some cases the retweeted status has no value.
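The point that consecutive documents need not share the same keys is why the editor offers "Load more" before listing fields. A quick way to see the full field set, sketched in Python with invented documents:

```python
# Two documents with different key sets, as in the 25-vs-29-key example:
# only the second one is a retweet, so only it has retweeted_status.
docs = [
    {"id_str": "1", "full_text": "hello"},
    {"id_str": "2", "full_text": "hi", "retweeted_status": {"full_text": "hello"}},
]

# Union the keys across every document to get the complete column list.
all_keys = set()
for d in docs:
    all_keys.update(d.keys())

print(sorted(all_keys))   # ['full_text', 'id_str', 'retweeted_status']
```

Sampling only the first document would have missed retweeted_status entirely, which is the "accidentally miss out on some data" risk the talk describes.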
That means that for this particular record, the second record, there was no retweeted status key, which is quite acceptable: it means it wasn't a retweet. You can probably also tell that from the fact there's no RT at the front of these two. So the next thing we want to do is drill down a little further and decide whether we've got the full tweet there, or whether we need to go to the retweeted status. The first thing I want to do is expand the retweeted status record in exactly the same way, with Load more, and you can see I've effectively got a very similar list, because this is effectively the complete tweet which was retweeted. In this case I'm really only interested in the full text and the entities, because what I'm trying to do is get a record of the full text of a given tweet. Now I've got a couple of things to show you. On the right-hand side here, it keeps track of all the things I've done, and if I want to, I can delete a step, or go back and look at something I did before, and so on. I can go all the way up the tree and all the way down again. I just wanted to go back there to show you that. Okay, I'm going to delete that step and do it again. What I did was expand that field: I don't want the id string this time, I want the full text and the entities. And you see here, "Use original column name as prefix" is unchecked, because that's where I left it before. If I leave it unchecked and click OK, as I did before, you can see it recognizes the fact that I've already got something called full text, so it adds its own little suffix of .1, and it does the same for entities as well. So it will resolve duplicate names as it needs to.
Now, having got these out, what we want to do is work out which of the tweets has the full text in it. The rule is very simple: if the original starts with RT, then the full text in the retweeted status contains the full text. If it doesn't, that column is going to be null anyway, so we know the original has to be the full text. What we want to do now is create a couple of columns which reflect that. So we go up to Add Column and add a conditional column. We give it a name; we'll call the first one full tweet. This is going to be based on the column full text 1. What we're saying is: if full text 1 equals null, i.e. it doesn't exist, then (there's a little dropdown here where I can enter a value or select a column) I want to use full text, because it's not a retweet, so full text has the whole tweet in it. And the else clause is exactly the same: select a column, and here I want full text 1, because we know that's got the full text in it; if it's not null, it has the full text. Click OK, and now I've got the full text in there. I'm going to do exactly the same again for the entities. If the column full text 1 (I could equally have tested entities 1; it makes no odds, because they're null together) is null, then I want the column entities; otherwise I want the column entities 1. And I click OK. Notice that entities and entities 1 come through as records, just as they were. So now, in my full tweet and full entities columns, I've got the full text of the tweet and the record which contains the entities. These columns up here, full text and the others (I'm using the Control key to do a multiple selection), are now not wanted, so I can go to Home, say Remove Columns, and get rid of them. Again, I'm just tidying up as we go.
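The logic of that conditional column, written out as a Python sketch (the column names follow the demo's expanded names, and the rows below are invented):

```python
# full_text.1 is the expanded retweeted_status text; if it is null,
# the tweet wasn't a retweet, so full_text already holds the whole tweet.
def full_tweet(row):
    if row["full_text.1"] is None:       # not a retweet
        return row["full_text"]
    return row["full_text.1"]            # retweet: take the retweeted text

rows = [
    {"full_text": "plain tweet",          "full_text.1": None},
    {"full_text": "RT @x: truncated",     "full_text.1": "the complete retweeted text"},
]
print([full_tweet(r) for r in rows])
```

The second conditional column for the entities is the same test with entities and entities.1 swapped in.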
We're always interested in the id string, the time, the full tweet and the entities. So the only thing left to expand now is the entities, and it's the same routine: it's a record, so you get the list of fields; you say Load more if it's not complete, and you see it's added media in there. Now, for the demonstration, all I'm going to select here are the hashtags. I could pick them all, but I'm just going to select the hashtags, and again I don't need the prefix, because hashtags doesn't appear anywhere else, so it's going to come up just as hashtags. And surprise, surprise, these are a list. At this point we've got a little more of a problem: this is the stage where, in the JSON, we had square brackets, which is what the hashtags use. Square brackets are things we can't easily expand in line, as we've been doing with the records. So when we click on this, it will actually give me the option to expand into new rows. Now, before I do that, I just wanted to point out that at the moment we've still got 100 rows, 100 records, and these id strings effectively uniquely identify each one. Potentially there could be duplicates there, because I could have accidentally picked up the same person tweeting twice. So what I want to do is add another column, but this time just a simple little index of my rows. In the index column options here, I can start from zero, start from one, or use a custom value. I'm just going to start from one. And there you can see they're numbered one to 100. The reason I'm doing that is just to make it easier to spot things when I expand the hashtags, because when I take my list of hashtags and expand to new rows, you can see that some of them come up as null, which means there weren't any hashtags in that tweet. And the other ones have come up as... oh no, let's just undo that.
And the other ones come up as records, so the records need to be expanded. I'm not going to be interested in the indices field; that just tells me whereabouts in the tweet the hashtag occurred. I just want the text of the hashtag, so I select text. Now, instead of a record, I've actually got the name of that hashtag. It doesn't have the hash in front of it; it just has the name. Now, what's interesting about this: the nulls I've already mentioned mean there weren't any, but if you come down here to these records with an index of 10, there are several of them now, and that's because the tweet with index 10 had many items in that list. Each one has been expanded out into a separate row. In practice this means that all of the other stuff around here is duplicated: there are eight of these rows, so for seven of the eight it's pretty well redundant. And that's why, in normal circumstances, when you get around to expanding the rows, you would separate this out into a separate little table by itself. How would I have done that? I'm not going to do it for this example, but: if I go to the queries over here (this one is named from the file name) and right mouse click on it, I've got this option of duplicating it. Effectively, duplicating splits it in two: I can continue on from where I left off and still retain the old one. So what I would do is keep the index, keep the text, and delete the other columns, and that makes a little table by itself. But for what we're doing, we're just going to keep it all together like that. And that is essentially all we're going to do on that. So finally, if we're happy with how this looks, all we have to do now is say Close and Load, and we've got options here for how we're going to do that.
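"Expand to new rows" can be sketched in miniature in Python (the rows and hashtags are invented): each hashtag gets its own row and the other columns are copied, which is why index 10 repeats several times in the demo:

```python
# None stands in for a tweet whose hashtags list was null.
rows = [
    {"index": 1, "hashtags": ["ppe"]},
    {"index": 2, "hashtags": None},
    {"index": 3, "hashtags": ["nhs", "covid", "lockdown"]},
]

expanded = []
for r in rows:
    for tag in (r["hashtags"] or [None]):   # keep a null row when there's no list
        expanded.append({"index": r["index"], "hashtag": tag})

print([(e["index"], e["hashtag"]) for e in expanded])
```

Row 3 becomes three rows with its index duplicated, which is the redundancy that motivates splitting the hashtags into their own table.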
If I just click Close and Load, it will write the data back to my Excel spreadsheet, with the usual dialog asking where to put it. I can also do Close and Load To, at which point it will allow me to load it into a table (that's the default), or, in terms of using the data model, I can add it to the data model at the bottom here. I can also say just create a connection. That means it will keep a reference to the file I used, but the data will be put into the data model. I click OK, and what happens? Nothing appears on my spreadsheet, but it says 133 rows loaded into the data model. So if I go back in (let's see, can I edit from there?), Go to Edit takes me back into the query editor, and you can see exactly where I left off. If I want to load that into... okay, I'm not quite sure why that isn't loading into the spreadsheet in the normal way; I'm not even getting a dialog up. Anyhow, the last thing I wanted to show you: going back in here, I've already shown you that the right-hand side shows all of the applied steps, and you can go up and down, undo them, delete them, redo them and what have you. This is a complete record of everything you have done. And if I go into View and the Advanced Editor, it shows me a list of all of the steps I took; essentially, everything it has done on my behalf. So I'm just going to do a Ctrl+C to copy that, close it, close the Power Query editor, and close this Excel spreadsheet without saving it. And I'm going to open Excel again. So this is a brand new copy of Excel, with an empty worksheet. I go to Data, Get Data, From Other Sources, and at the bottom here I've got this thing called Blank Query. If I open that up, what I get is the query editor opening with an empty, clean slate.
There's nothing in the applied steps for my query, Query1 (I can change that name if I want to). If I now go into View and the Advanced Editor, again it's telling me I haven't done anything. So I remove all that and do Ctrl+V to paste in what I had before. This first line here has the file name in it, and I'm just going to change it to one of my other files, PPE, and say Done. It will then go through and apply all of those steps again, but using this different file. So I'm in exactly the same position I was before, except this was done on a different file. So once you've got a set of instructions, it's worth your while saving them somewhere. Just copy and paste; it's only text, so save it into a Notepad-type file, and then you can reuse it, and all you have to do is change the file name. Let's see if this is going to let me... ah, there, this time it's worked for some reason. So now we have all of our text in here. It's just occurred to me there's one more thing I was going to show you in here; let's go back into the editor. One of the things that happens when you're loading and transforming data is that the editor tries to make a good guess at the type of the data you're loading. We know this is meant to be a string, because it's called id string; that's the whole point of it. But this created at looks sort of like a funny timestamp, and Power Query didn't recognize it as such, so it just left it as a string. If I wanted to, I could go to Transform and the data type (it says Any at the moment) and try saying, well, it looks like a date-time, at which point it puts Error in there, because it doesn't recognize that format as a date-time. So I can just back that out to where I was. And then, if I wanted to, I could create new columns: Add Column, a custom column.
What I'm going to do is try to extract the bits of information from that string which represent the date. To do this (and this is the closest we get to Excel) we need to make use of a couple of text functions on the items within that string. I'm going to call the first one month, because that's the part closest to hand. And for the expression: well, if I were using Excel, I would use the MID function to extract from that string, so I type in mid. Now, you can see it's giving me a whole host of suggestions, none of which actually says MID, but Text.Middle sounds pretty promising. So I go for Text.Middle and open the brackets, and it immediately gives me some help on how to fill it in. Oh, that's good, because it looks just like MID in Excel. On the right-hand side here I've got a list of the columns, so I select created at. Then I've got to say what the start position of the month is, and these strings are indexed from zero, like lots of things. So: zero, one, two, three, four. Starting at four, and I want three characters. That's all I want, and I click OK, and you can see I've got May there. I can do the same for day and year. Having those separated out makes it easier for me to manipulate the data by time. As it stands, this is just a string; if I split it out into month, day and year, I can then concatenate them together and have the result treated as a date, which is a lot more useful than what's there now. And similarly, I could do the same thing for the time. But it's just a case of adding more columns. I'll do just one more and try to extract the day. It's the same function: Text.Middle, open brackets, created at, comma.
And now I'm counting for the date: zero, one, two, three, four, five, six, seven. We'll try eight, for two characters. Let's see if I've got that right: 16, yes. And I can do the same for the year. This is one that takes a lot of counting: created at, comma... where was that? Eight, nine, 10, 11, 12... I think about 27, for four characters. The first number is the start position, the last is how many characters. I've got that wrong, so it must be 26. I could delete it and do it again, but up at the top here it actually shows me what it has done for me, so I can just edit that and change it to 26. And now I've got the year there. The final thing I need to do is make this into something recognisable. So, one more custom column. I don't need any functions here; I'm going to call it date, and what I'm going to say is: the day, then concatenate using the ampersand, a space, ampersand, the month, ampersand, a space, ampersand, the year. I didn't rename the day column, so it's just called Custom here, but never mind. And if the expression is fine, it tells you at the bottom that no syntax errors have been detected. Click OK, and now I've got my 16 May, and that looks more like a date. To the extent that, up in Transform, I should now be able to say Date, and it converts that into proper dates. And I can get rid of those three working columns if I want to. Then when you move this back into Excel, that will be treated like an Excel date, and you can manipulate it any way you want. Okay, that's about the end of this; let me just close it down. Everything you do is saved, as we've just seen, and there's my spreadsheet with the data. And see, it's brought the data back, automatically updated, because I told it to keep the connection. So there's my flattened bit of Twitter data. Let's close that one.
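The offsets counted out above can be checked against the usual Twitter created_at layout with a few lines of Python (assuming the standard "Sat May 16 08:00:00 +0000 2020" shape): month starts at 4 for 3 characters, day at 8 for 2, and year at 26 for 4, which matches the corrected figure in the demo:

```python
created_at = "Sat May 16 08:00:00 +0000 2020"

# Python slicing is zero-based like Text.Middle: [start : start + length].
month = created_at[4:4 + 3]
day   = created_at[8:8 + 2]
year  = created_at[26:26 + 4]

date_text = f"{day} {month} {year}"
print(date_text)   # 16 May 2020
```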
I don't really need to save that. Back to the slides; so that was the first demo. The second demo we're going to look at is this 246-million-row data set. Yes, it's genuinely 246 million rows. The original data set occupies about seven gigabytes of disk space. When it's loaded into the Excel data model, it's reduced to, well, the Excel file, if you like, is 650 megabytes. So you can see the level of compression which has taken place. Even when you have limits on how much memory you've got, you're going to be able to deal with far greater data sets when you're using the data model. So these are the steps we're going to follow. First thing is to load the file, called EDRP gas; this is the one which has 246 million rows in it. I'm then going to create month and year as new columns, in exactly the same way we've just been creating columns. And then I'm going to group the data set by the anon ID, month and year. I haven't written it down here, but because of the way the data's structured, you get up to 48 entries for each day of the year for each anon ID, and by grouping by month I can reduce the number of records from 246 million down to about 180,000. So in theory, at that point you could stop and just bring it back into Excel, because it's well within Excel's limits. But in fact we're going to do a little bit more work: we're going to add what's called a geography file, which is part of the same study number from the Data Service, and create a relationship between the two based on this anon ID field, because it's common to both of them. And then we're going to go to Excel pivot tables and see how we can create little dashboards with the data, okay? So there are bits I'm not going to do live because of the time, and I'm not talking about how late we're running now.
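The group-by step just outlined can be sketched in plain Python (again, only an analogy to the Power Query Group By, and not how you'd actually process 246 million rows). The field names and figures here are invented for illustration; the real EDRP file has its own column headings.

```python
# Sketch of the grouping: sum the half-hourly gas readings per household,
# per month, per year -- the aggregation that takes 246m rows down to ~180k.
from collections import defaultdict

readings = [
    # (anon_id, year, month, gas_kwh) -- a tiny stand-in for the real data
    (1001, 2009, 5, 0.4),
    (1001, 2009, 5, 0.6),
    (1001, 2009, 6, 0.3),
    (1002, 2009, 5, 1.1),
]

monthly = defaultdict(float)
for anon_id, year, month, kwh in readings:
    monthly[(anon_id, year, month)] += kwh  # Group By ... Sum in Power Query

print(round(monthly[(1001, 2009, 5)], 2))  # 1.0
```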
I'm talking about the fact that this does actually take quite a while to do. So let's go back into that; we'll start from step four. This is what the gas file looks like; this screenshot is taken from Power Pivot. You can see it's only got four fields, but 246 million records. What we have are the anon IDs, which are effectively just numbers identifying a household; again, a slightly weird interpretation of a timestamp, where the HH is the half hour, and it's just giving you the half hour of the day, so this one is like eight o'clock in the morning, yeah? And then the gas kilowatt hours used in that half-hour period, okay? So before the grouping, we're going to add two columns first. This month and this year have been created in exactly the same way as you've just seen me do for that other weird timestamp. Then we're going to use the Group By screen in the Power Query editor, and this allows me to select multiple columns, as I showed you before. It lists the columns we want to group by, and then at the bottom here we're saying I'm going to create a new column, because it's going to be an aggregate of some kind. What kind of aggregate do I want? I want a Sum, and I want you to sum the gas kilowatt hours column. So essentially, for each anon ID, within each month, within each year, I'm going to get the sum of the gas used, and that aggregation is what brings us down to 180,000. In a sense that might be enough for you, but we're going to take it a bit further, okay? So, on to our gas three. In this gas three spreadsheet I've already got the data model set up. This little table I'll come back to in a minute or two, but first of all I want to show you, under Power Pivot's Manage Data Model, what's in there. And in there you can see the tables which have been set up. So I've got my EDRP gas with all of the entries in there.
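As a small aside for coders, decoding that half-hour timestamp can be sketched like this. Whether the real EDRP file counts slots from 0 or from 1 is an assumption here, to be checked against the data set's documentation; the example below counts from 0, so slot 16 is "eight o'clock in the morning" as in the demo.

```python
# Sketch: turn a half-hour slot number (0..47) into a clock time.
def half_hour_to_time(slot):
    hours, half = divmod(slot, 2)
    return "%02d:%02d" % (hours, 30 * half)

print(half_hour_to_time(16))  # 08:00
print(half_hour_to_time(17))  # 08:30
```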
So essentially that's where I got that screenshot from a minute ago. This is the summarised version, which of course now just has the anon ID, the month and the year, and the monthly kilowatt hours. If I go down far enough, here's someone who started using some gas at some point, I don't know why. But like I said, it's still 180,000 rows of this. And the geography table I mentioned is this one here. What I'll do is just load that again, to show you how you load a simple file. So I'm just going to minimise that. You can have Power Pivot's Manage Data Model window open at the same time as the Excel spreadsheet, whereas, as before, you can't have the Power Query editor open at the same time as you use the spreadsheet; you have to close the editor first. So from here, this is just straightforward Excel: I'm going to get data from a CSV. Where am I? I'm still in the Twitter folder. This geography file is the one I want to open, so I'm going to say Import. I probably won't complete this, because I've actually already imported it once, but you get the idea. Start the import and you get a dialogue coming up like this. It says it's detected the data types based on the first 200 rows, and it immediately recognises it's got column headers, so that's all right; that is in fact a well-behaved CSV file, so the chances are you won't need to change any of them. And then again, there's the Load button down here; or if I click on Transform Data, that will open the Power Query editor and I can do other transformations should I want to. The Load button here is exactly the same as we had up in the top corner before: I can load it straight into Excel, and this is a small file so that wouldn't be a problem, or I can load it into the query editor, okay? I won't actually do it, because as we've seen, in the query editor I have in fact already got it loaded. Oops, and it's this file here, okay? Just a straightforward, simple file.
So while we're in the query editor, what I want to show you is the diagram view. You can see at the bottom here, in the little tabs, the four tables I've got. The EDRP gas is the original one, so that's not really needed anymore, but these three we're going to make use of. So if I look at the diagram, EDRP gas is out on its own; it's not connected to anything, and we're not going to be using it in anything we do, okay? But of these three items here, this is the one that essentially has all of the data: it's got the monthly kilowatt hours in it. What I'll do is delete these relationships and create them again; oops, delete, Delete From Model. So your starting point is three independent tables. But we know that in the geography file, anon ID, although it's spelled slightly differently, is effectively the same thing as the anon ID in the summarised file. So I can just drag the one across to the other, and a link is created for us automatically. These tables are now linked by the anon IDs. And similarly for our table three, I'm going to do the same thing, and here I know that the month here is related to the month name. So again, just drag it across, the link is created, and there we are, back where we had it before. Table three here, let me just minimise this, not the links; this is table three here. What I did is I just typed this in and made it into a little Excel table, and if I click in here, you can see it's called Table3, okay? And then, in the Get & Transform Data section, I clicked on From Table/Range, and that automatically loads it into the query editor, and from there I could load it into the data model, okay? I'm not going to do that again because I've already got it in there once, so we'll discard that one. The other thing I did, once I'd got it into the data model: this is just as it was when I typed it in.
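What that drag-to-link relationship buys you can be sketched as a lookup join in plain Python (an analogy only; in the data model the relationship is followed for you by the pivot table). Field values and region codes here are invented for illustration.

```python
# Sketch: each summarised row finds its region through the shared anon ID,
# which is what the relationship between the two tables does in the model.
geography = {1001: "UKI", 1002: "UKM"}  # anon_id -> NUTS 1 region (made up)

summarized = [
    {"anon_id": 1001, "year": 2009, "month": 5, "kwh": 42.0},
    {"anon_id": 1002, "year": 2009, "month": 5, "kwh": 55.5},
]

for row in summarized:
    row["nuts1"] = geography[row["anon_id"]]  # follow the relationship

print(summarized[0]["nuts1"])  # UKI
```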
But what I then did was select the month name column, go to this Sort and Filter item up here, and click Sort By Column. That brings up this little dialogue, which allows me to say, for the column I've selected, the month name, how I want it to be sorted; and I've only got one option here, so I'm going to sort it by month num. What that allows to happen is that, in circumstances where you want to sort these months, they'll be sorted based on this number column down here, i.e. in the order that you see them there. Whereas had I not set up this Sort By Column, when you sort these months they'd come out in alphabetical order, which is typically not what you want, okay? But that's already been set up for me. So the only thing that remains to do now, and you can do it from here, from the Power Pivot tab, is this little PivotTable button. I can say PivotTable, and it gives you lots of choices; I'm going to go for Chart and Table (Vertical), on a new worksheet. This is very much like normal pivot table stuff. At the bottom I've got my pivot table here, and the chart at the top, with nothing in them yet. But what you notice on the right-hand side is that I've got a list of all of my tables, and these are all of the tables from the model. Now, we're not interested in the first one, and that last one is just the little table from the spreadsheet itself. So the ones we're interested in are, first, the summarised data, because this has got the monthly kilowatt hours; we're definitely going to want that in the Values. Although I don't want the sum of kilowatt hours, so I'm going to change the value settings to Average, and I'm going to set the number format to two decimal places. Okay, which doesn't tell me very much on its own; it just tells me the overall average. So what do we want for columns? For columns, I want months.
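The Sort By Column trick has a direct analog for coders: month names sort alphabetically by default, so you supply a parallel numeric key instead, which is exactly what the month-num column does in the data model. A minimal sketch:

```python
# Sketch: sorting month names by a numeric key rather than alphabetically.
month_num = {"January": 1, "February": 2, "March": 3, "April": 4}

months = ["March", "January", "April", "February"]

print(sorted(months))                     # alphabetical: April comes first
print(sorted(months, key=month_num.get))  # calendar order: January first
```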
Now, if I take the month from here, which I'm quite entitled to do, you can see what I was saying before: because that's just the straightforward month, it comes out in alphabetical order, which isn't what we want. So get rid of that and pick the month name from table three, and now they're all in the right order, okay? And finally, what I want to break this down by is this thing called NUTS 1. I don't know if you've heard of the NUTS system, but essentially it breaks the country down into different regions and gives each region a letter. I can't remember what the actual mapping is; I think I is London, M I'm pretty sure is Scotland, and so on: North West, North East, and so forth. You've possibly seen them before, okay? So now what I've got in there is a table, based on the data from 246 million rows, summarised into a manageable little spreadsheet dashboard. And up here for the chart, I'm going to do something very similar: from the geography table, I want NUTS 1 along there, and from table three, I want the month name down there. So again, just a little graph; it's a bar chart, and I want to change that perhaps to a line chart, with lines on there. And the final thing I want to do is add a couple of slicers. I want a slicer for NUTS 1 and one for year. Now, the year isn't featured in my table or chart as such, but I can still use it to filter the data. So, for the two slicers up here: click this one, go to Report Connections under the Slicer tab, and just make sure both are ticked; that's going to link that one slicer with both my chart and my table. And we do the same for the other one. And now, when I remove or add things from here, it will affect both the table and the chart. So 2010 isn't much good because I haven't got a full year; 2008 isn't much good because we haven't got a full year; 2009 is the best set of data, the most complete set, apart from UKL, then. And again, similarly for here, I can go through them systematically. The dash-dash isn't very important.
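The pivot table being built here, average monthly kilowatt hours by region and month, can be sketched as a two-key grouping in plain Python. The figures are made up; only the shape of the calculation mirrors the demo.

```python
# Sketch of the pivot: average kWh with region and month as the two keys.
from collections import defaultdict

rows = [
    ("UKI", "January", 120.0), ("UKI", "January", 100.0),
    ("UKI", "July", 30.0), ("UKM", "January", 150.0),
]

totals = defaultdict(lambda: [0.0, 0])
for region, month, kwh in rows:
    bucket = totals[(region, month)]
    bucket[0] += kwh   # running sum
    bucket[1] += 1     # running count

# Average = sum / count, the "Average" value setting in the pivot table
averages = {key: total / count for key, (total, count) in totals.items()}
print(averages[("UKI", "January")])  # 110.0
```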
The dash-dash is where they wouldn't give an area, for disclosure-type reasons. But all of the other ones I can either click individually, or I can click multiple ones and have them all appear on the graph. Okay. Oh, now it's still saying Sum; did I not change that to Average? That's better. As you can see, and as we might have concluded anyway, over the year everyone across the country uses less gas in the summer than they do in the winter. Okay: 246 million rows of data for me to work that out, but there you go. Okay, that's the end of the demo for this, so I'm going to minimise that. And then, for just the last few minutes, a little bit on dynamic arrays. They're relatively new, about a year old now I should think, but they got rolled out gradually, so not everyone got them at the same time. They're in most of the Office 365 Excel editions now, I think. And what they allow you to do is use a function which is going to return more than one value. Now, we probably all use functions like the SUM function, where you give it as input lots of different cells or cell ranges, but it always returns just a single cell, and that's the same with most of the old Excel formulas. There were a few specifically array-based ones, like the ones for dealing with matrices and what have you, but generally they weren't used very much. And also, if you wanted to enter an array formula, you had to use Ctrl+Shift+Enter, and you'd get the curly brackets; and even if you had used them, you could never remember how to do it the next time you wanted to. So what Microsoft has done is phase this out in favour of these dynamic arrays, where you can have a function and it will automatically return many values, if that's relevant.
So for that we've got some completely new functions, and some of the older functions behave a bit differently, because they make allowance for the fact that you may be asking for more than one cell to be returned. And then, having created one of these dynamic arrays, we've got ways of referencing the entire array rather than an individual value. So how do you know if you've got them? Well, I suspect this test is superfluous now, because I think most people will have them. But if you type in the formula =A1:A3 (I know it doesn't look much like a formula, but that is a range of three cells, these three down here) into this cell here, and it comes back as one, two, three, then you've got dynamic arrays. If it only comes back with one, which is what the old system would have done, then you haven't got dynamic arrays. Alternatively, you could just search for one of the new functions, like UNIQUE, which is probably a quicker way of checking. In all there are six new functions: FILTER, RANDARRAY, SEQUENCE, SORT, SORTBY and UNIQUE. We're just going to look at about three of these, I think, but the links here will take you to the full definitions and examples. For this I need to open a spreadsheet I've called Microsoft, because I pinched the data I'm using from Microsoft. Okay, we'll start off with examples of SEQUENCE. The way this works: in this column here I've just listed what it is I'm doing, and the actual function call, or formula, is in these cells here. So you can see, if I just say =SEQUENCE(5), I get five rows counting up one, two, three, four, five. Very simple and straightforward. If in this one I say SEQUENCE(1,5), that's not saying from one to five; that's saying I want one row and five columns with numbers in them, so it counts across. And I can combine them both and have a little matrix set up here, saying three, four; it just counts up from one, by one, for four columns and three rows.
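For coders, what SEQUENCE returns is easy to sketch in Python: a rows-by-cols grid of numbers counting across then down. This is only an illustrative model of the Excel function, with the same four parameters.

```python
# Sketch of Excel's SEQUENCE(rows, [cols], [start], [step]).
def sequence(rows, cols=1, start=1, step=1):
    return [[start + step * (r * cols + c) for c in range(cols)]
            for r in range(rows)]

print(sequence(5))     # one column: 1..5, like =SEQUENCE(5)
print(sequence(1, 5))  # one row: 1..5, like =SEQUENCE(1,5)
print(sequence(3, 4))  # 3x4 grid counting 1..12, like =SEQUENCE(3,4)
```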
There are other parameters I can use. So the three, four here is the same as I've had before, but I've given it a start value of 10 and told it to go in steps of 10. So what do I end up with? 10, 20, 30, 40, and so on, all the way up to 120, okay? So it's a simple way of getting a sequence of numbers into a range of cells. It's probably more fun using it with other functions. So in this one here, let me make it a bit bigger, that might help. What I'm doing is using the MID function: I've given it a string, A to J, and then for SEQUENCE I've said 10. That will give me 10 rows: one, two, three, four, five, six, seven, eight, nine, 10. But because it's within the MID function, it's going to use each of those as the second parameter to MID, 10 times over, and the last one is just the usual MID parameter saying I want one character. So if I run that, what I get is A, B, C, D and so on written down the side. I can do something very similar where I add in the TRANSPOSE function, and then it writes them across as columns, like that. And finally, a more complex example. The first one is using the TEXT function around the DATE function. Essentially what I'm doing is extracting the month in its full written format from a date based on today's date, and I'm doing it for one to six, so what I end up getting is a list of the first six months, okay? And this final bit here is one of the other new functions, called RANDARRAY, which, like SEQUENCE, allows you to give it rows and columns, and then bottom and top values, here between one and a hundred, and you can see there all the random numbers. So together, you can imagine, this is a way of creating your own test data, perhaps. Moving on to UNIQUE. This is Microsoft data I've borrowed from one of their database samples; it's just a good example because it's about people, with names and addresses and education, things like that.
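The TEXT-around-DATE trick for the first six month names has a close Python analog using datetime formatting. A minimal sketch (English month names assume the default C locale):

```python
# Sketch of TEXT(DATE(2020, SEQUENCE(6), 1), "mmmm"): format the first day
# of months 1..6 to get the first six month names.
from datetime import date

months = [date(2020, m, 1).strftime("%B") for m in range(1, 7)]
print(months)  # ['January', 'February', 'March', 'April', 'May', 'June']
```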
So it's just a reasonable example of data. What UNIQUE does is allow you to specify an array, or a column name in this case, and it will extract the unique values in that array, okay? So if I run that, it just lists in here the five values, across the whole of that 2,000-row data set. Some things that I haven't pointed out, at the risk of breaking things: if I go to the top value, this is where I typed in my expression, my formula, and if I click on one of the cells below it, you can see the same formula, but it's sort of greyed out. And because it's greyed out, if I hit the Delete key it won't let me delete it, okay? If I try to overwrite it, what happens is it lets me overwrite it, and then everything below it disappears, and at the point where I put the formula in, you get this error called #SPILL!. This is a new error type, if you like, and effectively what it's saying is: you've asked me to do something, but you haven't left me room to do it. Previously, with array-type functions, you had to mark out the spot where you wanted the answers to go, and you had to get it pretty well right, otherwise it got upset. Here, it will just spill down into new space, or spill across into new space, provided there's nothing in the way. So if I just delete that now, it can immediately put them back in. With this unique list here, I can then add another standard-type formula to do a sum of COUNTIFs, and this is effectively going to give me the frequency of the occurrences of each of those values, okay? So I can get little frequency tables out. And that's just doing the same thing for, I think that one was education, and then occupation. This one is UNIQUE using the middle name; I was quite surprised how many different letters of the alphabet are used in middle names. There's also a SORT function, so if I'd wanted them in alphabetical order, I could have wrapped this in the SORT function.
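UNIQUE plus the sum-of-COUNTIFs frequency trick maps neatly onto Python's Counter. A sketch, with invented occupation values standing in for the Microsoft sample data:

```python
# Sketch: distinct values in a column (UNIQUE, here also sorted, like
# wrapping it in SORT) and how often each occurs (the COUNTIF frequencies).
from collections import Counter

occupations = ["Clerical", "Manual", "Clerical", "Professional", "Manual",
               "Clerical"]

counts = Counter(occupations)
print(sorted(counts))      # ['Clerical', 'Manual', 'Professional']
print(counts["Clerical"])  # 3
```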
But one of the other things you can do with UNIQUE is this: there's a third parameter, which by default is FALSE, and if you set it to TRUE, it will only return the values where there is genuinely only one occurrence of that value. So for the middle names, there are only three which occur exactly once. Yep. So you can use that to find genuinely unique values. Previously, if you wanted to do this sort of thing, you'd probably end up using a pivot table; now you can just do it in line as you want. And finally, the FILTER function, oops, allows us to take our data and select a subset of it. This is just a little bit of setup I've done here using my other queries. So this list here is a UNIQUE spill of occupations, as we saw before, and here I've set up a little drop-down list using Data, Data Validation. If I go to that, you can see, yes, I wanted a list, and you can see what's been put in here: it's the address of that first cell, A2070. And then at the end, if you put a hash sign, that means it's not just that cell, it's the whole dynamic array associated with that cell. So again, I don't have to know how long that list is; I just have to know where it starts and put a hash at the end. Okay, so the effect of that is I get this little drop-down list. So for my little filters, what we're doing here is filtering on the table; "table filter", I should point out, is the name of this whole table ("table unique" was the one in the unique worksheet), so table filter is this one here. And then I'm going to filter table filter based on the values in the occupation column having the value of Clerical, okay? So if I copy that... get rid of the little hash... oh, I've got a #SPILL!, because of course I've got other things in the way now. So let me just bring this down here to the bottom.
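FILTER and UNIQUE's exactly-once switch both have simple Python analogs. A sketch, with column names and data invented for illustration:

```python
# Sketch: FILTER(table, occupation = "Clerical") as a comprehension, and
# UNIQUE(middle names, , TRUE) as "values that occur exactly once".
from collections import Counter

people = [
    {"first": "Ann", "middle": "Y", "occupation": "Clerical"},
    {"first": "Bob", "middle": "J", "occupation": "Manual"},
    {"first": "Cat", "middle": "Y", "occupation": "Clerical"},
    {"first": "Dan", "middle": "Q", "occupation": "Skilled Manual"},
]

clerical = [p for p in people if p["occupation"] == "Clerical"]
print([p["first"] for p in clerical])  # ['Ann', 'Cat']

middles = Counter(p["middle"] for p in people)
exactly_once = sorted(m for m, n in middles.items() if n == 1)
print(exactly_once)  # ['J', 'Q']
```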
You see, as soon as I did the Ctrl+V, you'd see up here: this is the expression, and this is the data that's being returned. So it's all of the table; and when I say all of the table, I mean all the rows where the occupation is Clerical, down there. I can also do it based on the value I have in here. At the moment that is still saying Clerical; if I paste this version in now and adjust a couple of things: now, this is where I was before, except now I'm using the cell value. So if I go back to the top here and pick another one, like Skilled Manual, and go down, you can see this has all changed to Skilled Manual. So I can use the drop-down to control what data it is I'm looking for. And just one final example here; let me get rid of that one first. With this one here, you don't have to return all of the columns from the table. So here, what I'm filtering on is just a range of the columns, from first name to last name, and then I've said I want you to find me the ones where the middle name is Y, okay? And there you can see there are only two examples with a middle name of Y. So you don't have to return the entire width of the table; you can have just a set of consecutive columns. Okay, I think I'll stop there. There are a few other examples to be found at the links which I've got here: one of them is for all the dynamic array functions, there's the link there, and one for Power Pivot; there are these and many, many more.