 What's going on everybody? Welcome back to another video. Today I'm going to tell you my favorite ways to use ChatGPT for data analysis and then we're going to actually analyze some data. Now ChatGPT has taken the world by storm. It is pretty stinking impressive. I've been using myself and I've been pretty blown away with some of the results. If you have not tried it yet, I highly recommend trying it yourself. What we're going to be doing today is I'm going to walk you through four of the top things that I've been using ChatGPT for for actual data analysis and then we're going to take a small data set and we're just going to ask it some questions and we're going to prompt it to do a few things and see how it responds. I'm going to tell you right at the beginning it's pretty incredible and this is just ChatGPT 3.5. In future variations and future iterations when they make improvements, I probably will be making more videos on a lot of the new functionalities and the enhanced abilities that ChatGPT has. So let's take a look at my favorite ways that I've been using ChatGPT as a data analyst. All right so I've got my computer in front of me. If you have not seen this before, if you don't know what this is, this is the interface for using ChatGPT. So on this left hand side, we have these discussions and these are previous ones that I've been doing just testing things out. I've gotten rid of a lot of them because I've done this a lot and this is the actual interface. Now this is what it shows when it's just prompting you, hey here's some ideas that you can do and then down here is where you're going to actually write what you're doing. So we're going to be feeding ChatGPT something called a prompt. Basically telling it or asking it to do something. So one of the first use cases for ChatGPT is that it can explain code really well. So I'm going to ask it to explain some Python code and we'll see what it says. So let's go ahead and say explain this Python code. I'm going to come right up here. This is just my Jupyter notebooks. I'm literally just going to take this. This is from a project that I created, just a unit of measurement converter and I haven't given it any information besides that. It's literally just the code and it's going to have to take the string and the code that I've actually written and try to explain it to me. So all we're going to do is we're going to click enter and right now I'm working with the completely free version. So everything that it's about to generate is based off the free version. I'm not paying for any of this. I don't really believe in paying but let's actually take a look at what it's explaining. So it's I'm not going to read all of it but it says this code is a simple unit converter, converting between inches, feet and yards. It says the user is prompted to enter the unit of measurement to convert and the code converts the user's input. It says the code uses lower function to convert the user's input to lowercase so the inputs are case insensitive. And that's 100% right. If we go back to the code it's literally asking the user to input some unit of measurement, what they want it to convert it to and then this logic just converts it for it. It's fairly simple, may be a lot of code if you don't know Python, but it's actually fairly simple and chat GBT does an extraordinary job just explaining it really in layman terms. Now I've been using chat GBT to explain code to me because some people are sending me code and I just don't fully understand it 100%. I'm like okay, I get the gist of it. But when I ask it to explain it sometimes it'll go in even more detail and it's really, really useful, especially if you're first starting out and you're not fully understanding some code or a script that you're getting or trying to write. Now that's just Python code. You could use it for SQL or for R for any other programming language or even something like a Excel function. It really could be whatever you want explained. You just put it in here and it'll explain it to you. Now the next thing that I can do is actually generate code. So I'm going to ask it to generate some Python code as well as some SQL code and we'll see what it gives us. I'm going to come right down here to the prompt and I'm going to say write a Python script to scrape data, oops, data from Twitter without my IP address being blocked. And let's see what it comes up with. So the generating code is actually really amazing because oftentimes it'll give you somewhat boilerplate options or boilerplate responses. But it also gives some context here because what it's explaining right now is that you really shouldn't do that or here's some reasons why you may not want to do this. And now it's going to actually generate the text. Now it even has up here a copy code. You can just come in here, copy this code and you're good to go. Right here you can see your consumer key, your consumer secret. This is using the Twitter API. So it's saying you need to use the Twitter API but I could say not using the Twitter API and it would generate an entire other response without using the Twitter API. So it's pretty incredible and it's still going. This is a lot of information. And if you don't use any APIs, if you're not used to using things like that, if you don't know Python at all, this is fairly straightforward. It's fairly simple. This isn't, I'm going to scroll up real quickly. This isn't anything like super advanced. This is something that I could probably Google and get pretty quickly. But it's generating it based off of my request that I don't want my IP address to be blocked. So it's giving me some options to actually do that, which I think is just fantastic. That's really, really impressive. And it even explains it at the bottom how it helps you avoid this IP address being blocked from Twitter. So again, just pretty impressive stuff. But let's go ahead and try it with SQL code. So we're going to come down here and say write a MySQL stored procedure that automatically imports data from a CSV at a specific path. And let's see how we write this. Now, I've worked a lot with this stuff with data collection, importing data automatically. When a CSV gets dropped into a file, you want to automatically take that data into the database. Super, super common. So let's see how it writes this. Now, I just got a network error. Basically, chat GBT is overwhelmed and it couldn't finish my response. This is happening a lot with chat GBT. I just want to kind of explain this. They now have a paid version that you can pay for. It's like 20 bucks a month or the more advanced version, which is $42 a month. This won't happen if you pay for it. But if you don't, which most people don't want to do, you're going to encounter these types of errors. So what I'm just going to do is I'm going to refresh. It even says we're experiencing exceptionally high demand. That's completely true. But I don't want to pay for it. So I'm going to show you and hopefully it'll generate the entire thing without me having to pay for this because I really don't want to. Let's see what it's writing. So it's creating this stored procedure and it's using this import CSV. It's giving our file path and telling it to put it in this, in this table right here. And so it's literally looking for a CSV and then it's going to actually write the code to load that CSV. So that's what it's doing right here. And you could use this, you know, you have to have certain settings in my SQL. Don't just copy this and paste it. You have to input some information. You have to have certain settings set up in my SQL as well. But then it just kind of explains it a little bit for you. And it's pretty great. I've used some variation of this. And each time I run this, it gives me a different version. Sometimes it's more dynamic. Sometimes it's more hard-coded. So actually knowing this skill when it's generating code, actually knowing SQL is really important. I wanted to warn some people out there because a lot of people are like, okay, you know, I'm just learning SQL. I should just use this to learn SQL. And I'm like, I don't think that's a good idea because if you do not know this code already and it tells you to do something, you may get a worse option or a not perfect option and you need to know it. So just looking at this, this is pretty good. This is pretty similar to something I've worked with. But in previous versions, it's given me different options that I didn't like as much. So be careful with this one when it's actually generating the code. Oftentimes, you'll need to tweak it or change it. I have found that chat GBT is pretty good, but sometimes it just gets things wrong or it just gives me bad options that I'm like, this is not good code or it's not perfect code. And I have to tweak it quite a bit to get it to what I want it to do. Now the next thing that I really like chat GBT for is writing comments for my code. So let's take this for example. Let's say I have this code. I'm going to say, can you please, I like to be polite, can you please write comments for this SQL code? And I'm just going to paste it in there. And I'm going to ask it to write comments for it. If you haven't written any code, you may not know what that means, but comments are just, you know, added information for somebody. If they're coming in and looking at your code, or for you in six months to come back and look at your code, you know exactly what it's doing. So it's going to describe every part of this code for you, which can save you a ton of time. If you're commenting your own code, that can take a while. And you can just go in here and add a little bit more information if you want to. But this is like a fantastic starting place for commenting code, which I personally sometimes forget. And so I've been using this for even more complex code, like this is fairly simple code to comment. I've given it even more complex code and it's done a pretty good job. There are times where it gets it kind of wrong. But this is pretty impressive for actually generating comments for code. It's saying exactly what each step is doing. The next use case that I've been using chat GBT for is for creating data dictionaries. Now, if you don't know what a data dictionary is, it's basically a table or something that describes what data is in your data set, whether it's in a CSV or a SQL server or wherever, wherever that data sits, it's good to have that data dictionary to know exactly what data you have. So I'm going to say, can you give me a data dictionary for this data set? Now I'm just going to paste this in here. I just copy this from an Excel. So that's all I did. And I'm going to hit enter. Now it's starting to generate this. It's going to actually give me a table. So it's creating this table of the column name that's in there, the type of data that is in there, and then an actual description of it. Now I haven't given it any previous information. It's doing this purely based off of context. So it's taking this name state and saying, okay, this description must be the state name is doing total population. So that must be the total population. Then it's even taking in things like comparison operators and it's saying these rates dot age is saying a rate for age less than 18. Now this is fairly, fairly simple. I'm not saying it's giving you anything advanced. It's not giving you a ton of information beyond what it's reading in in the column name, as well as reading in in the data. So if you give it a more complex data set, sometimes it even takes the data that's in there and it says, okay, these are state names, but they're labeled something different. It's labeled just 123 for the column name. So then it will tell you this 123 column is state name. So it even takes that into consideration. It's pretty wild. I mean, it is really, really impressive. Now again, I'm just giving you like one or two examples of each of these. Otherwise, this video would be like an hour long. But the next thing that I've been using chat GBT for is optimizing my code. Now let's go right over here. This is an actual SQL query that I have written. I'm going to copy this. It's just doing what I would consider a fairly, let's take a look at it, a fairly simple window function as a sub query. And then it's using a case statement to kind of use that information. Let's see how it would optimize this code. Can you please optimize this MySQL code? And I'm going to give it the code. Now what it's going to do is literally take it in and rewrite it to properly run it faster to actually optimize that code. Now if you don't know what query optimization is, it's basically where you're either changing a query or you're changing the database to run faster. So what it's actually doing here is creating an index. This is a way that you can optimize code. Although for this code, that's not exactly what I was wanting. I wanted it to optimize the query and not create an index, which does speed it up. That is a fact, but I'm going to now say, I'm just going to say, can you optimize this in just the query itself? So it's going to remember the previous thing that I asked it to do. And now it's going to understand by context what I'm asking it because I don't have to actually input my query again. It's going to know let's see exactly what happens. So I've tried this twice and it gave me different answers each time. Last time it gave me what I hope it's going to give me now. That time it gave me an index, which it didn't give me before. So it gives you different things. It's just really, really fascinating how this works. Now what is giving me right here is the actual output that I was hoping it'd give you. Although this is a learning opportunity. That's just how chat GPT works. This right here is what I thought it would give you instead of the index. But what it did here is instead of in this part of the code having this select everything, what it is now doing is just selecting the actual columns that we need. So now it's just selecting employee ID department and salary instead of all of the columns, as well as this window function right here. Now it's going to even tell you why it does that or how it does that. So by using a subquery to only retrieve the columns that are actually needed in the final result, you reduce the amount of data that is needed to be processed, which speeds up the query. So those are all ways that I've been using chat GPT. It's been pretty amazing. I will give the disclaimer though that not every time do I get great results. There are oftentimes where I have to reword things or I have to ask it to generate another response. And that happens a lot. But when I actually do get what I'm looking for, it's pretty impressive. Like I usually am like, wow, how did it do that? It's pretty cool. Now what I want to do is just take a small data set and do some data analysis kind of show you some of the things that it can do. Some of the ways that I would use it. I'm just going to pull a data set from Kaggle and use that. Keep it pretty simple, not use any real data with any information I shouldn't be putting into chat GPT on it. So let's go down here. And in this Excel file, and will you be using this one later in this Excel file have this data set right here. Now this is going to run really slowly because this is a lot of data. But I'm just and I say a lot of data. It's only 30 rows. But chat GPT does not take in this data well. So let's get rid of this. And all I'm going to say is you are a data analyst. So I'm giving it some context. That's who you are. That's what you do. Here's what I want you to do. Can you please give me some insights and recommendations on this data set. And I'm going to input this data in here. Now when I hit enter, it's going to analyze this really quickly. Now if you ask it to do other things which we're going to do in just a little bit, it may take quite a while. So I want to give it a second to actually write everything. Basically what it's saying is to actually provide insights and recommendations. We want to take some initial steps. And these are some pretty common things that you'll want to do with data, like data cleaning, exploratory data analysis, product analysis, geographical analysis and time series analysis. So let's take a look at this data really quickly. We do have a product. We do have an order ID, the actual product, how many we have, the price, the order date and where it was purchased from. So what it can say, based off of just giving it the data set randomly, it says you can offer discounts or promotions on the least frequently ordered products to increase their sales. Now what we could do is literally take this and say, can you tell us more about this? And I'm just going to ask it to explain it a little more. So what it told us is you can offer discounts and here's why you'd want to do it because you'll attract more customers. And it even tells us how to identify these products. And I'm going to ask it, can you identify these products for us? And I'll say and output the SQL code as well. So if we go back up to generating the code, it says to identify the least frequently ordered products, you can use the following SQL code and it literally writes the code out for you with the actual column names that it says you should be using. That's the part that always kind of like, I just think is really impressive because this kind of code is actually fairly simple, fairly generalized, but it makes it not so generalized because it's actually using your data. That's the part that always like blows my mind. I'm like, that's just really impressive. That's using your actual column names or your actual code. Or generating your code using your column names. That is the part that's usually pretty impressive to me. So all this is doing, it even says we're using the product count. So we're using an aggregate function in the orders table and counts the numbers of times each product appears and selects the top 10 least frequently ordered products. Again, pretty simple, but we've asked it to do a lot of contextual things. Take a look at this data set, tell us how to offer discounts and actually write the code to identify what we should be offering discounts on. It's pretty, pretty cool. Now let's say we didn't want to use SQL because that's what we're using right now. I'm going to say, can you take our data set and put it into a pandas data frame in Python? So I'm going to see if it remembers. Sometimes it does, sometimes it doesn't. Looks like this time it does. Depending on how far back the conversation is or if it remembers what our data set is, sometimes it remembers, sometimes it doesn't. This time it looks like it is working. Now what it's actually doing is creating each row of that data and inputting it as data. Now sometimes it's going to error out. It takes too much processing power and we have like 30 rows. It's only on row three. So I'm going to let this run for a little bit. We'll see if it actually works. So it's looking like it aired out. It just stopped. It just stopped. This is the, I would say, somewhat downfall of the chat GPT right now is that the servers are just overloaded. There's like millions and millions of people trying to use it at this very moment. So oftentimes when it's using these large data sets, it just doesn't work. In the future, I think there'll be add-ons to things like Excel and MySQL and all these different things where it'll be integrated with it and process it much faster. But now it doesn't. So we're just going to refresh and keep going from here. And we're just going to go right back up here. And this is our previous data set and we'll scroll all the way to the bottom. So it saved our conversation. Thank goodness. But it just, you know, it just blanked out and luckily it saved us. That was a good thing. Now, because I refreshed it, let's see if it still remembers. Again, we're trying to analyze data. We're trying to get some information out of here. But, you know, these things are happening. So I'm just kind of going with the flow. So what I'm going to say is what products in our data set made us the most profit? Let's see if it remembers. So now it's going to say to identify these products, you need to calculate the total revenue. So this part is really cool because it's going to kind of, based off of our prompts, based off of us saying we want profit, it's going to say we need to calculate the profit first to determine the profit. So it's going to write, it's using pandas, but we can ask it to write this in MySQL. But it's going to take it and say, okay, take the quantity ordered times the price of each one. And this is our like potential profit. And it's even giving us comments. I mean, it's really going above and beyond. I mean, if you were to integrate this in something like Python or, you know, any other thing, this could be a fantastic tool. But this is pretty great. This looks like pretty standard code. It's even grouping it and giving us the sum. So if we ran this, if we took this and put it in Python, and we had our data set in there, then it would probably give us pretty close to the right answer. Now I'm going to ask it to write this in MySQL as well. So I'm going to say, can you write this in MySQL? And there we go. So it wrote it in MySQL as well. And this looks pretty simple and very straightforward. It looks correct to me. Now one thing I want to say is we're analyzing this data. We're kind of prompting it. I'm just giving some examples, right? We're not going to do like a full analysis that I could do that in a whole another video. But what we're doing is just I'm giving examples of how you can actually ask it to analyze this data for you. So what we're going to do next is we're going to ask it to categorize some information for us. So I'm going to say, can you categorize, let me say the products that made us a lot of money versus a little bit of money in MySQL. And do MySQL. So we're asking it now to categorize this data for us. If it made us a lot of money, we're going to want it to say it made us a lot of money. If it's a little money, we want to tell us it made us a little bit of money. Let's see how it does this. And again, this is all pretty generalized questions, kind of open-ended questions. We're going to see how it actually interprets that open-ended question. And so it's literally writing a case statement, saying if it's over a thousand, it's a lot of money. If it's less than 500, it's a little bit of money. So this is something that it's really interesting. This is not something that I'd ever write in a real SQL query because it's leaving that 500 to a thousand somewhere in there and it's saying it's somewhere in between. I would never actually do this in real code, but that's how it's interpreting it and writing it. So this would be a mistake to me. This is just like a small thing that I would go back and change and customize. But we could give it more specific prompts. We could say, if they made us a profit of more than a thousand dollars, give it this. If it was less profit than this number, give it less. So we could even give it more specific and it would do it. Now, if you remember from our Excel right here, we have this a column. And what we want to do is break it out by the street, the city and the state as well as zip code. So I'm going to ask it to do that because this is something that I've actually done a ton in SQL Python and any programming language is breaking it out so you can group on it and clean up that data better because in its current state, it's not very usable. So I'm going to say, can you break out the last column into street, city, state and zip code columns and we'll say in my SQL. Now, you can do the exact same thing for Python or whatever, but let's see how it writes this. Now, I posted this on Instagram the other day and before it was using substring index. Let's see if that's how it does it again. And it looks like it is. And this is almost exactly how I would write this. So this one really impressed me because I'm writing this code to test this video or make this video. This one really impressed me because substring index, using substring and locates like this, it's not super straightforward. It's not super easy, but it does it. Just very, very impressive in my opinion. Very impressive how it's able to write this. And this code, if I'm just glancing at this, this code looks correct. Like I could run this and it would take that column and break it out into those four columns for us. Now what we would want to do, because this is just in a select statement, but I could literally, and let's try it, I could literally just tell it, can you create new columns for each of those. So again, I'm being vague each of those. I'm just trying to test it a little bit while we're in this video so you can see how impressive this is. So now it's going to create a new column, city, state, street, and zip code. And then it's going to tell us how we can do that. I don't know. It's just really, really cool. And now it's literally writing the code to use that substring index, which we generated up here, which looked correct. And now it's telling us in these, you know, update statements, how to actually apply that to those columns that we created with this code. All I have to do is copy and paste this. Again, it's blowing me away a little bit. Every time I run it, it's just kind of like, how does that work? Like on the back end, what code are they writing, or what code are they using to, to generate this? It's just really impressive. Now you can ask it to convert it to a different programming language or whatever you want, and it would do it. But what we're going to do now is do something a little bit different. I'm going to ask it, and I'm going to actually, I forgot to create this beforehand. I'm just going to say first underscore name. And what I'm going to do is I'm going to give it all types of different inputs. So I'm going to say, Sam, then I'm going to do Kelly, and then I'm going to do Josh 123. So I'm going to take this, I'm going to ask it to clean this data for us, just as like a small example. And you clean this new data set. And I'm going to give it to it. Just see what it does. If I need to be more specific, I will. But let's see how it takes in that data and standardize it. It even says right here, we're going to use upper to standardize the text. And to remove numbers from the characters, you can use regx replace, which is 100% accurate. That's probably what I would do if I had a large data set with all data like this in there, I would be using those things to clean this up. So this code right here is going to take regx replace and replace all those numbers and just keep the letters basically is actually what that's doing. And that's going to put it all upper. So I don't want an uppercase. I'm going to say I don't want it in uppercase. I want proper case. Could you please write that code as well? And I'm going to see what it does. I hope it makes it into more proper case because I don't like using it in uppercase. Just not always what I like visually seeing. Let's see what it does. So it's basically going to use init cap and make it a more proper case instead of doing it all on upper or lower or something like that. And now what I'm going to do is I'm going to say, can you put that cleaned data in a table for me? And it's going to actually generate this and create a table for you to copy and paste. And it's actually creating the table. That's not what I wanted. If I'm being honest, I wanted it to create just a list for me to copy and paste. So I'm going to say, while it's generating it, I'm going to say, can you put this in a list so I can copy and paste this into Excel? So let's see if it does better this time because I didn't want it to create the table in SQL, but that's the context that it was understanding. And now it's doing it in Python. Again, not exactly what I wanted, but it's doing something, right? It's definitely working hard to try to understand what I'm trying to tell it. I just wanted to literally give me like something I can copy and paste and it's not doing that exactly. I could just take this if I wanted to and break it out, but it's not doing exactly what I wanted, although that happens with chat GBT. The very last thing I'm going to do, and I'm just going to go back here. So I have the second data set and it's a lot of data. So I'm just going to take like the first two columns. And what I'm going to do is just ask it to basically explain what this data set is. I'm just going to say, can you please explain what is in this data set? Now I'm being specifically vague for a reason because you can be much more specific. You can say, explain to me what these columns are and how they correlate to each other. You can explain, ask it to explain almost anything, but it's basically going to give us some context, some overview of what kind of data is in here, much like a data dictionary would. It's kind of doing the same thing. Now this is going to take a long time. There's a lot of columns. There's like, like 30 or 40, but it's doing its thing. It's going to keep going until it explains all of them unless it errors out because it's just the system's overwhelmed. But while this is running, these are some ways that you can actually analyze data. I'm just giving you a lot of examples because I want you to go and test this out. It's, I don't know. There's no other word to use other than just, it's really impressive. I'll have an entire other video that I'm going to be making on this, which I'm basically going to be talking about how it'll help with data analysis, the data analyst job market, how it'll impact it in the future, and some predictions that I have about its capabilities and things like that. There's just a lot going on in this world. Google is about to come out. I'm just kind of giving some filler for my next video on this, but Google is coming out with its own version. I believe it's called Sparrow. It's using their Lambda model that they've built. And I'm super excited for that one. So I probably will do some comparisons between those as well. This type of technology is just, it's going to change things. I don't know how exactly I have some predictions. I have some, you know, information that I'll be providing. This is going to change things. This is just incredible. And it can analyze data like this on these small data sets, but when it's incorporated in things like Azure, like with Microsoft, which is, you know, doing some big things, if it's incorporated with Azure and it can analyze these huge massive data sets a lot faster, a lot of things could change. So in this video, I kind of showed you a lot of my favorite things that I've been using chat GBT for. I've showed you some of the functionalities as well. And then we even analyze some data, just kind of poking around and asking it questions and seeing how it interpreted those questions as well as asking it to write it in Python and SQL. So, you know, I just gave you a few different variations and flavors. It wasn't like a full analysis. This is just like an example. In another video, I might take an entire real data set and see if I can get chat GBT to analyze it well, but not in this video. So I hope that this is helpful. And I hope that this was exciting for you. This stuff is super, super exciting to me. I find it just incredibly interesting. And I think it's just really impressive. So go try it out yourself. Go see if you like it. I will be making more videos on stuff like this because I personally think it's just incredible. So with that being said, thank you for watching this video. I hope you learned something. I hope this peaked your curiosity. Be sure to like and subscribe and I'll see you in the next video.