Hello, and welcome to this session, in which we will look at the concept of big data. This topic is covered on the CPA exam and on the CMA exam, and if you are taking a data analytics course, it is becoming a frontline issue, whether on the exam or in the real world. So you need to be familiar with what big data is. In this session, I will talk about big data and its characteristics. In the next session, we will look more at data analytics: how we analyze the data to obtain information, and what types of tests we can run to obtain additional relevant information for business decisions.
Now, whether you are studying for your CPA exam or CMA exam, I strongly suggest you visit my website, farhatlectures.com. I don't replace your CPA review course. I can be a useful addition to it. I can explain the material differently, and by doing so, I can add 10 to 15 points to your CPA exam score. I have helped many students pass the exam using my system. Your risk is one month of subscription. By using my system alongside your course, your knowledge will increase. If it works for you, you keep it and it's going to help you pass. If it doesn't work, you cancel, and your risk was one month of subscription. Your potential gain, though, is passing the exam. Also, if nothing else, take a look at my website to find out how well, or not so well, your university is doing on the CPA exam. Now, I do have other accounting courses, including data analytics, that you can check out on my website. Please connect with me on LinkedIn if you haven't done so, and there you can take a look at my reviews from people who already used my system alongside their CPA review course to pass the exam. Please like this recording on YouTube, share it, and connect with me on LinkedIn and Instagram.
So what is big data? Well, it's a term that's constantly changing.
But basically, it describes any large volume of data that can be mined for potential information. Now, what is data? Data is numbers, figures, images, and all sorts of things. We're going to look at three types of data: structured, unstructured, and semi-structured. You need to know the difference between them because you might be asked to identify whether a given set of data is structured or unstructured.
Structured data is the traditional data, the old data. It is highly organized into predefined groupings like rows and columns. What can you think of here? Hopefully you are thinking about an Excel sheet, right? If you have an Excel sheet, you have rows and columns, and the data is sitting there real nice. A relational database is the same thing. That is what is called structured data: data that can be easily sorted and searched by a computer program. For example, in Excel, you can quickly sort things from the highest number to the lowest, compute averages, and run a regression on that data. An example of structured data would be quarterly sales data by product.
On the other extreme, we have unstructured data. Unstructured data has little or no predefined organizational structure, so it doesn't come clean to us. There is a lot of unstructured data out there, and it is more difficult to analyze because it's unstructured. It could be very valuable, but it is more difficult for a computer program to sort, search, or analyze. What are we talking about when we say unstructured data? Social media, for example: analyzing text on social media to find out whether customers like this ad or not, like this product or not.
They're going to write reviews, so you have to analyze those reviews. That's different from analyzing numbers. Audio, videos, images: you're analyzing images. Now, we have software that analyzes images, but again, it's not as easy as having data sitting neatly in rows and columns. OK, so these are more difficult for a computer program to analyze. Not impossible, just a little bit more difficult. They may need a little bit of cleaning, or they may need special handling.
Then, in between, we have something called semi-structured data. Structured and unstructured are the two extremes. Semi-structured, obviously, is someplace in the middle. It's not highly organized, but it still has some identifying information that can be used by a computer program to organize it. Examples of semi-structured data are comma-delimited files, or CSV files. When you download data, for example from the Department of Labor or from any other governmental source, it usually comes as a comma-delimited file. It takes a little work, but you can turn this file into an Excel file. So it's not structured, it's semi-structured, but you can make it a structured file through some programming. Some files will need actual programming to be written for them. Others are not as bad.
We also have something called XML and XBRL. XML is Extensible Markup Language, and XBRL, which is more formal, is Extensible Business Reporting Language. You will learn more about these languages in your FAR exam. XBRL is how companies report data, meaning financial figures, to the SEC. So these are not totally structured, they're semi-structured. A company can send its sales figure this way. It's not structured like an Excel sheet, but you can download it and turn it into either an Excel sheet or put it in a relational database.
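To make the idea of turning a comma-delimited file into structured data concrete, here is a minimal sketch in Python. The file contents, column names, and the helper `load_rows` are all made up for illustration; a real download from a government source would have its own layout and would likely need more cleaning.

```python
import csv
import io

# Hypothetical comma-delimited text, standing in for a downloaded file.
RAW_CSV = """product,quarter,sales
Widget,Q1,1200
Gadget,Q1,950
Widget,Q2,1500
Gadget,Q2,700
"""

def load_rows(text):
    """Parse comma-delimited text into a list of dicts with numeric sales."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        record["sales"] = float(record["sales"])  # convert text to a number
        rows.append(record)
    return rows

rows = load_rows(RAW_CSV)

# Once the data sits in rows and columns, sorting and averaging are easy.
sorted_rows = sorted(rows, key=lambda r: r["sales"], reverse=True)
average_sales = sum(r["sales"] for r in rows) / len(rows)

print(sorted_rows[0]["product"], sorted_rows[0]["quarter"])
print(average_sales)
```

The header line is the "identifying information" that lets a program organize the values. After parsing, the data behaves like the rows and columns of an Excel sheet.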
So the point of all this, the point of big data, is to take information that doesn't mean anything on its own, low-density data, and convert it into high-density data. That's the whole purpose of big data. We might have a lot of data, but if we don't use it to give us value, then the data is not really worth anything to us.
Now, one way to use the data is through something called BI, or business intelligence. This is when we take the data and process it with analytical and algorithmic tools to reveal meaningful information. Now, this picture here is not really business intelligence, but it's a form of it. Once you take this information and process it, you present it in a data visualization using Tableau or Power BI. Why? Because once it's presented visually, it's easier to make a business decision. For example, this is revenue by channel for some company, and it's showing them, by month, by period, how they are making their revenue and through what source. Here we can see that email seems to account for a large portion, then comes direct sales. Then we have CPC, Google, and organic search. So it tells you where your revenue is coming from, and this way you can focus. If email is working for you, maybe you want to focus more on email so you can grow it. Or maybe organic search: once we know what's going on with organic search, we may change our SEO and see what happens. So this data is helpful because you can visually see what's going on. Again, I'm only showing you the visual aspect. I'm not saying this chart is the analytical or algorithmic part, just the visual aspect of it. Once the data is presented to us, we're looking for trends, patterns, any business insight, OK?
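The revenue-by-channel view can be sketched as a simple aggregation. This is only an illustration of the idea: the records, the channel names, and the function `revenue_share_by_channel` are hypothetical, and the figures are invented, not taken from the chart in the session.

```python
from collections import defaultdict

# Hypothetical (month, channel, revenue) records; the figures are made up.
RECORDS = [
    ("Jan", "email", 400.0),
    ("Jan", "direct", 300.0),
    ("Jan", "organic", 100.0),
    ("Feb", "email", 500.0),
    ("Feb", "direct", 200.0),
    ("Feb", "cpc", 100.0),
]

def revenue_share_by_channel(records):
    """Return each channel's share of total revenue as a fraction."""
    totals = defaultdict(float)
    for _month, channel, revenue in records:
        totals[channel] += revenue
    grand_total = sum(totals.values())
    return {channel: amount / grand_total for channel, amount in totals.items()}

shares = revenue_share_by_channel(RECORDS)

# List channels from the largest share to the smallest.
for channel, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{channel}: {share:.0%}")
```

A tool like Tableau or Power BI would draw this as a chart, but the underlying step is the same: low-density transaction records are summarized into a few numbers a decision maker can act on.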
Now, big data has certain characteristics. There are four V's of big data, and you need to know them for the exam because you might have to answer multiple choice questions on them. They are volume, variety, velocity, and veracity.
The first is volume. Just to give you an idea of how data is measured, so you can picture the sizes: we start with bits and bytes. Eight bits make one byte, and one byte gives you a single character, like the letter B, the number six, or the number seven. So you need to know the terminology. 1,000 bytes give us a kilobyte. A compressed document image page might be 50 kilobytes. A megabyte is 1,000 kilobytes. For example, 20 megabytes is a box of floppy disks. Depending on your age, maybe you have never in your life looked at a floppy disk, but if you remember them, 20 megabytes is a box of floppy disks. Gigabytes always impress me, because my first computer had a hard drive that could hold four gigabytes, and I thought that was a lot. A gigabyte is 1,000 megabytes, so four gigabytes is 4,000 megabytes. That was huge back then, and now the phone that's in my hand has 128 gigabytes. Then we go from gigabyte to terabyte. A terabyte is 1,000 gigabytes. For example, with Google Drive, I believe for an additional twenty dollars you can store up to 1,000 gigabytes, which is one terabyte. Then you go from terabyte to petabyte, which is one million gigabytes, equivalent to 3.5 years of Netflix streaming videos, and then to exabyte, which is one billion gigabytes.
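The size ladder above can be checked with a small conversion sketch. It assumes the decimal convention used in this session, where each unit is 1,000 times the previous one; the names `BYTES_PER_UNIT` and `to_gigabytes` are just illustrative.

```python
# Decimal units (powers of 1,000), matching the sizes given in the session.
BYTES_PER_UNIT = {
    "byte": 1,
    "kilobyte": 1000,
    "megabyte": 1000**2,
    "gigabyte": 1000**3,
    "terabyte": 1000**4,
    "petabyte": 1000**5,
    "exabyte": 1000**6,
}

def to_gigabytes(amount, unit):
    """Express an amount of data in gigabytes."""
    return amount * BYTES_PER_UNIT[unit] / BYTES_PER_UNIT["gigabyte"]

print(to_gigabytes(4000, "megabyte"))  # the four-gigabyte hard drive
print(to_gigabytes(1, "terabyte"))
print(to_gigabytes(1, "petabyte"))
print(to_gigabytes(1, "exabyte"))
```

Putting everything in gigabytes makes the jumps easy to compare: a terabyte is a thousand gigabytes, a petabyte a million, and an exabyte a billion.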
I like to put everything in terms of gigabytes because I can relate to that, so you can go with that. Just make sure you are familiar with the sizes, because you might have to answer some questions on them. So with volume, we have an extreme amount of data captured over time and in real time, because data is constantly being captured. It's stored on server farms. Sometimes it's your own server, sometimes it's in the cloud. Now, most of the data is collected from the Internet of Things. This is the top source of the data that we have to deal with in the real world. It can be captured from wearable devices, on humans or animals. For example, a farm might tag its cows or sheep or any type of animal for the purpose of collecting data. Digital machines, e-commerce, transactions, buying and selling: basically the Internet or the Internet of Things, anything that's connected. Any network that's connected, sometimes in real time. All the data it captures is being stored. Generally speaking, the more data we have, the better off we are, as long as we are using it to make good decisions in acquiring and retaining clients. So the key is to take all this volume and use it to your advantage. One characteristic of big data, then, is volume. It's a lot of data.
The second characteristic of big data is velocity, or speed. You might have a lot of volume, but if you don't process it quickly, and use it to make the right decision, then it's useless. Velocity deals with how fast we are collecting and processing the data. Think of GPS. If you are using a GPS, it has to collect the information about where your car is and give you real-time information, processing all the traffic data so you can make instantaneous decisions.
Also, when you're using your GPS, advertisers want to know your location, so that if you're close to them they can advertise to you. Maybe the GPS is listening to you. If you're talking about food, it might start to give you ads about places to stop for food, right? So it's instantaneously taking this data, which is voice, and converting it into information for a business decision that the advertiser would like to make about you. Think about autonomous cars down the road. There are going to be a lot of data points that the car has to handle, and that data has to be processed instantaneously to save lives. If that's the case, you know that we're going to need multiple servers with strong computing power to process all this information. Small companies, rather than having their own servers, might use cloud computing, which we'll talk about in a separate session because you need to know it for your CPA exam. Simply put, the faster a business can process the data, the sooner it can offer customized products to its customers. That's the importance of the velocity of the data. If I know the person in that car is looking for food, boom, I want to give them that advertisement immediately. The faster, the better.
The two other V's are variety and veracity. Data comes in a wide variety of file types. We talked earlier about structured and unstructured data. Again, structured files are defined by rows and columns, and they can be easily searched with SQL, Structured Query Language. And we have unstructured files, which are the result of advancements in technology. For example, if you wear a wearable device, it's collecting data about your heart rate, how many steps you walked, how many miles you ran, and so on. We have images, text, audio, E-ZPass, and so on.
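Going back to the point that structured files can be searched with SQL, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name, columns, and sales figures are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory relational database with a hypothetical quarterly sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Widget", "Q1", 1200.0),
        ("Gadget", "Q1", 950.0),
        ("Widget", "Q2", 1500.0),
        ("Gadget", "Q2", 700.0),
    ],
)

# Search: the single largest sale.
top_sale = conn.execute(
    "SELECT product, quarter, amount FROM sales ORDER BY amount DESC LIMIT 1"
).fetchone()

# Summarize: total sales per product.
totals = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()

conn.close()

print(top_sale)
print(totals)
```

Because every row has the same columns, one line of SQL can sort or total the data. Text reviews, images, or audio would first need special handling before anything like this is possible.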
We can also buy satellite images. For example, if we want to buy a physical business, maybe we want to know how many cars are visiting it. We can buy satellite images taken throughout the day to capture how many cars are stopping there, and base our analysis on the traffic. We can estimate an average sale per customer based on the data. Or we can use this data for advertising purposes. Remember, you can also buy satellite images from a third party like Google.
The veracity of the data deals with its truthfulness. Can we rely on this data? Is it trustworthy? Sometimes the data is not trustworthy, and sometimes it just needs some cleaning because there are discrepancies in it. That is what veracity is. You might be asked multiple choice questions to determine whether a scenario deals with velocity, volume, variety, or veracity, so make sure you understand the difference.
Now, all in all, big data has benefits and it has challenges. What are the benefits of big data? Well, it increases efficiency within society. Think about traffic. When you use your GPS, it's giving you traffic in real time. It's helping you use a different route to get to work early, so it saves you time. And believe it or not, big data and the speed at which we can obtain it may be one of the reasons, and again, this is debatable, why we have low inflation: everybody is getting the most out of everything. In other words, you are using fewer resources to obtain more goods because the information is available to you. Now you can go on Amazon and you don't have to pay premium prices. You can shop online. All the information is there, based on big data, and you can select the cheapest option.
Therefore, businesses cannot raise their prices too much, which keeps inflation down. That could be part of it. Big data also helps create a customized experience for clients. You have all this data, and you use it to create really nice products for clients based on their preferences, because you are monitoring them. You are seeing where they're clicking on your website and what grabs their interest. You can also monitor your product throughout the supply chain, and again, this increases efficiency in doing business. If there's any place where the material is sitting for too long, you may want to cut that route and choose an alternative one, because the product is taking longer to arrive at the customer. Therefore, you can save money.
But there are some challenges. When you have big data, you have to worry about privacy. In the US, we have something called HIPAA, which deals with privacy. Health care providers have to maintain your privacy and make sure your data is protected. So that's a challenge for health care providers. It doesn't have to be health care providers, although they have more liability: any data is subject to theft on a large scale because the data is sitting in one place. Once that database is broken into, someone can steal everything: names, addresses, Social Security numbers, millions and millions of records. It has happened several times, notably with one of the credit agencies, where who knows how many Social Security numbers were stolen. So because you have big data, you have to protect that data, and that requires encryption, which means additional cost. Again, this is part of the challenge of big data. But we have to weigh the benefit against the cost.
And I believe the benefit is greater than the cost, although that depends on your opinion about privacy.
Again, at the end of this recording, I would like to invite you to visit my website, farhatlectures.com. What I do is give you alternative explanations and supplemental material. I have alternative multiple choice questions to help you learn the material. I don't compete with your CPA review course, and I don't want to take it away. I can't do that, but I can supplement your studies to help you pass. Think of me as an alternative, as a backup for your CPA review course. Take a look at it. In the next session, we'll look at data analytics: we're going to take this data and see how we can use it for business decisions. Good luck, study hard, and stay safe.