Welcome to the Tech Track of School of Style. I can see that many of you are familiar faces, but for those of you who are here for the first time: School of Style is a three-week workshop series covering three topics: Business, Design, and Tech. The last two weeks covered the other topics, so this week we will focus on tech, to give you a more hands-on feel for technology and help you in your business and startup career. School of Style is hosted here in the Shocked Lab, and we are a non-profit organization under the Style of Founder, and a sister organization of Slush, Junction, Wave Ventures, Brings in North, and Maria 01. I won't take much of your time, but here are a few logistics. We have two restrooms available here: one on the left, the first door, and another one down the hall in the corner. We would really appreciate it if you could fill in our feedback form at the end and give it to one of the people wearing the event t-shirts, and if you need any help, you can ask us as well. We will have a live stream today and take pictures, so let us know if you don't want to appear in them. No one? Okay, I see that everyone is willing to be photographed. So please welcome our speaker today, Jessica from Chaos Architects. Hi. All right, here's the long description of the workshop, because I realized the schedule condensed it down to one tiny sentence and I didn't feel that captured everything I'm going to talk about. So if this is not what you signed up for tonight, bye, have a good evening. But yes, I am going to be talking about "I have data. Now what do we do with it?" Specifically: what do I do, and also what free tools and places on the internet can I go to learn more about how to do it? So first things first: who are you?
So are you guys, nope, I'm getting way too far ahead here. I want to know who you are. How many of you are business owners? A couple? Okay. How many of you are members of startups? How many of you have taken a statistics class at some point in your life, even if it was a million years ago? Okay, thank God, we have some common ground. How many of you work with data on a daily basis? How many of you work with data every now and then? Anybody never work with data at all? Okay, cool. Of those of you who work with data, how many would consider yourself a total data analytics noob: "I work with data, but I don't know what the hell I'm doing"? Okay, cool, thank you for your honesty; I'm glad you're here. Is there anyone here who feels like a pro with data analytics? Okay, so everyone else is somewhere in the middle. So then, who am I? As you already saw on this screen, my name is Jessica Patherian. I am a data scientist and the head of AI development at Chaos Architects. We are a data and AI company that pulls meaning out of messy, real-world data to give decision makers tools to make better decisions, specifically about how we build our cities. On a personal level, I love pulling meaning out of data and finding the patterns and insights in messy, chaotic, real-world data. I studied applied mathematics and statistics as an undergrad, then worked in data science for a while, and because I'm such a nerd and needed to know more, I've kept exploring data analytics, machine learning, AI, statistics, and all those fun things through open resources like MOOCs, massive open online courses.
What I want to give you today is some of my excitement about how much I love this stuff, and also where you can go next, because you're going to be so excited when you leave that you'll think, "I have to do this right now." At least that's my goal. All right, this is super tiny, so I'll read it out loud. Can you in the back read the screen at all? Okay. This goes back to my Communications 101 class in my first semester of undergrad. My professor said that when you give a presentation, you tell them what you're going to tell them, then you tell them, and at the end you tell them what you just told them, so that they remember it. So this is what I'm going to tell you. First, we're going to go over some buzzwords: types of data, data storage, types of analytics, and some other things thrown in. In the second section, we'll walk through what you actually do when you want to analyze something, from the perspective of the scientific method, which you probably heard about once back in high school; we'll use it as our roadmap for an analytics project. And at the end, I'll give you some of my personal favorite resources for learning more about this online. One last thing before we jump in: take a moment to think about a business need or question you might want to answer with analytics, perhaps from your own business. Many of you own a business, are part of a new company that's doing cool things, or for whatever reason decided to show up today and learn some things about analytics.
So take a second and write down, on paper or in your laptop notes, one case where you could use some insight from data, and raise your hand when you've got it. The rest of you can let it simmer on the back burner as we go. Next, while you're still thinking about that, or once you have a clear picture of what you want out of analytics: as we go through this next section on buzzwords, write down the most relevant keywords. Did I spell that correctly? Yes. Write down the words that catch your eye, seem interesting, or seem relevant to your business need, your question, or just your curiosity, and we'll have a chance to share them at the break. All right, buzzwords. First off, what is data? Let's all have a unified definition to work from for the next two hours. Data is distinct pieces of information, usually formatted in a specific way. Metadata is information about the data: the data about the data. It gives you context. Say I have a dataset that's just "date, error, error message, date, fail." What does that actually refer to? Do we know what passed, what failed, what errored? Metadata is what gives us a reference for how the data can be used and what meaning we could potentially pull out of it. All right, here's an example from Helsinki Region Infoshare, one of the open data sources here in Helsinki, so you can look it up and play with it later if you want. Here is the data: this portion, with column headers and rows of data.
And outside, before you click in to download the Excel file, it gives you the metadata: the title of this dataset is population projection of onto by age and gender 2018 to 2045. Then it gives a quick description: this dataset provides the population projection of onto until the year 2045, and the statistics are available by age and gender. So it gives you an idea of what you can actually find in it. Here's another picture of the same thing, with the actual link up here, so when you get the slides later you can go and play with it. This little thing right here, the Excel file, is the actual data, and everything down here is the metadata about it. Data can come in tons of different formats; it's absolutely crazy how many. But the two basic buckets are structured and unstructured. Structured data is information that's very well organized and easily searchable, like an Excel file or a JSON data structure tree. Unstructured data may be a giant bucket we've just dumped data into over the years, or something that has only a loose logical organization, like a tweet: a tweet has a timestamp, who the tweeter was, and then a big block of text. A huge chunk of the information we actually want when analyzing tweets, for sentiment analysis, location, hashtag references, and so on, comes from that text-heavy piece, which machines can't read and analyze as easily. They can definitely do it, but it's not as quick and easy for them. So: unstructured data, structured data. I've got some more examples of unstructured data types.
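As a small illustration of the structured versus unstructured difference, here is a minimal Python sketch. The tweet text, user name, and field names are made up for the example, and the regular expression is a deliberately simple stand-in for real hashtag parsing, not how Twitter or any particular tool does it. The point is only that a structured field can be read directly, while information buried in free text has to be dug out first.

```python
import json
import re

# Structured: every field is named, so a value can be looked up directly.
structured = json.loads('{"timestamp": "2019-03-12T18:04:00", "user": "@example_user", "likes": 12}')

# Unstructured (or semi-structured): a tweet has a few structured fields,
# but most of the interesting information sits inside free text.
tweet = {
    "timestamp": "2019-03-12T18:05:00",
    "user": "@example_user",
    "text": "Loving the #opendata workshop in #Helsinki tonight!",
}

def extract_hashtags(text: str) -> list[str]:
    """Pull hashtags out of free text with a simple regular expression."""
    return re.findall(r"#(\w+)", text)

print(structured["likes"])              # direct lookup: 12
print(extract_hashtags(tweet["text"]))  # needs parsing first: ['opendata', 'Helsinki']
```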
We've got text files and documents; server, website, and application logs, which are like that "date, time, error message" or "date, time, this person logged in" example; and sensor data from the Internet of Things (IoT), such as the temperature reading in your smart home that comes to your phone, or the cameras on parking garages that count how many cars are coming in. Then images, video files, audio files, emails, and, as I already mentioned, tweets and other social media data. All right, big data. This is the thing everyone talks about; everyone wants a piece of it, because apparently it's the most exciting thing this century. If I can find my mouse. Yeah, potentially. It's defined by the project, because, for instance, some of the data you'll see on open data sources is actually summaries of the raw data, like "the average rainfall in Helsinki in 2018 was this much, and the next year it was this much." Those are in a way an abstraction of the raw data, which could also be metadata. And sometimes the metadata is more interesting to us: maybe what we want to know is how many datasets on Helsinki Region Infoshare are about population, so that's the information we want to analyze, not what's inside the datasets. So welcome to the world of analytics: everything is everything, all the borders are fuzzy, and reality exists, but it's very confusing. Exactly, just like life. So, big data. This is such a hard term to define. As I was researching this talk, and years ago when I was looking at becoming a data scientist who dealt with big data, I was like, what the heck is big data?
Where is the line between big and not big? Is there a switch that flips and says, "All right, you have now entered the big data zone"? Really, it means data of a size that you cannot easily manage, maintain, or process. So it's more a question of your approach and the resources you have than of the number of gigabytes or rows and columns. For example, say you're a biologist studying the social behavior of naked mole rats, you've built a custom deluxe habitat, and you have 9 megabytes of tracking information gathered over a couple of months. For you, with your resources as a researcher who doesn't normally deal with huge amounts of data, that can be a lot to handle, especially if it's very dense; 9 megabytes of plain text is a good chunk. At the other end of the spectrum, if you're Google, gobbling up petabytes like candy and considering it routine to test your new algorithms on a large portion of the web, your definition of big is going to be different. So the line between big data and not big data is a sliding scale, and you will hear conflicting definitions because of that. It's also a buzzword, which means everyone has decided on their own definition, and it's not necessarily the same as anyone else's. Another term I've heard more recently, and I don't know if I just hadn't stumbled on it yet or if it's newer on the scene, is thick data. Big data is "I have this huge thing and I don't know what to do with it." It may be filled with lots of good, important information, or it may be a lot of information about just one thing.
Thick data is data focused on understanding the story behind the numbers. It's thick not because it's wide instead of deep, but because it's rich: it looks at the story, and specifically the human story behind the numbers. It usually means smaller datasets gathered through surveys, focus groups, interviews, questionnaires, and videos. Say a customer of mine is one line of data in my database, or maybe a few lines, and they have certain tendencies in the way they shop on my website. I know that about them, but I don't know why they decided to purchase one particular thing. Did they buy a baby onesie because they're pregnant? Because their friend is pregnant and they're going to a baby shower? To give to charity? I don't know why; all I know is that they bought it. Thick data helps fill in the holes that big data can leave. All right, databases. A database is a collection of information organized so that it is easily accessed, managed, and updated. That's a very broad definition because there are so many ways to organize data, and we're going to talk about that. The two main categories are relational databases, which are queried with SQL (Structured Query Language), and NoSQL databases, which we'll get to on the next slide. Relational databases look like Excel spreadsheets, so for some people it's an easier step from "I have data in Excel" to "I have data in a database," because visually it looks similar. A couple of things to note about SQL databases: they're organized into tables with rows and columns, as we've already discussed.
Each row in a table must have a unique key. The key can be a single cell or a combination of a few cells in the row, but it's what sets that row apart as unique, so you can find it later. And if you have enough data to actually want a database, you'll probably have more than fits in one table. Instead of one massive table with everything you can imagine, it's much easier for computers and for humans to sort through and access data in smaller chunks. So relational databases split data into smaller tables and connect them via matching keys. Here's one: an order that included five items. From the order ID, we know this order was placed by this customer, at this time, for this much money, with this customer ID, which isn't actually in this table; it's in the customers table up here. There we find the customer's name and the email address we can send their receipt to. At the bottom are some examples of popular SQL databases and providers. All right, non-relational databases. These are very interesting to me; I've only just started playing with them, because I come much more from a rows-and-columns background: I did SQL database administration for a while. To me it's kind of the wild west, a no man's land of data, since I'm used to nice, neat rows and columns that we don't stray from. But they have some really nice uses. Instead of storing data in rows and columns, each item is stored individually with a unique key. It still has a unique key, but the key doesn't necessarily refer to a row.
They're a much more flexible approach to storing data, because the data can be unstructured or semi-structured. That makes them really nice for things like web application databases, where the types of data you collect about, say, each customer may change over time. You can update the structure on the fly without going back and back-populating dummy values everywhere else. There are roughly four categories of NoSQL databases; like I said, it's the wild west, so "categories" is a loose term. The first is key-value: you have a unique key and a value, and you find the value by looking up the key. The next, the wide-column store, is visually the most similar to relational databases, because you have a key column and rows, but instead of having a value for every field in the table, some rows are missing some columns. It's a bit hard to visualize, but the idea is that each row can have a variable number of columns. The next, document, is a unique key plus a semi-structured body, kind of like a JSON file, if you've played with those; you can look it up later if you don't know what it is, since we have more to talk about today. And then graph, which is pretty much what it sounds like: I have a starting point, this piece connects to that piece over here, and that one connects over there, so it's a web or network of data.
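To make the storage models above a bit more concrete, here is a minimal Python sketch. The relational part uses the standard library's `sqlite3` module with two made-up tables, `customers` and `orders`, linked by a matching `customer_id` key, similar to the order example earlier. The NoSQL part only mimics the four shapes (key-value, wide-column, document, graph) with plain dictionaries; it illustrates how the data is organized and is not the API of any real NoSQL product. All names and values are hypothetical.

```python
import sqlite3

# --- Relational: two tables linked by a matching key (customer_id) ---
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,  -- unique key for each row
    name        TEXT,
    email       TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),  -- matching key
    total_eur   REAL
);
INSERT INTO customers VALUES (1, 'Aino', 'aino@example.com');
INSERT INTO orders VALUES (101, 1, 59.90);
""")

# Join the tables on the matching key to find who placed order 101
# and where to send the receipt.
row = conn.execute("""
    SELECT c.name, c.email, o.total_eur
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE o.order_id = ?
""", (101,)).fetchone()
print(row)  # ('Aino', 'aino@example.com', 59.9)

# --- NoSQL shapes, mimicked with plain Python dicts (illustration only) ---

# Key-value: find the value by looking up its unique key.
key_value = {"session:42": "logged_in"}

# Wide-column: every row has a key, but rows can have different columns.
wide_column = {
    "customer:1": {"name": "Aino", "email": "aino@example.com"},
    "customer:2": {"name": "Mikko", "city": "Helsinki"},
}

# Document: a unique key plus a semi-structured, JSON-like body.
documents = {
    "order:101": {
        "customer": {"id": 1, "name": "Aino"},
        "items": ["onesie", "socks"],
        "total_eur": 59.90,
    },
}

# Graph: nodes connected to other nodes (an adjacency list).
graph = {
    "customer:1": ["order:101"],
    "order:101": ["product:onesie", "product:socks"],
}

print(key_value["session:42"])            # logged_in
print(sorted(wide_column["customer:2"]))  # ['city', 'name']
print(documents["order:101"]["items"])    # ['onesie', 'socks']
```

The relational version needs a join to bring the customer and order together, while the document version simply nests the customer inside the order; that trade-off between normalized tables and flexible, self-contained records is the one described in the talk.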
All right, if you think those are fun, look them up more later. I didn't mention this earlier, but the notes for the slides include links to additional sources if you want to look into it more, and I have more at the end. So: we know what data is and how it's stored. Now let's talk about what we do with it: analytics. At its most basic, analytics is information resulting from the systematic analysis of data. That doesn't even have to be statistics: looking at this board, there's some green and some blue on it, and I just analyzed it. Statistics, specifically, is a branch of mathematics and science dealing with collecting and studying numbers that give information about particular situations or events. It's been around for a very long time, and it's a tool we love and use. Let's move on to some deeper stuff. Where's my next slide? Okay: types of analytics or statistics. We've got descriptive, predictive, and prescriptive. Descriptive is about the past: what do we know about what has already happened? Predictive is about the future: I want to look into my crystal ball and figure out that in five years, the property values in this neighborhood will be X euros. Prescriptive, like a prescription from your doctor, is when I'm analyzing something specifically because I want to know how to make a decision: what should we do? All right, now let's get into data science. The way I think about it, data science is what you get when statistics and computer programming have a baby, and it's big and it does big things. It's a bit more than that, though, and if you've read anything about data science on the internet, it tends to be like, oh,
the data science unicorn is someone who's awesome at computer science and IT, awesome at math and statistics, and knows things about business. And yes, but you don't have to be awesome at every single thing on this chart to have fun with data science and do cool things with it. In general, though, it is the intersection of those three areas. All right, another term that comes up a lot when you read about data online is data mining, and now we're getting into murky water. There are a lot of buzzwords and tools for pulling information out of data that are related to each other, descended from each other, and share the same kinds of tools, so some of these definitions will start overlapping. That's okay; just be aware of it. Data mining is the process of sorting through large datasets to identify patterns and establish relationships in order to solve problems through data analysis. It can include statistics, machine learning, or just querying to find totals: how many customers do I even have in this mound of data, or, more specifically, how many repeat customers do I have? Picture a mountain of data that you're digging into to pull out information that's useful to you. All right, before we get into machine learning, some background: machine learning comes out of artificial intelligence, so let's talk about that for a second. Artificial intelligence, as the broadest umbrella term, means any task performed by a program or a machine that, if a human carried out the same activity, we would say the human had to apply intelligence to accomplish. Interpret that as you will, but things that could fall under it include planning, learning, reasoning, solving problems, representing knowledge, perceiving, moving, and manipulating objects. So that's AI.
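The simplest kind of data mining mentioned above, counting how many customers there are and how many of them are repeat customers, can be sketched in a few lines of Python with `collections.Counter`. The order list here is hypothetical; in practice the same count would usually come from a database query over a much larger table.

```python
from collections import Counter

# Hypothetical order log: (order_id, customer_id) pairs.
orders = [
    (101, "c1"), (102, "c2"), (103, "c1"),
    (104, "c3"), (105, "c2"), (106, "c1"),
]

# Count how many orders each customer placed.
orders_per_customer = Counter(customer for _, customer in orders)

total_customers = len(orders_per_customer)
repeat_customers = [c for c, n in orders_per_customer.items() if n > 1]

print(total_customers)           # 3
print(sorted(repeat_customers))  # ['c1', 'c2']
```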
There are two types of AI. The first is general AI, the one pretty much any person on the street knows about. General AI is a flexible form of intelligence capable of learning how to carry out vastly different tasks, or of reasoning about a wide variety of topics based on its accumulated experience. In other words: God made man in his image, and man made robot in his image. This is the idea behind all the intelligent, sentient robots we see in sci-fi. Can you name any more AIs from science fiction? What's your favorite one? Volunteer; I know there have got to be some geeks out there. Oh yeah, which one is that from? Yes! Oh no, I didn't get WALL-E, but I got his friend EVE. Oh yeah, that's a good one. Anyone else? Oh man, how did I miss that one? It's so popular right now. Another thing I want to point out is Jarvis. Jarvis is still an AI even though, at least at this point in the series, he didn't have a physical form. So AI isn't tied to physical robots; it's any system that can think and reason things through. All right, that was a fun slide, and now you're all going to have to... yeah, voice assistants! You actually gave the perfect lead-in to my next slide, so thank you. Yeah, it's about what it can't do, and that's where the difference lies. Even really smart systems like Siri, or Google Home or Alexa, the big ones right now (I'm sure there are others; someone was telling me the other week about Annie or something like that, a personal assistant AI that sorts your emails like a personal secretary, or maybe it's Amy, I don't remember), are us trying to get there, but we're not quite there yet, because they do have
some very severe limitations. They might be very cleverly programmed: I can't remember the current one, but when Siri first came out, there were things like "ask Siri this," questions she couldn't really answer, but they had pre-programmed some really silly answers, so everyone thought it was amazing even though they were asking her to do something she couldn't do. These narrow AIs are the ones we see all around us. We've got Siri, Alexa, or whatever the Google one is called, which answer your questions, take notes for you, schedule appointments in your calendar, and do all kinds of things. Then there are physical robots like Roombas, which are my favorite thing ever; I hope to get one someday, because of the videos of them trying to learn a new floor plan: you move your chair a foot to the left (sorry, I'm American, so I think in feet, not meters) and they're completely lost. They're using AI to learn your home layout. Discover Weekly from Spotify: you've listened to these things, so I think you want to listen to these things. Facebook's facial recognition, so you don't even have to look up your friends anymore: yep, that's them, that's them, oh wait, that one's a little weird, scroll down, okay no, that's them. And the last one: I love Google Maps, it's my favorite thing ever. I can read an actual map, to put that out there; I've done it, I learned that skill, and I've moved on to the next era of technological enlightenment. Google Maps isn't super AI-based quite yet, though it's getting there as they add more features, but one thing I really like, especially when I'm driving in a city with a lot of traffic, is that it will tell
me, "Okay, this area is congested, how about you go this way instead." Things like that. What's your favorite narrow AI that you use every day? Google Maps, okay, yes. Anything else? Netflix, oh yeah. Does the algorithm work well at suggesting new things for you? I'm still on my parents' Netflix, but I don't want to get off it, because I've been training the algorithm for about six years now and I don't want to start over. Anything else? YouTube, "what's up" or Up Next. Anything not related to video? Another category? Yeah, okay, that's helpful. So it's assisting you while you teach it, or guiding you as you teach it. Okay. It'd be interesting to see how long the learning curve for those is. Right, yes, and I notice whenever Apple updates theirs, because all of a sudden I'm misspelling words I didn't misspell before, or it thinks a word I typed is suddenly something else. Yes, that's an awesome one. I'm sure you have a million more, because we're completely surrounded by AI all the time now, at least narrow AI, but let's move on. All of that exciting conversation comes down to this: machine learning blends ideas from statistics, computer science, and a bunch of other disciplines to design algorithms that process data, make predictions, and help make decisions. It's a subset of AI, so you'll hear it described as AI, and it is; it's just one category of AI. It's like saying an apple is a fruit: yes, it is a fruit, there are other fruits too, but this is one example of a fruit. Going back to general AI, the WALL-E and Terminator kind: general AI doesn't really exist today, and AI experts are fiercely divided over how soon it will become a reality, because that means replicating an entire
person's way of thinking, feeling, and reasoning. What we have today with narrow AI is "I can do this one task very, very well because I've been trained to do it," maybe expanded to a few tasks, but for anything outside of that, I don't know what to do, or I think I know what to do and the results are hilarious, which we'll see a few slides further down the deck. There are three categories of machine learning. Supervised learning is where you give your algorithm a training dataset with the correct answers, run it, and it says, "This is this, this is not this, this is this," and you say, "That's wrong, that's wrong, that's wrong, this one was right," and it adjusts so it keeps moving in the direction of the correct answers. Unsupervised machine learning is more like, "Hello computer, I have all this data and I don't even know where to begin with it. Can you look at it and see if there are any patterns, associations, or trends I can't see with my naked human eye, because there's just so much data?" Do you know the Where's Waldo books, or is that just an American thing? Okay, good. It's like that: there's so much information that it would take me forever, or I'd never find the answer at all, so I give it to a robot, or rather, I process it with a machine learning algorithm on a computer, and it can do it right away, or at least quickly. The third type is reinforcement learning. It's somewhat like unsupervised learning in that maybe you pre-train it a little, or maybe you just let it go and discover things on its own, but the key difference is that it maximizes a reward. So the things that maybe you've
seen on YouTube: "I'm going to train this AI to play a classic video game," and at the beginning it's fail, fail, fail, and toward the end of the video it's doing crazy things no one would ever have thought to do. That's reinforcement learning, where the reward is "don't let your character die" or "get the highest score." Okay, I just threw a ton of words at you. Out of all of that, what has been the most useful buzzword you learned today, or one where you're like, "Cool, I learned something new, this gives me a starting point to look into for my business," or "that was just cool, I didn't know about it"? Anybody want to volunteer anything? All of you knew all the words already and I just wasted 15 minutes? Cool. Well, I hope some of them gave you more thoughts to chew on. All right. "Actually, tell us a little more about unsupervised learning: if there's no supervision, how are we asking the algorithm to learn?" It's things like this: if you have a really dense scatterplot of data, you tell your algorithm, "Look at this from a bunch of different angles and see if you discover anything interesting." This sort of machine learning isn't necessarily about having the machine make a decision for us; it's about having the machine show us things we didn't see or hadn't thought about. So it's meant to assist human intelligence, not to replace it. Maybe I look at this and it's just a blob, but my machine learning algorithm goes through it and says, "Well, actually, you have this interesting spiral pattern in your data," which I never would have been able to see. So this one is more like how a microscope helps biologists see more clearly into what's
All right, it's been about 45 minutes, so go ahead and take a three-minute break: stretch, stand up. It's a two-hour session, and I can't sit still for that long either, but I get to walk around up here and you guys don't. So I'll see you in about three minutes. We're going to kick off this next portion by talking about data analytics and statistics: studying data from the point of view of the scientific method. As a side note, your goal when you're looking at your data may not necessarily be to do an entire study. You may not actually care about predicting something or reaching a concrete conclusion; maybe you just want to explore your data and get an idea of what's in there. In that case, some of the steps we're going to talk about may not be as important to you, which is okay. But you do need to understand that if you're just looking at your data and getting a feeling for it, the conclusions you draw with just your brain don't carry the same weight as something that has gone through a rigorous statistical process. The outcomes of statistics can sometimes contradict what we expect the end result to be, because we cannot get outside the biases built into our brains from our cultural context and our life experiences. That's why we have statistics, and that's why it's such a huge tool, used across every industry in the world. So it's definitely okay not to be super rigorous when you're just trying to figure out what's even in there, or doing pre-research, as some people would say, but understand that there's a difference between the two. All right, so the scientific method: if it's been a while, here's our refresher. First, we define a question. Then we gather information and observe, or maybe we already have the information gathered.
Make a hypothesis, which is an educated guess about the answer. Experiment and test that hypothesis. Then analyze the test results, and sweet, we're done. Well, then we communicate what we found to other people. If you're just playing around on your own, you may not really need to communicate, but most of you are looking at doing analytics in the context of a business or your job, and unfortunately that means we have to communicate with other humans. All right, so the first thing: define the question. Ask yourself right now, what is the business need? We already thought about this a little. What is the business need you were thinking about at the beginning of the session, or what is a question you want to answer with data? Other ways to think about it, if you're still not clear: What are we trying to do? What are we trying to measure? What are we trying to predict? Also, who is the intended audience for the insight or prediction we want to give? Is it a user, a customer, a decision maker? As we go through this section, which is the actual analysis or statistical process, keep that business question or business motivation in the back of your head. Okay, so now we're going to go through a couple of slides about the different styles of questions. I've got the styles of questions on this side, and on the far side I have hints about the type of analysis you're going to want to Google later if that's the question you have. The first one: is this A or B? Is this an image of a cat or a dog? Is this customer going to buy things or not? Is this A, B, or C? What mood is this tweet: are they happy, sad, angry, bored? Those sorts of things. Next, how much or how many? How much will rent cost in this neighborhood next year? How many Twitter followers am I going to get next week? Then there's this one:
Is this weird? This is something we care about a lot in the financial sector, especially in credit card processing: is this transaction fraudulent? Did this person really mean to spend 80 bucks at a Starbucks in New York City? Nope. So: is this out of the ordinary? Then we get into some combinations of those. How likely is A versus B, or what fraction of the results are going to be A and what fraction are going to be B? Not just "is this A or B," but how is the whole pot divided? Also, how do A, B, C, D, and so on compare to each other, and how are they related? For example, which van in my fleet needs maintenance done on it first? Which one do I have to spend the maintenance money on this month, because otherwise the wheels are going to fall off and I won't be able to use it anymore? Then there's clustering: how is this data grouped? That's kind of what I was showing over here. For the Netflix fans out there: which viewers like the same types of things? And one of the problems we run into, depending on what statistical or analysis method you're using, is that it can be hard if you have too many factors or too many parameters: A, B, C, D, and so on, continuing forever and ever. One of the tools we can use for that is called dimensionality reduction. It says, "Okay, I've got all this stuff; how many of these are really the same thing?" Then I can eliminate the duplicate information altogether, or do something to combine it. Examples: what are the most common patterns in gasoline price changes across Finland? Or what groups of words tend to occur together in this set of documents?
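Dimensionality reduction proper (principal component analysis, for example) combines many columns into a few new ones. As a much simpler illustration of the "how many of these are really the same thing?" question, here is a hypothetical sketch that drops any column that is almost perfectly correlated with a column already kept. This swapped-in technique is correlation-based pruning, not PCA; the 0.95 threshold, the function names, and the column names are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def drop_redundant_columns(table, threshold=0.95):
    """Return the names of the columns worth keeping.

    A column is dropped when it is almost perfectly correlated (|r| >= threshold)
    with a column that has already been kept, i.e. it repeats information.
    """
    kept = []
    for name, values in table.items():
        if all(abs(pearson(values, table[other])) < threshold for other in kept):
            kept.append(name)
    return kept


# Hypothetical sales table: the price appears twice, once in euros and once in cents.
table = {
    "price_eur": [1.0, 2.5, 3.0, 4.2, 5.1],
    "price_cents": [100, 250, 300, 420, 510],
    "units_sold": [50, 30, 45, 20, 35],
}
print(drop_redundant_columns(table))
```

Here `price_cents` is just `price_eur` times 100, so it adds no new information and gets dropped, while `units_sold` survives. Real dimensionality-reduction methods go further and merge partially overlapping columns instead of only removing exact repeats.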
All right, so does anybody want to volunteer the business question or business need they're hoping to solve by learning more about analytics? Go for it. Okay. Did any of those questions or tools match up yet? Maybe? Yeah, that's a good point. Would you say something there could help with what you're working on? Anybody else have any insights to share? Yeah. Have you played with machine learning before? Okay, good. That's going to be a really fun problem to tackle. Cool. So it all comes down to the data. We can want to know everything in the world, but if we have no data to actually get us those answers, we're stuck. So take stock of your data sources. What data do you actually have right now? If you're a business, what data have you collected? Maybe it's customer data; maybe, if you're somewhere like Uber, it's where your drivers have been driving. So what data do you have that you own? Then, what data could you collect? Maybe you can survey your customers and see what they think. Maybe you can run some sort of experiment, like A/B testing, where you give half your web users this version of the website and the other half that version, and analyze how their behavior differs. Is there any open data that would be useful? The answer is probably yes, but there's a lot out there, so have fun digging through it; there's cool stuff out there, but there's a lot. And another thing, especially if you're a business rather than an individual: is there data you could purchase? Is there a data provider that has the data you want or need, so that instead of going through the process of collecting it yourself, you can just buy it from someone? All right, so now you have data. Now we get to talk about data quality, because just because you have all this data and it's awesome doesn't mean it's actually useful at all. It's like when you have guests coming over and you're like, "Yes, okay, I have spare bedding, spare sheets, spare pillows," but they're way up high in the cupboard, and you're like, "Cool," and you get up
there, and you're like, "Oh, crap, these have actually been chewed to pieces by mice." I thought I had stuff for these people, but I don't. That's the sort of framework we need to work in: not just "I know I have data," but what is actually there? Is it accurate and correct? Is it complete? Is it reliable? Are the formats standardized and consistent? Is it relevant and up to date? Are the records unique? Some examples: you have customer records that are missing a zip code, sorry, a postal code, or they have an address but it's not a real address. Maybe you have multiple representations of the same data: you have customers from all around Europe, and some of them put in their country as "FI" and some as "Finland." So we've got inconsistencies, where we know those mean the same thing, but a machine is not necessarily going to know that. Also, do you have data outside the reasonable range for it? Do you have a customer who purchased something in 2075, or in 1908 when you're a web application? Things like currency, too: is that really what someone paid, or what you think they meant to pay? Did they really mean to donate a million dollars to your campaign? So you need to go through and make sure things actually make sense. The next thing is data cleaning, and this is the very unfortunate reality of doing analytics and data science. You're like, "Yeah, I'm gonna learn cool things. I've got this business question, I'm excited to answer it for the people who need to know, I've got cool data sources from all over the place, and it's gonna be awesome what I pull out of it." Then you're like, "Yep, I've got data sources, that's cool, so now I'm gonna do all these cool analytics techniques," and
then you're just like, "Oh, but my data won't go into my cool analysis tools. This sucks." It always seems to catch people off guard how much time you may end up spending on making sure your data is in the right formats, is complete, and so on. Some things to keep in mind with data cleaning: identify and remove your duplicates, so if the same data appears on two lines, get rid of one. Standardize your decimal places: for a value, are you going to have two digits after the decimal point or three? Make sure that's the same throughout, because it could throw off calculations, especially when you're working with financials or anything money-oriented, where you want to know exactly how many places you're rounding to. Dates and times can come in a lot of different formats, so make sure those all match across your data. Plan how you're going to handle casing inconsistencies: are you going to work with a tool that doesn't care whether letters are capital or lowercase, or are you going to convert all your data to one case, say all caps? Normalize spellings: if some people spell things the British way and some the American way, to take just English, pick one. And have a plan for dealing with missing data. This is just one little bullet point, but it really should have been its own slide: missing data, what do I do? You have to be very careful with it. There are techniques out there for working through it, and in the notes on this slide I have a link to an article about what you can do if you have missing data.
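Several of the quality checks and cleaning steps above can be sketched in plain Python. This is a minimal illustration, not a complete pipeline, assuming records arrive as dictionaries with hypothetical `customer`, `country`, `date`, `amount`, and `postal_code` fields. It normalizes casing and country spellings (the "FI" versus "Finland" problem), parses a few assumed date formats, rounds amounts to two decimal places, removes duplicates once formats match, and flags, rather than fills in, missing postal codes and out-of-range dates. The alias table, date formats, and year range are all assumptions.

```python
from datetime import date, datetime
from decimal import ROUND_HALF_UP, Decimal

# Hypothetical lookup table mapping the spellings we've seen to one canonical form.
COUNTRY_ALIASES = {"fi": "Finland", "finland": "Finland", "se": "Sweden", "sweden": "Sweden"}
# Assumed input date formats; list whatever formats your sources actually use.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def parse_date(text):
    """Try each known format; return None (to be flagged) rather than guess."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(records, min_year=1995, max_year=date.today().year):
    """Normalize records, drop exact duplicates, and flag (not fill) problems."""
    cleaned, problems, seen = [], [], set()
    for rec in records:
        country = rec["country"].strip()
        row = {
            "customer": rec["customer"].strip().lower(),               # one casing rule
            "country": COUNTRY_ALIASES.get(country.lower(), country),  # "FI" -> "Finland"
            "date": parse_date(rec["date"]),                           # one date type
            "amount": Decimal(str(rec["amount"])).quantize(            # two decimal places
                Decimal("0.01"), rounding=ROUND_HALF_UP),
            "postal_code": (rec.get("postal_code") or "").strip() or None,
        }
        key = tuple(row.values())
        if key in seen:  # a duplicate once formats are normalized
            continue
        seen.add(key)
        if row["date"] is None or not min_year <= row["date"].year <= max_year:
            problems.append((row["customer"], "date missing or out of range"))
        if row["postal_code"] is None:
            problems.append((row["customer"], "missing postal code"))
        cleaned.append(row)
    return cleaned, problems


records = [
    {"customer": "Anna ", "country": "FI", "date": "2019-03-01", "amount": 12.5, "postal_code": "00100"},
    {"customer": "anna", "country": "Finland", "date": "01.03.2019", "amount": "12.50", "postal_code": "00100"},
    {"customer": "Ben", "country": "SE", "date": "2075-01-01", "amount": 9.999, "postal_code": ""},
]
cleaned, problems = clean(records)
for row in cleaned:
    print(row)
print(problems)
```

The first two records only look different until casing, country names, dates, and decimals are normalized; after that they collapse into one. Missing values are reported instead of being silently filled, because, as noted above, a missing value can itself carry meaning.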
We're not going to dive too deep into missing data today, but be aware that if you have missing data, you should tread carefully. If you're filling in missing values, have a good idea of what you're doing, because sometimes the missing value may itself have meaning. It depends a lot on the context of your data and the question you're seeking to answer. All right, cool. Now we're going to talk about making a hypothesis. In the realm of science and academic research, and maybe even if you're doing a very thorough study for a client or for your own business, you need a clear picture of what it is you're researching. The hypothesis needs to be an educated, testable prediction about what will happen or what has happened; it needs to be written in clear and simple language; and its variables have to be defined in easy-to-measure terms. The point of the hypothesis guiding your research is that it needs to be succinct, yet convey exactly what you're looking for, in order to eliminate confusion. All right, so then we get into this fun little thing: the research hypothesis we just talked about is not the same thing as a statistical hypothesis. In some cases they may overlap, and one may serve to answer the other, but specifically, a research hypothesis is what the experimenter or researcher believes will happen in the research study. Getting into statistical hypotheses: your statistical hypotheses will depend on what type of statistical test you're using, and that will be determined by how you set up your research. Maybe you care about correlation, maybe you want to compare means, or maybe you want to fit a regression line to predict the future. Maybe it's something as fun as
nonparametric statistics. There are a lot of ways we could poke at our data to answer this research, or scientific, hypothesis. So in order to answer the research hypothesis on this side of the board, we may end up doing a lot of different statistical tests, and each of those tests has its own statistical hypotheses associated with it. Most statistical tests use the two statistical hypotheses we're going to talk about, but they indicate something different depending on the test. In general, the null hypothesis says that the sample observations result purely from chance. Because we are spending time and maybe money on this research project, we want to be able to reject the idea that what we're seeing is only due to chance. The alternative hypothesis, which you might see labeled H1, H2, and so on depending on the method you're using, says that what you observed is influenced by something other than chance. Statistics has a very weird way of putting this. I think statisticians are just very cynical, because they're really, really trying to reject the null; they're like, "Nah, I'm not going to accept anything." So you never say you "accept" the null hypothesis: you either reject it, or you fail to reject it. So, one of my favorite things... You wanted to ask a question? "Could you walk us through an example?" Yes. All right, let's say I have a coin and I'm going to flip it a lot of times, say a hundred times. If it's a fair coin, meaning it isn't weighted on one side or the other, it's going to land heads up half the time and tails up half the time. In this case, I'm trying to prove that this coin is not fair. So if it were completely due to chance, I would say the proportion, or
fraction of the time that it's heads, is equal to one half. But I'm playing some game with my Uncle Steve, and he cheats a lot, so I'm really sure this is not a fair coin. What I'm actually trying to prove is that his streak of heads is not due to chance; it's because he's a cheater. That's the hypothesis I'm trying to support. So walking through it: I flip the coin, and if I get about half, like 50 heads and 50 tails or something pretty close to that, then I say, "Okay, I failed to reject the null hypothesis. Uncle Steve, maybe that streak of heads you got a little while ago really was due to chance." Or I reject it, and I'm like, "I told you, Uncle Steve, you're a cheater, and now I want my half of the candy back." Real simple. Yeah? Yeah. Well, that's probably using a statistical... I mean, that research hypothesis is very strongly tied to its statistical test. But yes, there are some cases where the scientific research hypothesis doesn't mean the same thing as the statistical one, which is why I made the distinction earlier, and there are some times when it does. So no, this is more about the totals: I'm going to flip this coin a hundred times, and in a perfect world it gives me heads 50 times and tails 50 times. But maybe when I actually did it, it was 49 and 51, so we say this is approximately one half, or approximately 50 percent. This is specifically for this sort of test, a proportion test, which looks at that A-versus-B question: how many of my things are A and how many are B. We're looking at the totals of things, but there are other statistical tests that look more at individual values. There are so many out there, and I will let you have the pleasure of looking them up later so we can carry on to some other things. All right, so where are we at?
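Uncle Steve's coin can be checked with an exact binomial test, one standard way to run the kind of proportion test described above. A minimal sketch, assuming 100 flips and the conventional 0.05 significance level (both are illustrative choices, not from the talk): the null hypothesis is that the fraction of heads is one half, and the p-value is the probability, if the null were true, of an outcome at least as unusual as the one observed.

```python
from math import comb


def binomial_test_two_sided(heads, flips, p=0.5):
    """Exact two-sided p-value for the null hypothesis P(heads) = p.

    Adds up the probability of every possible outcome that is no more likely
    than the observed one (the usual "minimum likelihood" two-sided rule).
    """
    probs = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = probs[heads]
    # Small relative tolerance so floating-point noise doesn't drop equally likely outcomes.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


for heads in (51, 65):
    p_value = binomial_test_two_sided(heads, 100)
    verdict = "reject the null: suspicious coin" if p_value < 0.05 else "fail to reject the null"
    print(f"{heads} heads in 100 flips: p = {p_value:.4f} -> {verdict}")
```

With 51 heads the p-value is large, so we fail to reject the null hypothesis, just like the 49/51 case in the talk; with 65 heads it drops well below 0.05, and Uncle Steve owes you candy. In practice you would reach for a library routine such as SciPy's `scipy.stats.binomtest` rather than writing this yourself.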
As I talked about a little earlier, the next step would be an experiment. On the data-gathering slide, we said we might need to do an experiment, and this is where you would do it. This is a visualization of an A/B test, which I mentioned earlier: you have a website, and you want to test whether your customers like this color scheme or that color scheme, or whether they respond better to the comment button being in this part of the screen or that part. It's isolated: we've decided to test just one change, in two different ways, and see which one people like better. But there are tons of other experiments you could do; this was just one quick thing I could put on the slide. All right, after you do your experiment, you test your hypothesis. A little more about statistical hypotheses: the null and the alternative have to be mutually exclusive, meaning that if a result falls into one category, it cannot also fall into the other. So when we look at a result, it can count as evidence for only one of them, never both. Once you've got your null and alternative hypotheses figured out, formulate your analysis plan: which test statistics are you going to look at, and which ones do you care about? Do you care about averages, like the average price of a house in this neighborhood versus the average price of a house in that neighborhood? Do you care about proportions, like how much of the time my coin comes up heads versus tails? And then there are other things, like t statistics and z scores, which relate to the traditional bell curve: where does this value fit in your population? I will let you guys play more with that on your own later.
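For an A/B test like the one on the slide, where the test statistic is a proportion (what fraction of visitors clicked), one common choice is the two-proportion z-test, which also shows where a z score comes from. A minimal sketch with made-up click counts; the function name and numbers are hypothetical, and the test assumes reasonably large samples. The null hypothesis is that both versions have the same click rate, and z measures how far apart the observed rates are in units of standard error.

```python
from math import erf, sqrt


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the null hypothesis that A and B have the same rate."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # the shared rate if the null is true
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf; the p-value is the chance of a |z| at least this large.
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - phi)


# Hypothetical A/B test: comment button in the old spot (A) vs. the new spot (B).
z, p_value = two_proportion_z_test(successes_a=120, n_a=2400, successes_b=165, n_b=2400)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject the null" if p_value < 0.05 else "fail to reject the null")
```

The two hypotheses are mutually exclusive: either the click rates are equal or they are not, so a small p-value counts only as evidence against the null. Whether a difference of this size matters for the business is still a human judgment call.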
So basically, we're now going to analyze the data with whatever test statistic plan we were working with, find the value of that test statistic, and then interpret the results: either reject the null hypothesis or fail to reject it. So that's statistics in a nutshell. It was very high level, and that's on purpose; I didn't want to get too far into the weeds, because I cannot teach you guys statistics in a two-hour session and also talk about all these other things. So yes, here's analyzing your test results with the test statistics. Don't worry too much about that slide; yes, you're getting the deck. Okay, so the next thing about analyzing your test results: it's not just "Did this land in the correct acceptance zone of the normal curve?" You need to actually use your human brain a little too. For instance, I know you guys can't see this, so: "That was surprisingly easy. How come the robotic uprising used spears and rocks instead of missiles and lasers?" "If you look at the historical data, the vast majority of battle winners used pre-modern weaponry." And it says, "Thanks to machine learning algorithms, the robot apocalypse was short-lived." The other one has been going around the internet the last couple of years, it just cracks me up, and I pretty much guarantee it came out of machine learning training datasets: is this a dog or a food item? Looking at it, we're like, "Duh, that's a muffin. Duh, that's a chihuahua. Hello." But the computer doesn't know that. So we also need to look at the results we get from the machine and check them against reality: does this actually make sense, or is it really crazy, and if I base my decision on it, am I going to do something really crazy, like eat a chihuahua? All right, so I just poured a semester's worth of statistics at you in about half an hour. Do you have any questions on some of
them? I may just say, "That's awesome, write it down and look it up later," but do you have any questions about how all of that could relate to your business? Yes? Do you have any you want to vocalize or share with the class? All right, I'm going to assume there's just a lot of information and people need to digest it a bit more, which is fair, because that was a semester's worth of stats. On that note, there are resources in my list at the end: if that was all new, or it's been a long time since you looked at any of it, there are some free classes for reviewing it, if that's what you want to do. Okay, so now we're going to get into the exciting part: analysis tools. The tools I'm presenting are not an exhaustive list; they're the things I've found useful and like using, and I padded it with a couple more I found along the way. If you find something cool that's not on this list, maybe email it to me, because I think those things are fun. Okay, so for analyzing your data, pulling the meaning out and maybe running some statistical analysis: if you're not a coder, if typing code into a terminal scares you to pieces, or you just don't have enough time for that in your life, then a good free thing to look at is Google Sheets. If you have Gmail, you're already signed up for it. It's a free Microsoft Excel alternative, and Excel is used a lot in business for basic analytics, or for that first wave of pulling information out of data. Google Sheets supports a decent number of statistical and mathematical functions, and in the notes on this page I have a link to the list of all the functions it has, with a description of each and what the
inputs are and what it's actually going to do for you. It needs minimal coding, and one thing that's really cool about the Google suite is that it supports multiple people working on the same file. So if you're passing something back and forth with a co-worker, this is a really nice way to do it. If you really care about who made which change and when, this may not be the best fit, and you might want a stricter version control system. But if you're actively collaborating, especially if you're working remotely with someone, it's a really nice tool. It's stored in the cloud, so you can access it from anywhere, and it's mobile friendly: I even checked, and you can run statistical functions in your spreadsheets from your phone, which is cool. So that's Google Sheets. Okay, the next one was my first introduction to statistical software, and that's R. It's a free statistical software package that you download. You can use it just from the command line if you really want to, but there are also some nice GUIs, graphical user interfaces, and my favorite is RStudio. I don't have a screenshot of it, but it's got the console you run things in, plus an editor where you can write a program and run the whole thing or just parts of it, your visualization output, and the help files for the packages or libraries you're working with. It's pretty nice. One thing that's really awesome about R, and why I'm very, very passionate about using it, is that it's open source and very much supported and actively used by the scientific community. So no matter what question you're trying to answer, in business or science, there's probably a tool set out there already, because somebody built it as part of their master's thesis, probably. So there's a very wide range of packages or
libraries or toolboxes or plugins, however you want to think of them, that you can download to make R work for the specific problem you're trying to solve. It's a very active open source community, especially within the science world, so if you want to be up to date with what scientists in academia are using, that would be it. Another thing that's been up and coming over the last couple of years in data science: it used to be just R, R, R, we love R. But we're moving into a world where it's not enough to build a machine learning model or a statistical analytics model, have it run once, make a beautiful picture, put it in a report, and tell people about it. That's still very important, but what's become more and more important is being able to run those statistical or machine learning models live in the production environment of, say, your web application. So Python has gained a lot of popularity, and it may even have passed R in the data science community by now. It's a general-purpose programming language with statistical modules, packages, and libraries you can import that will do statistics for you. There are some nice environments for it so you're not just coding in a terminal, and my favorite is Jupyter Notebook. It used to be called IPython Notebook, but it's Jupyter Notebook now. It's nice because you can mix your notes, some code, the output, and a visualization, all in one document. You probably wouldn't use the notebook itself to send your algorithm to your production environment, but it's nice for the initial research phase, and Python itself is good for putting analytics models into production. A couple of other things, and why, from the computer science side, we're excited about this: it
integrates really well with most cloud platforms, or platform-as-a-service providers, and that's where the future of websites and data hosting is now. It also scales well, so you can go from your little laptop all the way up to Google, which actually uses Python to run their stuff: Python plus something they built in house called TensorFlow, which is also an open source library for machine learning. Another cool thing about Python is that it supports parallel computing. If you've got a ton of data, you're not going to run your statistical or machine learning process on the whole thing at once; you're going to divide it into smaller batches. It's like baking a thousand cookies for a wedding: you're not going to bake them all at once, you're going to bake them in batches. Parallel computing is the difference between me baking a thousand cookies in my one little oven and a professional bakery with ten ovens baking cookies at the same time. So there we go, that's my plug for Python. Those tools were all for doing the analysis: statistical analysis, scientific analysis. If you're just wondering what you even have in your data, that's where data visualization comes in, which is a slide or two away. So the last step in our scientific method is to communicate your conclusions. Hopefully you thought about this at the beginning, when you were designing the analytics process you were going to go through, but especially now, you may have multiple audiences that want to consume this. Maybe you're writing a report that your not-very-technical boss wants to see, but your very technical product development team wants to see it too. So you have to think: okay, I've done this analysis, I've
gathered this information, I've figured out these insights; how am I going to portray them in a way that's actually meaningful to the person receiving the information? So identify your target audience or audiences. Remember what you're trying to achieve. This is a big one, because we get into the weeds of doing analytics, like "Oh, this is cool, oh, this is cool," and you can lose your focus. When you finally get to the end, maybe you did all that extra research, but what did you start out wanting to know? I love this one: avoid creating a wall of text, as I have created a wall of text on this slide. Sorry, guys. To add to that, use visuals to convey information and add interest. It's so hard for people to read walls and walls of text in teeny tiny type, like a textbook: your eyes lose their spot on the page, you end up rereading the same sentence three times, and it's so dense that when you finish a chapter you're just like, "What was that even about? What did I even learn?" So use visuals to break up your walls of text. But don't go so far the other way that it's just visuals, because visuals with no context are also meaningless: they look pretty, and people are impressed, but they have no idea what you're trying to say. And the last thing: link the information you found back to your business need or your research question. It doesn't matter if you found cool things if they don't tell you anything about the question you were trying to answer to begin with. So, visualization tools. If you're a coder, or you want to start playing with coding, R and Python actually have a lot of different data visualization libraries; these are just the two go-to ones. The cool thing about data visualizations that come from a
coding standpoint is that you can make them anything you want. You can change the color or the symbol, you can make it animated, or you can leave it static so it looks pretty in a PDF report. If you have a scatterplot and want to put a line through it, you can put that line through it. So it's very, very flexible, but with that flexibility potentially comes a steeper learning curve. On that note, I also have some visualization tools that don't require coding, yay. These are nice even if you are a coder and just want to throw something together quickly without spending the time to build a super cool graph, because they already have premade ones. Google Sheets has built-in charts you can use, and it actually has a pretty decent-sized library. The last time I had looked at it was probably five or six years ago, during my undergrad, and I was like, "Oh, there's not really anything here," but I checked this week and I was like, "Oh my gosh, there's actually some," so I'm excited to tell you it's a great resource. And then two things I found crawling around the web this week are WebDataRocks and Chartbuilder. WebDataRocks is a bit more flexible: you can get a lot more types of visualizations out of it. You import a CSV or Excel-type file, and then it lets you use the interface and your mouse to pick things, no coding required. Chartbuilder was developed by, oh, I can't remember, a newspaper in the States that needed something its reporters could very quickly dump their data into and get a visualization out of to use in their articles. So this one is super easy: you just paste your data into a box, pick a couple of options for how you want it to look, and it gives you a data visualization that works for websites. It also has a little picture of a phone, so
That phone preview lets you see how your chart would look on a phone screen. It's very simple, but if that's all you need, that's awesome. So yeah, those are some visualization tools for those who don't really want to deal with coding.

All right, so just to recap, this is what we talked about. We went through the scientific method. We defined a question that was relevant to our business. We gathered our information and observations; maybe we did an experiment, maybe we just looked through the data we already had available to us. Then we made a hypothesis... oh wait, sorry, I skipped a step. We gathered our information, and before we could really go into doing analytics, we cleaned our data, because that's what we need to do. Then we make an educated guess about what we think the outcome is going to be. That's a research hypothesis, which could also overlap with your statistical hypothesis if you're only doing one test; but if we have multiple statistical tests to run, we'll also have a statistical hypothesis for each of those tests. Experiment: if we needed to do an experiment, this is where we would do it. Test your hypothesis against the results; that's the actual doing-the-analytics, running-the-algorithm part. Analyze your test results: do they make sense, are they significant or not? And then communicate your conclusion.

So before I spend these last couple of minutes talking through types of resources that I've found useful, do you guys have any questions, or any "that was cool, I'm going to use that for my business" moments from today that we haven't already talked about? Are we thoroughly confused on statistical hypotheses? Okay, I'm going to take that as a yes. No? All right.

So, self-study resources. The first group is just helpful articles that have given me guidance when I'm trying to put together a data science project or trying to answer a question for my business
or for myself or for school. These have been really helpful for understanding what tools I have out there, how to phrase my question, what techniques exist, and what types of things analytics, machine learning, and statistics can answer, and what they maybe can't. Also, just a plug: check out the notes on each slide; I have plenty, plenty more links in them. The purpose of this slide is just to give you more context if you want more context. Then we're going to go on to YouTube. This is not an exhaustive list by far; there's so much stuff on YouTube. One of the biggest uses of YouTube: for all of those different software tools I talked about, like Google Charts or Matplotlib or whatever, you can go on YouTube and say, what is this, show me a demo, how do I do this, and there'll be somebody walking you through it. So for those of you who are more audio-visual learners, this is where you want to go to learn more. These three specifically are YouTube channels where I've really liked how they explain things. They have some stuff about statistics, some about machine learning, some about math, and some about random other stuff too, but they explain things really well with interesting graphics that make things make sense quickly and are also entertaining, so I like them a lot. And then MOOCs, for those of you who like to learn by doing or want a more in-depth understanding. We will be sending this out, so don't feel like you have to write all of them down. My favorite has been Coursera. With Coursera and several of the other MOOCs on here, the thing is that studying on my own can be very hard: staying on top of myself to say, okay, I need to go home after working a full-time job, open my laptop, and study for this amount of
time. That's really hard if I don't have motivation or accountability. What Coursera does specifically is set things up like a class: you have assignments due at a certain time and tests due at a certain time, and the course has a start date and an end date (not for each session, but for the course as a whole). So it's very much like, I have two months to finish this. It is flexible, though: if you weren't able to finish it all, they'll let you port your data over to the next session, so it's not like, oh, I've lost it, I have to start all over again. But for me that accountability is really nice. There's also, on some of the others, which might be in the longer list of MOOCs on the next slide or not... oh, here's another one: Udemy, which is a similar kind of thing. Both Coursera and Udemy now have paid versions, but that's if you want a certificate to put on your LinkedIn page, include with a portfolio, or give to your employer. Udemy has classes that are free and classes that are paid, but on Coursera you can take pretty much everything that's paid for free. They now have some master's degrees on there, which I'm not sure you can, but for things like the specialization tracks, you can take it all for free; you just don't get the certificate at the end. I haven't worked with Udemy as much. Is it "you-demy" or "oo-demy"? I actually don't even know which one it is. I've done a little, but not as much as with Coursera. Okay, so here are others. There are so many massive open online courses and platforms, and here are just a couple more that are in English. We've got edX; FutureLearn, which is out of the UK; Udacity (both edX and Udacity are American); Canvas, which is American; Stanford Lagunita, which is American; and SWAYAM, which is actually
out of India, but it's in English. I've also included a link down here in the notes, which you can't see, to a list of a ton of MOOC providers and platforms. If English isn't your favorite language to learn in, or if you speak other languages, they also have courses in Mandarin, Spanish, Portuguese, French, Italian, Thai, Hindi, Arabic, German; it's a huge list. There are so many of these out there that you should definitely be able to find one that fits your learning style and your language preferences. Oh, another thing: I plugged Coursera as really awesome if you want deadlines, but there are also MOOC providers that don't have deadlines. They're just like, here's basically the content for a Calculus I class: here are the chapters you'll read out of a textbook, maybe here's a prerecorded lecture, here's an assignment, and here's a sheet you can check your answers against. But nobody's checking whether you did it; there are no grades and no pressure, so you can do it whenever you want without feeling like, oh no, I'm not doing it enough. So that's really awesome too. Okay, oh yes, and the most important tool that I want you guys to take out of this (let's see if this works): Google. That website was Let Me Google That For You, if you're curious. So the biggest tool you have now is Google, because it has so much information about the world. Hopefully through this talk you've been able to absorb some of these concepts, maybe take some notes, so that you can say, okay, I know where to start asking questions now; I have a better idea of an intelligent question to ask Google to learn more about this in the future. So do I have another slide? Back to my Communications 101 presentation professor: tell them what you told them. We talked about buzzwords; we talked about types of data, types of data storage, and types of
analytics. Then we went through the scientific method as it relates to an analytics project, and we got some ideas for how to answer our business question in a rigorous, statistical, scientific way. And then we talked about some self-study resources. Hopefully that will give you guys some exciting new adventures with data analytics in the future. So thank you for coming today. Before I go... oh man, it's 5:44. I didn't time my slides, and I was worried I was going to end up with either 30 minutes or 4 hours, so we're good; we're right on the dot. Okay, so before I go, while I'm still here, does anybody have any more questions related to all of this information? All right, feel free to come up and chat with me on your own. You are released. Have a wonderful Monday evening.

Thank you for [inaudible] decision making. So if you are more interested in data analysis, you can take part in those discussions, and SQL will start tomorrow. For those who already registered, remember to install [inaudible] for SQL. So thanks for today.