Good morning everyone, this is Dr. Bill Fisher. I'm on the faculty of the School of Library and Information Science at San Jose State and the coordinator for our colloquium series. This is our first colloquium for the 2013-14 academic year, and I'm delighted to welcome you to this presentation by Amy Affelt, who will be talking to us in a few minutes about big data. This session is also co-sponsored by the San Jose State SLA student group. We are putting together a group of students who are interested in special librarianship, and Amy does work in a special library type environment. And to help us get the year started off right and to help welcome the student group, we have an introduction for this presentation by Deb Hunt, who is the current president of the Special Libraries Association and the library director at the Mechanics' Institute Library in San Francisco. So I'm going to turn things over to Deb for her introduction. So Deb, go ahead and click on the talk button. Okay, so I'm grateful to be here and I want to welcome all of you. I know this will be a great presentation. I had an opportunity to hear Amy speak, gosh, last fall I guess, almost a year ago now, at Internet Librarian, about big data, and she has a really good way of making it understandable, because we hear about big data all the time and it's a little overwhelming. I'll put in a quick plug for SLA: last spring I was in Philadelphia, and SLA has many subject divisions, one of which is the Pharmaceutical and Health Technology Division. They have their own mini-conference every year, and the keynote speaker at that conference was Edd Dumbill, who's the editor of Big Data, a new journal that just came out this past March. I'm still trying to get my head wrapped around it all, and Amy does a really good job of helping us do that. So I'll just give you a little bit of biographical information about Amy and then I'm going to turn it over to her.
And if you have any questions about SLA: Amy's a member, I'm a member, Bill's a member and a past president too, and Amy's been very active. It's a great way to grow in your profession and grow professionally, to network, to find work, and also to find colleagues who will become your friends and help you in your career. So, Amy is the director of database research at Compass Lexecon, a global economic consultancy, where she finds, analyzes, and transforms information and data into knowledge deliverables for PhD economists who testify as experts in litigation. She's also an author and conference speaker on such topics as adding value to information, evaluating information integrity and quality, and marketing information services. She has a BA in history, Phi Beta Kappa, from the University of Illinois at Chicago and a master's degree in library and information science from Dominican University. So, I want to thank everybody for being here, and thank you, Amy, for speaking today. I'm going to turn it over to you. Okay. Thank you, Deb, so much for that really generous introduction. And I'm just going to use that as a quick segue to put in an endorsement for SLA membership. I wouldn't be here today if it weren't for SLA. That's how I met Dr. Bill Fisher and Deb Hunt, and they've been tremendous mentors of mine, along with so many other people in SLA. So I can't encourage you enough to join. It's really where you meet colleagues who will help you at all stages of your career and learn all the skills that we need to succeed in the workplace. So I'm encouraging everyone within the sound of my voice to join SLA today, and I'd be very happy to speak with anyone about SLA. I'm the chair of the Leadership and Management Division this year, so please join. And now on to big data. This is Norma Desmond in Sunset Boulevard, and she said, "I am big. It's the pictures that got small." And I think as librarians and information professionals, we are big.
It's just that the data got a little bit bigger. Big data is not new; it's just a new term. When people ask me, "What is big data?", I always tend to fall back on the Supreme Court definition of pornography: you know it when you see it. McKinsey did a big study on big data, and they determined that the amount of data collected is going to grow by 40% per year, and that in 15 out of 17 industry sectors in the United States, companies will have more data stored per company than the Library of Congress. So that's nearly every company in every industry having more data than the Library of Congress. But how is big data different from data that we've looked at all along? Well, it has a lot to do with the format. It can range from everything from plain text to audio. At my work, we call video the genie in the bottle, because if we have to start looking for historical videos, that's really a game changer. It's going to be very difficult to look for historical videos from the past, but they're part of big data. Also tweets and server log files, and then Internet of Things data. What we mean when we talk about the Internet of Things are mobile sensors and sensors that are on everything from vending machines to the highway, as well as personal sensor data. It's something that's being collected all the time. And then there's the data that used to be dropped on the floor, the data that in the past no one even noticed was missing, that now we're starting to collect. The Gartner Group characterizes big data by the five Vs. They actually usually describe it as the three Vs, but I'm going to throw in two additional Vs, which are really the opportunities for information professionals. Their first V is volume: just the sheer amount of data that's being collected, as I just discussed. Then velocity: the speed at which the data is becoming available to be compiled. For example, we talk a lot now about Twitter being the new news wire.
So if data is appearing and being collected in real time, the speed is just unbelievable. Also the variety, as I had in the previous slide: it's just every format that's out there. And then the two that are the real information professional opportunities. Verification: we're the people who can determine, is this data actually valuable? Is it from a credible source? Is it something that we want to collect? Just because it's there doesn't mean that we need to look at it. And value. When we're talking about the value of the data, there are really three things to consider. First of all, it's challenging to determine the value. How do we know if the data is quality data? Is it from a reputable source? Can it be cited? Can we stand behind this as something that really is truthful and valuable? Then there's the risk. For companies that are collecting big data and then creating big data data sets and deliverables with that data, the jury's still out on whether or not a company owns the data set that it has made from its data. The court decisions that have recently been handed down have left it a nebulous area as to who owns the intellectual property of big data data sets that are made. And throwing in the third challenge of expense: the cost involved for companies to collect big data, analyze it, do statistical analysis on it, et cetera, is very high, and then there's the risk involved, because to go through all of that, and invest all of that time and money, and then not really own your product, is really a problem. I want to talk about the cool things people are doing with big data in the context of three different industries where I feel that big data projects are offering the most value, not only for the dollars that are invested, but also for the quality of life that these big data initiatives give to the people they affect. And the first one is healthcare.
Twenty percent of hospital patients in the United States are readmitted within 30 days of discharge. So this is a huge problem. It's a problem for hospitals, insurance companies, doctors, and especially patients. So Microsoft has a big data project called the Readmissions Manager, and what it does is analyze data to predict the probability of whether people will be readmitted to hospitals or not, and what this project found were two things. The first was that there was a 14-hour tipping point for people in the emergency room. People who remained in the emergency room for 14 hours or more (and I have no idea why people would be there that long; that just seems completely unreasonable, but they are) are the people who are likely to be back after they leave the hospital. The other red flag they found was the word "fluid" on medical charts. Patients who had the word fluid on their charts were highly likely to be readmitted. So doctors can use this data to follow up more frequently with patients who have those red flags in their data and try to keep them from being readmitted to the hospital. Doctors at Stanford and Columbia are using big data to devise an algorithm that looks for pairs of drugs that, when taken together, produce side effects that are not associated with either drug taken alone. This initiative had several steps. The first was mining the Food and Drug Administration's adverse drug reports database, and what they found there was a link between people who took a drug that lowered cholesterol and a drug used to treat depression. People taking both of those drugs had high blood sugar, but people taking just one of them did not. So they thought, hmm, this might be a problem. The next step was to analyze the data further by looking at hospital records at Stanford Medical Center. They were looking for people who were taking both of those drugs, to see if they had high blood sugar.
They found some people, but that's still not conclusive. So then they looked at patient records at hospitals all across the United States to try to further prove the link between the combination of the two drugs and high blood sugar. And they did find people in whom this was happening. The next step of the project, and this is what's really interesting, is that they decided to look at verbiage entered into search engines. They worked with Microsoft, and the search engine they looked at was Bing, and they looked at 81 million searches. Now, that's big data. Eighty-one million searches is not something any of us could scan or parse through on our own. It really requires a big data application. They looked for people who searched first for the drug names, the drug that lowered cholesterol and the depression drug, and then for other terms like hyperglycemia, high blood sugar, and blurry vision. They noticed that people were searching for all of these terms, the drug names and then these side effects. So then they decided to see the frequency with which people conducted these searches. They found out 30% were conducted on the same day, 40% in the same week, and 50% in the same month. Now, obviously, this data is not concrete proof that these drugs cause high blood sugar when taken together, but it's a signal. Nate Silver wrote a really interesting book related to big data called The Signal and the Noise. And what he's talking about is big data signals, data that actually is valuable, versus the noise, everything else that's out there, that data that's dropped on the floor. And that's what we're good at as librarians and information professionals: isolating the signals from the noise. And with this drug study, they were able to do that, because this seemed like a signal.
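The timing analysis in that Bing study, counting how often a drug-name search and a side-effect search by the same person fell on the same day, week, or month, can be sketched in a few lines of Python. Everything here is invented for illustration: the log records, the term lists, and the exclusive bucketing are assumptions of mine, not the study's actual method (the study's 30/40/50% figures were cumulative).

```python
from datetime import datetime, timedelta

# Hypothetical search-log records: (user_id, query_term, timestamp).
searches = [
    ("u1", "paroxetine",    datetime(2010, 3, 1, 9)),
    ("u1", "hyperglycemia", datetime(2010, 3, 1, 14)),
    ("u2", "pravastatin",   datetime(2010, 3, 2)),
    ("u2", "blurry vision", datetime(2010, 3, 8)),
    ("u3", "paroxetine",    datetime(2010, 3, 3)),   # drug search only, no symptom
]

DRUGS = {"paroxetine", "pravastatin"}
SYMPTOMS = {"hyperglycemia", "high blood sugar", "blurry vision"}

def gap_buckets(searches):
    """Bucket each user's drug-to-symptom search gap: same day, week, or month."""
    buckets = {"same_day": 0, "same_week": 0, "same_month": 0}
    by_user = {}
    for user, term, ts in searches:
        by_user.setdefault(user, []).append((term, ts))
    for events in by_user.values():
        drug_times = [ts for t, ts in events if t in DRUGS]
        symptom_times = [ts for t, ts in events if t in SYMPTOMS]
        for d in drug_times:
            for s in symptom_times:
                gap = abs(s - d)
                if gap <= timedelta(days=1):
                    buckets["same_day"] += 1
                elif gap <= timedelta(days=7):
                    buckets["same_week"] += 1
                elif gap <= timedelta(days=30):
                    buckets["same_month"] += 1
    return buckets
```

The point of the sketch is just the shape of the analysis: group by searcher, pair the drug queries with the symptom queries, and look at the time gaps.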
And the advantage is that this is so much easier than a clinical trial, because you don't need to find people to sign up, go through all the paperwork to sign them up, and track them; it's much quicker to get to better outcomes. Aetna is a huge health insurance conglomerate, and they have a big data strategy for patient engagement, to try to encourage healthy behaviors and lifestyle changes. What this is: there are some conditions that, when people are diagnosed with them, are overwhelming, because they can lead to a whole host of problems and they have lots of symptoms, and sometimes when people are given these diagnoses, they're just overwhelmed by the information and they end up not following through with anything. One example of such a diagnosis is metabolic syndrome, because metabolic syndrome can result in a whole host of problems (heart attacks, strokes, diabetes), and it's also diagnosed from all kinds of measurements: waistline size, blood pressure, cholesterol levels, glucose levels. So the Aetna big data program looks at individual patient screening data and lab results, and it creates a highly personalized treatment plan for each person, which highlights just the one or two things that would have the biggest statistical impact on improving that individual patient's health. So, for example, one person, based on their individual data, might be told: take a cholesterol-lowering drug and lose five pounds to make the chance of a heart attack less likely. Another person, whose data looks completely different, might be told to focus on lowering their triglycerides to avoid adult-onset diabetes. So, same diagnosis, metabolic syndrome, but completely individualized plans of treatment to try to prevent the diseases that can result. GOJO Industries is the maker of Purell hand sanitizer, and they have an Internet of Things project where they use electronic sensors to compile big data on secondary infections in hospital patients.
So we all know the adage, right? You go in with one thing, you come out with something else. They're trying to stop this by using these sensors to compile data on healthcare workers: when they entered and exited patients' rooms and the times that they did so, whether or not they used hand sanitizer and washed their hands, whether they were disease carriers, things like that. They use that information to track patterns and outcomes for patients who had secondary bacterial infections, and they look at that data to try to head off these instances in the future. Transportation is another industry where there are a lot of cool big data projects going on. There's an old adage in the highway and road building industry that states you can't build your way out of congestion. So in the future, we might need to look to big data to try to provide more efficient transportation. Street Bump is an iPhone app that's used by the city of Boston, and what it does is use the phone's accelerometer to identify bumps in the road. When a car hits a bump in the road, the app sends the location of that bump to the city's transportation department, and the idea, anyway, is that they can easily dispatch road crews quickly to fix the potholes, and maybe in an area where there are a lot of potholes, they can target that area for capital and highway improvements and things like that. But the interesting thing was that when a car hits a bump, it's not always a pothole. It could be a manhole cover, it could be a speed bump, who knows. So they had to really tweak the app to make sure that what they were looking at were actually potholes. It's a highly sophisticated app that applies all kinds of different filters to make sure that potholes are what's being hit. Now, I live in Chicago, and the way we fix potholes is you call your alderman. So I think this might be much quicker and easier than that process.
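A toy version of that filtering step might look like the sketch below. The acceleration threshold, the GPS binning, and the minimum-reports rule are all my own invented stand-ins, not Street Bump's actual filters; the point is only the shape of the idea, which is that one hard jolt is ambiguous (manhole cover, speed bump), but many jolts from different cars at the same spot start to look like a pothole.

```python
from collections import Counter

SPIKE_G = 1.5        # hypothetical vertical-acceleration threshold, in g
MIN_REPORTS = 3      # hypothetical: how many separate hits confirm a pothole

def likely_potholes(readings):
    """readings: list of (lat, lon, accel_g) tuples from many phones.
    Returns locations reported as hard bumps by enough separate readings."""
    hits = Counter(
        (round(lat, 4), round(lon, 4))   # bin nearby GPS fixes together
        for lat, lon, g in readings
        if g >= SPIKE_G                  # ignore gentle bumps entirely
    )
    return [loc for loc, n in hits.items() if n >= MIN_REPORTS]
```

With this rule, a single spike at a speed bump never reaches the dispatcher, while a spot that jolts several cars gets flagged for a crew.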
ODOT is the Ohio Department of Transportation, and they use INRIX, which is a provider of traffic apps and tools that compile traffic data from all kinds of inputs. The inputs are things like smartphones, GPS devices, and road sensors, and they're used to predict travel times and traffic flow. What they're looking for are cars that are traveling below the speed limit between the hours of 5 and 9 p.m. When cars start to go slow, ODOT targets these areas to see: do they need a roadway improvement at that location? Should they install carpool lanes or a high-occupancy vehicle lane? Something like that. They also have a goal of clearing all roads statewide within three hours of the conclusion of major storms, so they take that same data and mash it up with weather data, and that helps them send plows and other road-clearing equipment quickly to places where the traffic is slow, and also perhaps target those sections of road for capital improvements. Xerox, I have to admit, I tend to think of as kind of a dinosaur company. I don't make a lot of photocopies anymore, but they're actually on the cutting edge of big data with a program they have called ExpressLanes, in partnership with the Los Angeles County Metropolitan Transportation Authority. And the reason they chose Los Angeles was because it lacks an urban core and it has a constant flow of traffic in all directions, so it was perfect for this. What this is, is congestion pricing: a variable set of tolls and fees that change in price according to the traffic flow. Here's how it works. You're in a lane where you're charged a flat fee to drive in it, and as the traffic becomes more congested, the price increases if you want to stay in that lane. The idea is that people won't want to pay the increased price, so they'll move out of that lane, and that will free up the traffic flow. Now, I know there's no accounting for human behavior.
I'm sure a lot of people sit there thinking, I don't want to deal with this, I need to get where I'm going, I'll just put this on a credit card, something like that. But the idea is that people will not want to pay more, and they'll move out of the way, and the traffic will flow more freely. Entertainment is the industry where the big data applications being used are either Big Brother-like or really cool, depending on your perspective. The Walt Disney Company, if you've ever been to one of their theme parks, is all about creating a magical experience for guests, and they have a big data strategy called MyMagic+ that takes it in a completely new direction. MyMagic+ uses rubber wristbands that are encoded with credit card information and other personal data. If you go to the park and you have the wristband on, it works for everything: buying food, buying merchandise, obtaining wait times for rides. It works to get a FastPass, which is one of those timed tickets where you don't have to wait in line to get on a ride. It works for VIP seating for things. It's also your entry ticket into the park, and it's even your room key if you're staying at a Disney property hotel. So what they're selling to adults is convenience, because you don't need to bring anything with you when you go to a Disney park if you're wearing that wristband. It's everything. You don't need a credit card. You don't need your wallet. Nothing. Now, for children, Disney parks are all about the characters. The characters create a magical experience for children at Disney parks. Previously, the characters interacted with kids in a generic way, so Cinderella might say, "Hi there, little girl." Well, now with MyMagic+, Cinderella might say, "Hi there, Sophia. I hear you have a birthday today." A personalized greeting like that for a child is really special. So that's what they're selling as the advantages.
But the real winner here is the Walt Disney Company, because they are going to be able to track visitor behavior in actual real time. They're going to know every move made by everyone who is wearing these wristbands. They'll know who took a photo with Pluto but spurned Goofy. They'll know who bought an ice cream bar shaped like Mickey's head. They're going to know who left the park and went to their hotel pool in the afternoon and then came back, and at what times they did that; who went to the daytime parade but not the nighttime parade; who made restaurant reservations, and where, and at what times. And they're going to use all of that information to tweak their marketing strategy in the future. RUWT is an acronym for Are You Watching This?, and I have to admit I don't watch sports on TV, but if I did, I would be really into this. Are You Watching This? is a big data application that analyzes streams of sports data to alert people to change the channel to something that matches their interests. It has an algorithm that rates games based on the number of exciting things going on: things like lead changes, overtime, a no-hitter, things like that. So you might be sitting there watching a Law & Order rerun, and Are You Watching This? will tell you that your alma mater is going into overtime. So that's the cool thing for users of it. But the other benefiting party is sports bars, like Buffalo Wild Wings and places like that, because they can leverage the service to determine which games to feature, and then market promotions based on what's going on. Key Lime Cove is an indoor water park in the Chicagoland area, and they get 85% of their customers from within a 40-mile radius. But they noticed that during the week it was hard to get people to go there. Now, I know I'm dating myself here, but when I was a kid, I would sit next to the radio in the winter and wait for a snow day to be called.
And they would not call a snow day until five minutes before you had to be at school, at the earliest. So it was like a game-day decision. Well, now a lot of my friends with kids tell me that schools call snow days up to two and three days in advance if they think it's going to snow. People know way ahead of time when a snow day is going to be called. So the Key Lime Cove folks use school system data from Illinois and Wisconsin to find out when snow days are called, and then they send out marketing messages to parents really quickly. And they say the system increased their occupancy by 30 to 50%. You don't need to home brew to appreciate great beer, and you don't need to write code or be a computer programmer to work with big data. I'm going to talk about six big data tools that anyone can use, and I give credit where credit is due: this comes from GigaOm. To me, a big data tool is a predictive tool. A big data tool should be something where you enter data and it gives you a prediction based off of what you've uploaded. A lot of these are visualization tools, so I'm going to focus on the visualization tools first and then get to the truly big data tools. Google Fusion Tables creates an interactive map of occurrences like this one; these are Academy Awards by country. It's very straightforward. Infogram is a little bit different. You can enter data and it will make you a chart, and you can choose the kind you want: bar, pie, line, pictorial, et cetera. So this is a chart made by Infogram. It's Rambo Kills: the number of bad guys killed by Rambo. You have the movies down at the bottom (First Blood, First Blood Part II, Rambo III), shirt on, shirt off, and then the totals, so it's pretty straightforward. It's kind of a cool thing. Many Eyes is sort of a Wordle-like word cloud visualization tool. You enter text and it produces a graphical representation kind of like this one. Now, we've all seen this before.
So what sets this apart is that you can enter a lot of text into Many Eyes to get this, several pages of text, and it will make something like this. Statwing is a little more big-data-like. You upload data, you check the variables of concern, and it plots the relationship. In this one, someone uploaded different colleges, their tuition rates, the total enrollment, the acceptance rate based on the people who apply, and the region of the country, and it plots the relationship between all of that. Tableau Public is a little more big-data-like as well. It creates comparison charts between two uploaded data sets. This one is sizes of hospitals and the cost per patient of a hospital stay. I think this person was trying to prove either that it's expensive to stay at a big hospital or that it's less money, either one, but the chart showed it was a wash. BigML, now, this is really a big data tool. It's a predictive tool, and it's composed of four things. You need a source and a data set, and then you go to BigML and choose one of their models, and it gives you a prediction based off of that. Just for an example, our source is going to be ratings agencies: the big ratings agencies, S&P, Moody's, and Fitch. And our data set is going to be those agencies' ratings for countries in the past. So this is the data set here: how S&P, Moody's, and Fitch rated these countries in the past. You go to BigML and you choose one of their models, and it gives you a prediction that looks like this. You throw that data in there, and it makes a chart of predicted ratings agency ratings of countries going into the future. As librarians and information professionals, I don't think any of us went into this field because we wanted to be computer programmers. We want to be librarians or information professionals. So what we want to focus on in big data is results and implications, not statistics, regression coefficients, or algorithms. So how do we do this?
You might be sitting there thinking, this is cool, you know, I can go to Walt Disney World and Tinker Bell will know my name, but who cares? How do I really do this as a librarian? So I have a few suggestions here. The first thing I would say is that if, in your industry, you're not quite sure what types of big data projects are being worked on, you should go into the industry-specific literature for the industry in which you work and do a search for the phrase "big data." So, for example, if you work at a chemical company, you might want to go to Chemical & Engineering News. If you work in management consulting, you might want to look in McKinsey Quarterly. But if you do a search for the phrase "big data," you'll start to see the types of projects people are working on, and then you can start thinking: how do I insert myself into this? How could I help with a big data project that's going on in my industry? Another way to think of it is, when a big, vexing problem or issue is out there, couldn't big data be used to try to solve that problem? And I have some examples. The stimulus package instituted in the first term of Barack Obama was meant to try to create jobs, rebound the housing market, strengthen the banks, all those things we like to see in a healthy economy. The minute it was instituted, people started debating: was it not enough money? Was it too much money? Should they not have done it? Should they have done more? There should be a way to mash up big data to tell the exact amount of stimulus that we need to get the economy moving again. Sequestration: these are the across-the-board federal budget cuts for every government agency that were instituted in March of this year. And nobody knows what kind of effect the sequestration will have going into the future.
For example, federal workers who are furloughed and not going to work are not buying lunch, not commuting, not going to businesses near their offices. What kind of effect will that have on the economy? Nobody knows at all. Hurricane Sandy, this is a big one. Hurricane Sandy took place last fall along the east coast of the United States. And like so many hurricanes, the real damage was to inland areas: it was not from the storm itself, it was from the storm surge and the flooding that occurred afterward. So immediately people started asking if those sea gates they have in countries like the Netherlands would help in these situations. What they are is steel gates that rise from the ocean floor to protect inland areas, and they're very effective. But Mayor Michael Bloomberg of New York City said, well, we can't build those sea gates because they cost 50 billion dollars. And people said, oh, 50 billion dollars, okay, that's too much. Well, people are still living in hotel rooms. It's almost a year later. The rebuilding has not completely taken place; in some areas it hasn't even started yet. The devastation is incredible. And now we found out yesterday that it was damage from Hurricane Sandy to electrical components that caused the boardwalk fires over the weekend that destroyed 50 businesses on the Jersey Shore. So maybe 50 billion dollars doesn't seem that expensive in the context of all of the damage. You would think that data could be used to mash up the cost of all these rebuilding programs, fixing the damage, and the effect on people's lives against the 50 billion dollars, to show that maybe an ounce of prevention is worth a pound of cure. It's important to understand the mission of your organization in order to think about what types of big data projects could be used there. And that's hard to know. A lot of times people don't walk around companies saying, this is our mission, if you choose to accept it.
But Mary Ellen Bates had a great suggestion about this. She said, look at what your company is telling its shareholders, if it's a public company: what its mission is, what its goals are, what it wants to focus on for the next six months to a year. And similarly, what is it telling its customers it's focusing on? Those are the types of things that will give you a really good indication of its priorities. And once you know your organization's mission, you can start to think about, hmm, how can I fit myself into that? What services can I provide to help them accomplish their goals and get where they want to be? In big data, there are sort of two push-pull scenarios that I think are important to look at. The first one is patterns versus predictions. What's just a pattern of behavior, and what is something that actually predicts the future? For the flu shot every year, scientists and researchers try to predict the strain of flu that's going to be the most prevalent, and then they manufacture the flu shot serum based off of those guesses. And unfortunately, a lot of times it's wrong. Last year, the flu shot was only effective in 9% of senior citizens in the United States. But Tamiflu is an antiviral flu drug that's highly effective. And there's flu data out there. You may have heard of Google Flu Trends. For a while, that was the "in" thing, and people thought it was really great. But now we've found out that last year they were way off with their data, because Google was looking at everything searched for, and they didn't have anything in their big data algorithm to differentiate news articles about the flu, or articles like "do you have the flu or a cold?" and "what should you do to not get the flu," from actual flu outbreaks occurring. So that wasn't very effective. But the Centers for Disease Control have very good flu data.
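That failure mode, counting every flu-related query the same, can be illustrated with a toy filter. The marker terms below are invented examples of mine, and a real system would need far more careful query classification, but the point is just that "reading about the flu" and "having the flu" tend to produce different queries, and a signal that lumps them together will spike every time the flu makes the news.

```python
# Invented example terms -- not Google's (or anyone's) actual classifier.
NOISE_MARKERS = {"news", "vaccine shortage", "cdc report", "cold or flu"}
SYMPTOM_MARKERS = {"fever", "chills", "body aches", "flu symptoms"}

def flu_signal(queries):
    """Count only queries that look like a sick person searching,
    not someone reading or worrying about the flu."""
    count = 0
    for q in queries:
        q = q.lower()
        if any(m in q for m in NOISE_MARKERS):
            continue  # looks like curiosity or news-following; skip it
        if any(m in q for m in SYMPTOM_MARKERS):
            count += 1
    return count
```

A query like "flu news today" is dropped as noise, while "fever and chills" counts toward the outbreak signal.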
And actually, I was reading yesterday that Kleenex is going to be using CDC flu data, not Google flu data, for this flu season, and they're going to market tissue promotions to people based on where flu outbreaks are occurring. But getting back to my original point: couldn't we ship Tamiflu to places where there are flu outbreaks and get it to people quickly, so that they could take it to prevent getting the flu, rather than just sort of guessing at what kind of flu shot serum we should make that may or may not work? Coincidence versus causation. What is a coincidence, and what is one thing causing something else? Sweden has the highest consumption of milk of any country, and they also have very low rates of cancer. So the dairy industry saw this information, put it together, and basically was saying, drink milk and you won't get cancer. Well, it seems obvious in a way, but maybe it's just because we're librarians that we ask: is it really the milk that's keeping people from getting cancer? Or is it that in Sweden they have higher environmental quality, people have better diets, they exercise more, or perhaps it's a genetic thing? Who knows? Then there's Facebook. When it was first opened up to anyone (if you saw the movie The Social Network, you'll know that it used to be just for college students before it became available for anyone to join), there were a lot of IQ tests that were very popular on there, like: answer five questions and here's your IQ. Facebook saved all of that data, and they said that people with high IQs like curly fries. And that just seems to beg the question: do smart people like curly fries, or do they just live near an Arby's? When you're thinking about data in the context of what we do, it's sort of data science on the other side and data intelligence on our side. We want to focus not so much on big data, but on better data. Because I don't think anyone went into this field because they weren't curious, right? We're all very curious.
We want to know the who, what, when, where, why, and how. And sometimes our curiosity even delves into healthy skepticism, and that's a really good thing. Because when people question data and really look at it, and are curious about its source, what it can be used for, and how it was compiled, that's what leads to high quality, and that's what we're good at as librarians. The big data communication framework is something that I like to call a template for working on a big data problem. So when you have an issue that you think might be solvable, or at least workable, using big data, you can run it through this template. The first thing that's important to know is the problem. What is it that we're looking at here? What are we trying to solve? Then figure out what we can look at to measure the extent of the problem. What types of inputs are important to notice in the context of the problem? Then look at the data. What types of data are out there? What's the realm of possibilities of data that we can track for this issue? Then, when you have that data, determine its value. Some questions to ask are: where is this data from? What's the source of the data? Who compiled it? Is it complete? Is it conclusive? And what data, within the different packages or boxes of data that we have, can be merged together to try to create some kind of solution? I'm a big science fair person, so I would encourage you to formulate a hypothesis regarding a big data problem. What do you think this data is going to show? What's your gut instinct about what the data might show? And then, once you decide on that, prove and disprove your own theory. See how you can show it to be true, and see how you can discount what you previously assumed to be true. Also consider whether a change in conditions could affect your assumptions. How could a real wrench thrown into the whole thing affect the idea that you had regarding the problem?
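The steps of the communication framework just described can be captured as a simple checklist structure. This is a sketch only: the field names are a paraphrase of the steps in the talk, and the flu example values are illustrative, not code or data the speaker presented.

```python
# Sketch of the big data communication framework as a structured checklist.
# Field names paraphrase the steps described in the talk; this is one way
# to make the template concrete, not code from the presentation.
from dataclasses import dataclass, field

@dataclass
class BigDataFramework:
    problem: str                                       # what are we trying to solve?
    measures: list = field(default_factory=list)       # inputs that gauge the problem
    data_sources: list = field(default_factory=list)   # what data exists, who compiled it
    hypothesis: str = ""                               # gut instinct about what the data will show
    evidence_for: list = field(default_factory=list)   # attempts to prove the hypothesis
    evidence_against: list = field(default_factory=list)  # attempts to disprove it

    def ready_to_report(self) -> bool:
        """Only communicate results once the hypothesis has been tested
        from both sides, as the talk recommends."""
        return bool(self.hypothesis and self.evidence_for and self.evidence_against)

plan = BigDataFramework(problem="Seasonal flu outbreaks")
plan.measures = ["CDC surveillance counts", "antiviral sales"]
plan.data_sources = ["CDC flu surveillance data (public, weekly)"]
plan.hypothesis = "Outbreak regions can be identified a week early"
print(plan.ready_to_report())  # False until both sides of the hypothesis are tested
```

The `ready_to_report` gate encodes the science-fair advice: no reporting until you have tried to prove and disprove your own theory.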
Once you have all that together, here's where the real opportunity for us comes in, because this is what we're really good at: communicating the business impact of the results. We are the people who are good at writing reports based on the analysis and showing what the data says in a way that's interesting and makes sense to people. And the best way to do that is to tell a story. This is Stitch from Lilo and Stitch. I love that movie, and I want to go to Disney World and take a picture with Stitch and have that recorded forever by the Disney people, and Stitch will know my name. But I digress. We're good at telling stories, and any time you can tell a story, it has much more impact and is much more interesting than just reciting facts and figures. Tell a story; that's what we're good at. And actually, there are companies that use big data to write stories. The Chicago Sun-Times writes a lot of their stories with a computer. They just feed in data, and a big data application writes the articles for the newspaper. Narrative Science is a big data company that says they'll just take data and write stories with it for you. But librarians are always going to be better at writing and telling a story than a computer. GigaOM had an article about how to get hired as a data scientist. Two things they talked about were storytelling, like I was just discussing, and creativity, which we're very good at as librarians and information professionals. Those skills are really key, and they're something you can practice: writing and telling stories, trying to get creative and build your creativity skills. There are lots of articles and suggestions online for things like that. The other thing I wanted to mention is that they talk about core competencies.
And again, I don't think anyone's expecting a librarian or information professional to write code, or really even to be interested in being a computer programmer, because that's not what we are and that's not where our skills lie. We have other skills. But there are two online learning courses: one is Computer Science 101 from Udacity, and the other is a machine learning course from Coursera. It might be a good idea to take these and put them on your resume, because when you're talking to people in your organization, or in a job interview about a big data job or project, it would be great to be able to say: I am the person who analyzes the data. I'm the person who can help you make sense of this data, question it, determine its value, and help it lead us forward. But I also understand how the data got to the point where I'm working with it. I took these courses; I understand the way the statisticians and programmers worked with the data. I understand how they packaged it, compiled it, and programmed it. I have a little background and familiarity with the basic terms and software packages that are used. I took these courses, but I'm the person who's going to take it from there. Once those people do their thing and put the data together, massage it, and package it, I'm the person who will help you understand what you're really looking at. So I think that's a great idea. I have the bibliography in my PowerPoint presentation. And if big data is of interest to you, I just wanted to mention the Internet Librarian Conference, which will be at the end of October in Monterey. You can't beat the beautiful setting. I'd encourage you to attend, because there will be a lot of big data programs at the Internet Librarian Conference.
And actually, I will be doing an interactive session there, where we'll have small groups and I'll be handing out problems and challenges and encouraging brainstorming within the groups, to try to use that big data communication framework to put together a strategy for solving a problem using big data. So that's the hands-on session I'll be doing, and I would love to see you there. And thank you so much for taking time out of your day to listen to me. I really appreciate it. You're ahead of everybody else just by showing up here, for sure. And I'd be happy to take any questions that you have. Thanks, Amy. We appreciate the time you spent with us today. And again, as Amy's indicated, we do have some time for questions or comments. So if anybody wishes to ask a question, go ahead and grab the microphone or raise your hand and we'll get things going in that regard. "Is there a specific class that you'd recommend in the SLIS program?" is one question, from Cara. Thanks, that's a great question. You know, it's been a few years since I went to library school, so maybe Bill, if you wanted to weigh in on classes that you're aware of. But I just want to put in a plug for the basic reference class. I think you can't beat that. I would say the most valuable class I took was just reference, because it taught me how to ask the right questions of the requesters to find out what they really need to know. Because what people ask for is usually not what they really need. The question-asking skills I learned in that class were really invaluable, and I use them every day, all day long. I still remember tips that my professor gave me about reference interviewing. So I can't encourage that enough. But I don't know, Bill, if you want to weigh in about any specific class. Yeah, we actually have a new faculty member whose specialty is big data. So we will be having some courses specific to big data.
I suspect they're going to be offered under one of the seminar umbrellas. So I would continue to check the course listings for 281, which is contemporary issues, and 282, which is the management seminar; it could be listed under that. Or we have an information science seminar course, which I believe is 287. So keep checking the schedules. You'll see a number of sections of each of those courses, and underneath each one it will tell you what the content of the course is, and it will say big data in that regard. Again, I think we've had some part-time instructors try to approach big data, but we've just hired someone on a full-time basis whose specialty is big data, and she should be bringing those courses online very soon. Okay, great. Thanks. There's another question: do we need strong math skills to work with big data? Is that mentioned in job openings? I'm sure it is in the initial stage of big data projects, and I'm sure for actual big data computer programming jobs it would be. But as I said in the presentation, we as information professionals usually jump into big data projects in the middle or towards the end, because the analysis really is what we're skilled at. So I think for working with big data as far as putting it together, programming it, and making the data sets, you probably need strong math skills, but that's not really where our role begins. Our role sort of starts after the programmers have put it together. The programmers probably aren't as good at analyzing it as we would be, so I think our analysis skills really come into play when we're working with big data, and that's really where our value lies. Because obviously, and I'm not telling you anything you didn't know, we can't really go for a computer programming job against someone with a degree in computer programming and get hired. They're probably going to get the job.
But when it comes to analyzing big data, writing reports based off of the big data, and determining data quality, that's really where we come in. And to follow up on that, Amy, I think this is a little bit like what we faced a number of years ago when geographic information systems first came out, and people were wondering whether or not they needed a background or a degree in geography or population studies or something along those lines. A lot of people with basic MLIS skills and background have been very successful and are GIS specialists now. Right. Right. That's a good point. Here's a question: are there job titles we should look for, regarding librarians working with big data? You know, I come from special libraries, so I'm going to put in a plug for special libraries. The information professional job listings that are out there might not say anything about big data per se. But I think it's important, if you're going for a reference position or a knowledge management specialist position, to talk about these types of skills in your interview: that you're interested in analyzing big data results and packaging that information in a way that's understandable for higher-level executives, and things like that. So I think knowledge management jobs probably have a lot of potential for working with big data projects, along with different corporate library jobs and law firm jobs. I didn't mention this in the presentation, but I have in other presentations I've given: law firms are experiencing a lot of pushback from clients regarding costs, and there are big data initiatives that try to predict the cost of taking a case to trial, from the client retention stage through verdict or settlement. That's something librarians can work with as well, helping the lawyers determine how much a case will cost a client.
Are you aware of public or academic libraries using big data in a fashion similar to the examples you cited in the different industries? If financial resources were not an issue, how would you recommend that public and academic libraries use big data? Well, I am not aware of libraries using big data in this way, except for the law firm example that I gave. I've heard about a lot of law firms that are trying to determine how much cases will cost before taking on clients, to give them an idea of how much they're going to have to spend. But big data in public and academic libraries, hmm, I'd have to think about that one for a little while. I don't know, Bill, are you aware of any ideas for academic libraries? With public libraries, a huge issue right now, and I don't work in a public library but I've read a lot about this, is that with the Affordable Care Act, public librarians are going to be on the front lines of helping their clients and patrons and the people who come to the library understand how to navigate the health insurance exchanges that the states are setting up. My hat is off to them, because they are really going to have to help people navigate these health insurance exchanges and understand what's out there, how to sign up, and what they should do. So I could see big data being used at public libraries to help people figure out which plan they should sign up for, because there are going to be a lot of choices and options, and I think it's going to be very confusing to figure out what to sign up for. There should be big data algorithms out there, I would think, where you can plug in some facts about yourself and it will help you determine which Obamacare plan to pick, if you're in that situation.
That's absolutely correct, Amy, and a lot of public libraries I've seen in a number of different states seem to be getting to the point where they're going to be on the front lines, if you will, beginning in another week or so when those exchanges supposedly open up. So that's certainly one potential use coming up. I don't know that any public libraries have done a significant amount with what we now call big data. With regard to academic libraries, again, I'm not entirely sure. Some academic libraries at institutions with strong business programs used what we might consider to be big data, some of the examples that Amy used with regard to information from the rating agencies and things like that; they may have done some things to share with their business school students. I would love to see academic libraries make some contact with the institutional research branches on their campuses and see what kind of data those organizations have about students, student retention rates and things along those lines, and try to figure out ways to use that to enhance the information those campuses have about their students and their success rate in the classroom, which is very important to all those institutions; provosts' offices and presidents' offices on any campus would be very interested in that. And then, of course, showing how the academic library helps in that regard would do nothing but strengthen the library's position on those particular campuses. So I think there is potential in almost any type of library for big data applications. You know, one other thing, if I can mention one that just came to me for academic libraries: the return on investment for your degree is starting to become something that's always in the news, and it's a big topic. Is it worth it, or is a college degree sort of a new housing bubble? Are you never going to get your money back? Are you just going to be underwater, depending on what you go into?
It would be great if academic libraries could have some type of big data initiative that could show, from their institution, people who've gotten degrees in different subject areas and how much they've made. If they could poll people and compile that data, and mash it up with the number of people graduating and starting salaries, it would help people navigate college majors. I know I have 16 nieces and nephews at or near college age, and it's really hard; I don't know what to tell them. They're really concerned that they're just never going to get ahead, and they don't know what to do. So I think something like that would be really helpful. It's just an idea. Someone had a comment here about public libraries not owning data. One thing is that another big data initiative out there right now is city data. I don't know if anyone's heard about this, but a lot of larger cities in the U.S. are opening up all their data to the public, and it's sort of a crowd-sourced app competition where anyone can access it and try to come up with applications for city data. So that's something, too. As far as proprietary data being an obstacle, city data might be a way to start, and that might be somebody's PhD dissertation or something: looking at metropolitan and municipal big data sets and determining uses for all that data that's out there. Because the idea is that cities don't really have the money to work with that big data and hire programmers and things like that, so they're just opening it up to everyone. I know here in Chicago, we're doing that. Any further questions or comments? And again, for those of you who wanted the link to the recording, both Randy and I have put it in the chat box an additional time. I'll get a message out to SLIS students with the link to the recording; I'll broadcast that through SLISConnect later today or tomorrow.
And then we will also do a YouTube version of this, and an announcement will go out when that is available as well. So if you're not a student and want access to the recording and you can't capture what's in the chat box, contact me and I'll be happy to get that information to you. My email address is bill.fisher at sjsu.edu. Amy, let me thank you again for a wonderful presentation and a good Q&A session afterwards. Let me also give you a little information about the rest of our colloquium series for the fall. We seem to be following a theme this fall with regard to information-seeking and searching. On Wednesday, October 9th, at 12 o'clock, we're going to have John Dove, who is part of the Credo Reference database service. John's going to be talking about the differences between using a commercial database service to find information versus Google, and what some of the pluses and minuses are of each. And then in November, on Monday the 25th, that's the Monday of Thanksgiving week, Virginia Tucker, one of our faculty, will be talking to us about her recent dissertation, which she did through the SJSU-QUT Gateway Program. Her dissertation topic deals with expert searching. So again, we have this theme this fall about using and finding information and things that fall under that umbrella. So again, thank you, Amy, and thank all of you for participating and being with us this afternoon.