Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DataVersity. We'd like to thank you for joining the current installment of the monthly DataVersity Smart Data Webinar Series with Adrienne Bowles. Today Adrienne will discuss machine learning case studies. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #smartdata. If you'd like to chat with us and with each other, we certainly encourage you to do so; just click the chat icon in the top right-hand corner for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar.

Now, let me introduce our series speaker for today, Adrienne Bowles. Adrienne is an industry analyst and recovering academic providing research and advisory services for buyers, sellers, and investors in emerging technology markets. His coverage areas include cognitive computing, big data analytics, the Internet of Things, and cloud computing. Adrienne co-authored Cognitive Computing and Big Data Analytics, published by Wiley in 2015, and is currently writing a book on the business and societal impact of these emerging technologies. Adrienne earned his BA in Psychology and MS in Computer Science from SUNY Binghamton and his PhD in Computer Science from Northwestern University. And with that, I will give the floor to Adrienne to get today's webinar started.

Hello and welcome. Thank you, Shannon. It's another beautiful day here in Connecticut, and I hope that the audience, wherever you are, is having a good day and good weather. One of the things I'd like to do with this series, and I was just thinking about it, we're actually about a year and a half into the series now, is address a question I often get about how people are using some of these technologies. So when we were coming up with topics for this year, I thought it would be good to pick a couple of industries and do some case studies or use cases to show different ways that people are already exploiting machine learning, and perhaps get people thinking about new ways to use machine learning in their own environments. With that in mind, we're going to look at three industries today, insurance, pharma, and healthcare, and take a look at some of the advances. Things have changed a lot in the last few years. If you had looked at the landscape in the business press or the popular press five years ago, you certainly wouldn't have been inundated, as you are today, with stories about machine learning and artificial intelligence and "the robots are coming" and all the other good stuff. So let's take a look, try to separate the fact from the fiction, and get an idea of where people are already using the technology, but more importantly, perhaps, where we see it going in the next couple of years. So with that in mind, here's the agenda: I'm going to talk a little bit about some of the foundations for industry-specific machine learning applications, what you should be looking for, and what attributes make a problem well suited to machine learning.
Then we'll look at the three industries in turn: insurance, pharma, and healthcare. For each, I want to look at an application or two, the types of applications that are suitable, and then an example. And finally, we'll look at how these industries are being transformed, or will be transformed, by machine learning in the next few years.

So to start out, why did I pick insurance, pharma, and healthcare? What do they have in common? Well, first of all, they have a lot of data. And one thing that we'll see and come back to, and we've talked about this in previous webinars, is that as artificial intelligence matures, there's much more of a focus on data relative to algorithms than there was five or ten years ago, certainly. So you have to have data, and these industries all have it. They've got historical data. They've got things that are highly structured, like customer databases, taxonomies, et cetera, where you can just go look things up because they're in a very structured, repeatable form. And then each of them has, in general, different types of data, including what I call de-structured data, which is popularly known as unstructured. I don't think data can actually be unstructured, or it wouldn't be useful; it just takes more work to find the structure. So you're dealing with natural language in journals, in texts, in case notes, or with audio files or video files that need to be processed to find the structure and then find the meaning in there. Each of these industries has a lot of it.

The other type of data, which we've talked about recently in one of the webinars, is streaming data. And what we're going to look at today is the impact of integrating streaming data, which is data in motion that's going to be analyzed in motion, with the historical data to start to derive some insights. Streaming data relevant to insurance, pharma, and healthcare could include things like telematics, when you have a fully instrumented car that's reporting on its condition; weather information, things that are changing constantly; biometrics, say you have a Fitbit or some other device that's producing data about you; and even news feeds. And we'll see as we pull all this together towards the end how each of these may be used as input to a machine learning system to give you insights to redirect behavior.

What these industries also have in common is relatively well-defined vocabularies, data models, and taxonomies. So if you were to look at any aspect of insurance, whether auto, P&C (property and casualty), life insurance, or business insurance, there's generally a set of claim codes that you can interchange; people understand the basics there, and that makes it easier to look at opportunities for leveraging this data. In pharma and healthcare, the basics of biology, chemistry, biochemistry, and neurophysiology, and the representations of them, are all pretty standard. And finally, there are regulatory issues. Each of these industries is subject to a number of regulations, whether regional, state, national, or international. That can sometimes affect the types of analysis we do and the type of audit trail we need in order to meet the regulatory requirements.
So before we get into the specific cases, I wanted to bring this diagram up for folks who weren't with us a couple of months ago when I talked about the changes in machine learning. What I'm showing here is, over time, from the extreme left, say the 1950s and 1960s, early on in machine learning, up to the current day, the relative importance of, or emphasis on, algorithms and rules versus data. Things have shifted. We used to try to put all the smarts, if you will, in the algorithms. Systems were heavily rule-based, and we would try to solve problems using a minimum of data because, frankly, that's what we had access to. And so the systems themselves reflected the intelligence, or the belief system, if you will, of the people who designed them. What we're getting to today is more and more systems where we're trusting the data to provide relationships and patterns that will explain the behavior, or at least provide insights, without those needing to be pre-programmed. So that's the shift, if you will.

One last slide, just to do a level set for folks who are interested in the industries, perhaps, but haven't been with us when we talked about machine learning. One of the concepts that's very important to understanding how these things are being applied in the three industries we're looking at today is the idea of deep learning. Any machine learning system is one that will improve its performance based on experience with data. But when we talk about deep learning in particular, we're talking about a system that has multiple levels or layers, if you will, where we start out with an input layer that represents the variables we can actually observe. In this example, a deep learning system to identify objects in a picture, you start out by trying to identify one level of features, say edges. You might look for edges in a picture pixel by pixel, trying to find things that are the same brightness, or the same contrast between a pixel and its surrounding pixels, or the same color as represented digitally. And gradually you go from one layer to the next: now that you have all these edges identified, at the next level you try to identify what type of shape you're looking at. So we have the rules there that help us identify, geometrically, what sorts of shapes there are, and then we get into further and further refinements. You can have an arbitrary number of layers; in some of the complex systems, one that Microsoft has reported on recently for speech recognition and image recognition has 150 layers. But you always have an input layer and an output layer, and the layers in the middle are considered to be hidden. The reason that's important is that you can tune the algorithms, looking at different sorts of activation, based on your preconceived notions, your beliefs about the rules you would apply to get from one type of feature to another. But you don't actually see what's going on in the middle when the system is operational. For any sufficiently complex system, that tends to become a black box.
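To make that layering concrete, here's a minimal sketch of such a network using Keras in TensorFlow. The framework choice, the layer sizes, and the ten-class output are my assumptions for illustration; the point is just the shape: an observable input layer, hidden layers that build features up from edges toward shapes, and an output layer.

```python
# Minimal sketch of the layered idea: observable inputs, hidden layers that
# build up from low-level features toward shapes, and an output layer.
# The framework and all layer sizes here are illustrative assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    # Input layer: the variables we can actually observe, here raw pixels.
    tf.keras.Input(shape=(28, 28, 1)),
    # Early hidden layer: learns low-level features such as edges, by
    # comparing each pixel's brightness and color with its neighbors.
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    # Deeper hidden layer: combines edges into shapes. Once the system is
    # operational we never inspect these directly, hence the "black box".
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    # Output layer: one score per object class we want to identify.
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```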
So with that as background, let's take a look at how this works. We're going to start with insurance, and for each of the three industries we'll begin by looking at the major use cases that are particular to that industry. Virtually any business that wants to look at machine learning will be looking at it for things like marketing, sales, communication, operations, and perhaps supply chain optimization. But today we're just going to look at things that are particular to the specific industry. So for insurance, speaking broadly, we're looking at auto, property and casualty, and life. As I said, there are other categories of insurance, and most of these lessons will be applicable there too. The insurance industry is fundamentally about risk management. So we want to be able to understand and predict, for an individual or a business that's getting a policy, what's the likelihood that there will be a claim? What's the likelihood that if there is a claim, it will be serious? We also want to look at improving the customer experience, and maybe at some dynamic approaches to pricing. And whenever you're dealing with money, one of the issues is fraud detection, which is of particular importance in insurance. One of my friends was just involved in an accident this week where someone hit them, apparently deliberately, in order to file a false claim. So that's a big part of insurance, and being able to predict it will certainly improve performance.

In the second column, thinking back to that diagram about the importance of data, we want to look at just what type of data we have available for the industry. For insurance, we've certainly got customer data: demographics and historical records. The company that holds my life insurance policy has certain information about me, including medical information. The company that has my automobile insurance has my driving record. The company that has my homeowner's policy has different information. But each of them has information about me and how I've interacted with insurance companies in the past. Now, if you're insuring property, which would include, say, an automobile policy, you need to know about the person being insured, but you also need to know about the property itself. So you need data about that, plus some local and regional data to put it in context. Insuring a three-bedroom, 2,000-square-foot house in one neighborhood is going to have perhaps different rules associated with it, or different relationships, not necessarily formal rules, than the same house in another location.

There is also data that's streaming, whether or not it's currently collected. This can include real-time or near-real-time personal data if you're insuring a person. If you could track how they're behaving, you might want to change the relationship you have with them from an insurer's standpoint. That could be anything from seeing a sudden change in the sentiment of their social media posts to biometrics. Again, if I'm wearing a Fitbit and you can detect that my heart rate is changing, you might want to look at other things. You might also be interested in the impact of different news items. There's a fire within 50 miles of me: what does that mean in terms of my risk as a policyholder? There's a change in the weather: should there be a change in the policy? And when I talk about this, I want to make sure we understand that this is the streaming part, not the regional data for the property itself. It's like the difference between weather and climate. We know what the climate is; we know what's expected.
But there are always things changing during the course of the day, the week, the month, et cetera. Those could be used to change pricing, or to help direct behavior. So let's look at what's being done today, and then as we go through some of the other examples, we'll come back and see how we might change these in the future to take better advantage of the data that we have.

Our first look is a proof of concept for auto insurance pricing, from AXA Insurance. For each of these cases there's usually an online reference, and when you get the slides next week, or the link to them, there will be references so you can see where the numbers are coming from. In the case of AXA, their historical data tells them that somewhere between 7% and 10% of their insured drivers, the people that have policies with them, will cause an accident annually. If you think about auto insurance, there's a difference between causing an accident and being in an accident. So at this point they're looking at it and saying, okay, this is the percentage of drivers likely to cost us money because they're at fault. However, only about 1% of the accidents result in a payout over $10,000. Clearly those are the policyholders you want to scrutinize more carefully, perhaps charge more, perhaps offer education. There are a lot of things you would want to do with that information if you could do a better job of predicting which of the people you insure is most likely to have a high-payout accident. So the challenge was how to use technology to improve the predictive ability of their systems over the current methods.

This is one that Google has used recently as a reference account, if you will; AXA is using machine learning in the Google environment. The way they did it was to start by identifying roughly 70 risk factors. Those include things you might normally think of: the age of the insured (remember, this is auto insurance); the address, which may be factored in because a denser region gives a higher probability, and we know that most accidents happen within a few miles of home, so if your address is in Manhattan, you may be more likely to be involved in an accident than if your address is in Montana, as an example; the vehicle type; previous accidents, your history, which again is the historical data; the original channel, meaning how you bought the insurance; the age of the car; and so on, for a total of about 70 different factors. Now, if you think about how these things have been priced in the past, a lot of the pricing algorithms have used some number of risk factors, likely not quite 70. But for each of them there would be a weight assigned. People would look at the historical data, try to determine what was correlation and what was causation, and how these things go together, and price it based on a set of rules. What they did in this example was to use TensorFlow on the Google Cloud and its machine learning engine. Going back to that previous diagram, only now reading left to right instead of top to bottom, they built a machine learning solution that took those 70 variables as input. For practical purposes, call it a 70-dimension input vector going into a fully connected neural network. Now, I mentioned that the Microsoft example, for translation and speech, was 150 layers.
In this case, they were only looking at three hidden layers, and that was enough for them to reach a performance level that I'll talk about in just a minute. But basically it's fully connected, so they're looking at the relationships of all these dimensions to each other in a feed-forward network. We don't need to get too deep in the weeds on this, but if people are interested, we can follow up afterwards and talk more about the technology that was used. They go in from those 70 inputs, look at the relationships and which ones have an impact, and at the very end the network basically puts out a risk factor associated with that instance. So if you've got a million policyholders, you would have to run this for a million rows in a database, but each one of those rows would have the 70 risk factors that get analyzed. You look for the features and go through those three layers. And what comes out of the hidden levels you may not even know at this point. You can probably work it out, but the relationship between, say, the vehicle type and the original channel may seem marginal, yet it may turn out to have a significant influence.

The result of their proof of concept was that they were able to predict the high-risk policyholders with about 78% accuracy. This was done by training the system using historical data: you put the data in as if you didn't know what had happened, run it through, and then predict. Since it's a proof of concept and not operational, they could look at it and see, based on the data they had, whether or not they would have been able to predict accurately, and the result was about 78%. That's higher than their current method, and also significantly higher than the first machine learning method they tried, which was a random forest approach. So they were able to identify an appropriate set, I won't say the right set, of risk factors; perhaps they could do 80% of this with 50% of the variables, I don't have that data available. But they identified a reasonable working set of input variables that gave them the performance they need. They can optimize their costs and, based on this, offer new services to their clients. We're going to move through the industries fairly rapidly now, but we'll come back to insurance, because that's one where maybe, if we have time, we can get a bit of a discussion going. But that's the first example using a deep learning approach.
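To make that shape concrete, here's a hedged sketch of such a model: a 70-dimension input vector, three fully connected hidden layers, and a single risk score coming out. The layer widths, activations, and training details below are my assumptions for illustration; the published AXA material doesn't specify them.

```python
# Sketch of the shape described above: a 70-dimension input vector, three
# fully connected hidden layers, and a single risk score as output. Widths,
# activations, and the optimizer are illustrative assumptions only.
import tensorflow as tf

def build_risk_model(n_features: int = 70) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),            # age, address, vehicle type, ...
        tf.keras.layers.Dense(128, activation="relu"),  # hidden layer 1
        tf.keras.layers.Dense(64, activation="relu"),   # hidden layer 2
        tf.keras.layers.Dense(32, activation="relu"),   # hidden layer 3
        # Estimated probability that this policyholder causes a large-loss claim.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Training uses historical policies, one row per policyholder, with a 0/1
# label such as "caused a claim with a payout over $10,000":
# model = build_risk_model()
# model.fit(X_train, y_train, epochs=20, validation_split=0.2)
```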
So now, pharma, and here we're talking about pharmaceutical research. The basic use cases we want to look at are drug discovery, identifying new drugs; clinical trials, the process of assigning people to trials so that we get the right population, a meaningful population; and basic biochemical analysis. The data we have available includes, as I said, certain biochemical principles that we know and that don't generally change. We may get new discoveries about the relationship between certain structures, so the understanding may be updated; they may find that something we've believed in the past is incorrect. That happens in science; there's not really much in science that's completely settled. But we have that information, so we can model molecular structures and their interrelationships. We can look at clinical trial case data: every time we do something, we can record the outcomes, so we can use that historical data in future tests. And of course there are journals, news, and other forms of de-structured natural language data that we'd like to include as we're trying to discover new drugs, assign people to trials, or analyze the results.

Now, I have this slide just to show the scale of the problem, or the opportunity; I was trying to be an optimist here. This is from a current report, 2017. Right now there are over 240 immuno-oncology treatments being developed for people with cancer. If you think of those 240 different treatments, to get to that point there was a much larger funnel of things that were being considered and then rejected. Anything that helps the pharma companies identify candidate compounds more efficiently is going to speed the process, and that's what we want to look at today.

The first approach here, and again you'll get all these references afterwards, is machine learning for search space optimization in drug discovery. The problem they were working on is virtual screening. You have all this data about different components, ingredients if you will, their chemical properties, and things that might be combined. But even with the availability of high-speed computers and clusters that operate faster every year, it's still a computationally intensive problem, and it's virtually intractable to try every combination. So what you want to do is weed out things that you know are going to be a conflict as quickly as possible. This is one approach with published data on it, from 2013, just a few years ago. It used machine learning, a similar type of system to what I mentioned before, only here it's a combination of support vector machines, neural networks, and random forests: a hybrid approach using three different machine learning techniques, if you will, to determine when there were known conflicts in the chemistry, the biochemistry. And their result, again using three different approaches to machine learning, was that they improved throughput by 50% in identifying things that needed to be discarded, with about 90% accuracy. So they weren't discarding things that actually could have been promising, because a false positive is certainly costly as well.
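As a rough illustration of that hybrid idea, here's a sketch using scikit-learn: a support vector machine, a neural network, and a random forest voting on whether a candidate compound can be discarded. The features and hyperparameters are placeholders of mine, not the configuration from the 2013 paper.

```python
# Rough sketch of the hybrid screening idea: an SVM, a neural network, and a
# random forest voting on whether a candidate compound can be discarded.
# Hyperparameters and features are placeholders, not the paper's actual setup.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

screen = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),                      # support vector machine
        ("nn", MLPClassifier(hidden_layer_sizes=(64, 32))),  # neural network
        ("rf", RandomForestClassifier(n_estimators=200)),    # random forest
    ],
    voting="soft",  # average the predicted probabilities of the three models
)
# X_train: numeric descriptors of known compounds; y_train: 1 = known conflict.
# screen.fit(X_train, y_train)
# discard_mask = screen.predict(X_candidates) == 1  # weed these out early
```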
The next one is very recent, from February of 2017: the University of Toronto, looking at accelerating the discovery of drugs with machine learning. They've developed new algorithms that are fundamentally based on three-dimensional modeling of molecular structure, the protein molecules, essentially building a mathematical model for structural biology, if you will. I don't claim to be an authority on this by any means, but by going from a 3D model to an input vector for a machine learning model, they could find relationships that were going to work and those that weren't. And by finding the things that weren't going to work and discarding them earlier on, they saved time, and time is money. It also means you can bring a compound to market earlier, and there are lots of business and societal benefits to that, certainly. So that's one from this year.

Here's another one, from April, even more recent. In this case, it's using deep learning, the multi-level approach with the hidden layers that I talked about, for drug design and drug research. The goal of this research was to identify potential solutions using smaller data sets. In some cases we have lots and lots of data, and as I showed in one of the earlier slides, we can more or less brute-force our way to inferences by looking at lots and lots of combinations of data. But when you don't have that much data, we need to be a little more clever, I won't say smarter, in the algorithms we use to identify potential candidates or reject things that are going to fail later on. So this is from Stanford, published just a couple of months ago: an approach using a smaller data set but with deep learning. That's pretty recent in the pharma space.

And then we have, just a few weeks ago, IBM patenting machine learning models for drug discovery. You'll have this reference, as I said, in the package when you get it next week. What they've done is file a business process or method patent, if you will, for the approach they're using to identify potential drugs for specific disorders or diseases. This was also April of this year, just a few days after the Stanford announcement. The actual impact of this on drug discovery remains to be seen; I have to look at it in a little more detail. But it's interesting that the field has advanced far enough that they applied for, and were granted, a patent for a machine learning model to identify chemical compounds.

Now we're going to take a look at healthcare. Again, some of the use cases: certainly one for healthcare, general health, public health, is epidemic or communicable disease prediction and monitoring, and then there are the ones we typically think of when we say we want to improve healthcare, diagnosis and treatment options. The types of data we're going to be looking at are patient data, treatment and outcome data, and streaming data, which I'll get to in just a second. If we look at the first two, the demographics and the historical record, the electronic health record, that's for each individual patient; it's the personal information. The treatment and outcome data is the data we have about general populations. We know about drug interactions, we know about case studies; we know that for someone with particular symptoms, like the individual patient we're looking at now, 80% of the time those symptoms indicated some particular disorder. So that's the general set of symptoms, and we want to try to match against that.

But I want people to also think about the idea of streaming data. What can we do with data that we're getting right now? Besides the historical data and the case data, is there something we can learn that has predictive value based on a person's current state and behavior? That could be things like your biometrics, your blood pressure, your heart rate, all the things you can actually check from an attached or proximate sensor device. But it could also be that those things can be inferred from certain behavior. We're going to see, as we pull all of this together at the end: if I'm your personal physician, it's one thing for you to come in and get a regular checkup. It's another thing for you to come in with a specific complaint.
You think you have a disorder, you have some symptoms, you want to figure it out. Now think about medicine in the future, and let's assume everything is opt-in for the moment, so it's not mandated. If I'm your doctor and I'm monitoring you, or monitoring all my patients with a visual dashboard, and I see that there's some incident that's normally associated with road rage, then I know there are going to be some biometric changes, and that may change your condition. So we'd like to be able to include analysis of those things.

Let's take a look at a couple of the issues. One is that, particularly when we get to healthcare, it's one thing to say, all right, I have all this data and, based on what I know, I'm not really sure how the algorithm came up with this, but you are a high risk. You can choose to go to another insurance company at that point. But if I'm going to be making a diagnosis about your condition and I say, I can't really tell you how these factors work together, but it looks like you're going to have a heart attack, well, there are a lot of regulatory issues about how you do diagnosis. What we're seeing today, though, is that the explanation of how you arrived at a recommendation is not as important as the efficacy, the ability to achieve the desired results. And the regulatory bodies are starting to look at this. A couple of examples are fairly well documented. Take aspirin: we know that there's an improvement in positive outcomes, or a reduction in negative outcomes, for people in certain categories who take an 81-milligram aspirin every day. It was decades before the medical community understood the relationship, but that didn't stop people from beginning the regimen. Lithium is used as a mood stabilizer as well as a component in batteries. Its pharmacological benefits are still not completely understood, but it is prescribed because the statistical relationship between ingestion at a certain level and the reduction in symptoms is well documented. So the efficacy is there, and if you can show that, then regulators are not going to be as concerned as you might think about our inability to explain, step by step, what the relationships are. That's why I wanted to point this out; this is a fairly recent article in the MIT Technology Review.

So with that in mind, understanding that what we want to do is demonstrate the relationship even if we don't have all the causal data, let's start by looking at infectious disease management. Of the two figures here, the first is John Snow's 1854 map of cholera death clusters in London, and the other is from December 2015, a prediction model for malaria using machine learning. The fundamental difference between the two is that although John Snow's map was very effective in helping people visualize the issue in 1854, there still wasn't a good germ theory of cholera transmission. But by documenting and doing this type of analysis, looking at the clusters of where cases and fatalities were occurring, they were able to narrow it down and find the actual water pump that was responsible for contaminating the population. It wasn't predictive; it was descriptive. Now, using the same sort of concept but with machine learning, we're at the point where we're starting to be able to predict transmission. It certainly helps that we know more about how malaria is transmitted than we did about how cholera was transmitted in 1854. And it's taking the same basic idea and applying a machine learning approach to it.
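To make the descriptive-versus-predictive contrast concrete, here's a small sketch of the descriptive step, clustering reported case locations, the modern analogue of Snow's dot map. DBSCAN is my choice of algorithm here, and the coordinates are made up; the talk doesn't name a specific method. A predictive system would then feed such clusters, plus streaming inputs like rainfall, into a supervised model.

```python
# Sketch of the descriptive step: spatial clustering of reported case
# locations, the modern analogue of Snow's dot map. DBSCAN is an
# illustrative choice; the coordinates below are made up.
import numpy as np
from sklearn.cluster import DBSCAN

# Each row is the (latitude, longitude) of one reported case.
cases = np.array([
    [51.5133, -0.1367], [51.5131, -0.1370], [51.5134, -0.1365],
    [51.5200, -0.1000],  # an isolated case, likely noise
])
labels = DBSCAN(eps=0.001, min_samples=2).fit_predict(cases)
print(labels)  # [0 0 0 -1]: one dense cluster plus one outlier

# A predictive model would go further: use cluster features plus streaming
# inputs (rainfall, temperature, mobility) to forecast where transmission
# is likely next, rather than just describing where it has occurred.
```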
Now let's take a look at the overall market, if you will. The only reason I included this next one is, first of all, that it's a good article looking at the different neural net approaches to prediction and prognosis in cancer. But the other thing is that it's now 10 years old. For a lot of people just getting into this and seeing all the hype, it may be almost hard to remember that the application of machine learning and deep learning to cancer diagnosis has been going on for well over a decade. So what we want to look at today is some of what's going on right now, that's being implemented and will be carried into the future, for prediction and diagnosis using machine learning.

Let's look at a specific example that's getting a lot of traction: diabetic retinopathy. To put this in context, we're talking about retinal damage caused by diabetes that can ultimately lead to blindness, and it's a significant problem. Diabetes as a condition is obviously growing. We certainly have a big issue in the U.S., and I know our webinar audience is in many countries; internationally, an increase in wealth is associated with a decline in some of the health indicators associated with diabetes. So it's an international issue. In the U.S., it's the leading cause of blindness for people aged 20 to 64, and it's been estimated that at least 90% of cases could be reduced or eliminated with proper treatment and monitoring. Among people who have had diabetes for over 20 years, up to 80% are going to have diabetic retinopathy issues. The other part of it is that there are often no early warning signs. It's not that you're going to go in and report blurry vision; in the first stage, there may be no symptoms at all. So we need to be able to identify it faster and more economically. I will admit that when I was putting these slides together this week, every time I read this material I'd start to get a psychosomatic condition and blurry vision, but I don't think it's a personal issue for me yet.

Here it is in terms of the actual research. The reason I specifically looked at diabetic retinopathy, besides the fact that it's a huge and growing public health issue, is that so much is being done about it. Google and Verily (a Google company under the Alphabet umbrella), the University of Texas at Austin, the University of California, a number of other groups in India, and Brigham and Women's Hospital in Boston, where I spent a lot of time, are all collaborating to develop and validate a specific algorithm for diabetic retinopathy by looking at retinal fundus photographs. The retinal fundus photographs basically look inside the eye for abnormalities, looking for patterns that indicate ruptured blood vessels. That's a research project that was announced at the end of last year, using a specific type of neural net for image classification. This one is also a convolutional neural network, a feed-forward architecture, so it's similar to the technology that was used for the AXA system. But here they used over 128,000 images, and they had 54 professionals, licensed ophthalmologists and residents, look at each of those images to create the training data.
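For shape, here's a toy sketch of a convolutional, feed-forward classifier for fundus photographs. The published work trained a far larger network on those 128,000-plus graded images; the image size, layer widths, and five-grade output below are my assumptions for illustration.

```python
# Toy sketch of a convolutional, feed-forward classifier for retinal fundus
# photographs. The real project trained a much larger network; image size,
# layer widths, and the five severity grades are illustrative assumptions.
import tensorflow as tf

retinopathy_model = tf.keras.Sequential([
    tf.keras.Input(shape=(299, 299, 3)),               # one fundus photograph
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # low-level features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),  # higher-level patterns
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Output: a severity grade (e.g., none / mild / moderate / severe /
    # proliferative), matching how the ophthalmologists graded the images.
    tf.keras.layers.Dense(5, activation="softmax"),
])
retinopathy_model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy")
```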
So this is an active approach that's being used right now, and it ties in with the next one: Verily working with Nikon, with Nikon doing the optical imaging. This is a collaboration to improve the accessibility of the screening, using this deep neural network, deep learning approach, for retinal imaging. So that's one approach.

The reason I included this one under healthcare is that I have three examples from different groups, all well-known names. We had Google; now Microsoft. Microsoft is partnering with an eye institute in India to launch the Microsoft Intelligent Network for Eyecare. It's a similar approach: they're looking to build a large data set, but also to make it accessible, using machine learning to advance eye care globally. And finally, within retinopathy, IBM has also published a medical imaging solution this year that looks at the diabetic retinopathy images, the fundus images, pixel by pixel. You can read the full paper yourself, but basically it takes about 20 seconds to reach an accuracy of 86% in classifying the disease, about as well as a human expert. Once they can do this and roll it out internationally, with the combination of Google and IBM and Microsoft each working on this, and hopefully some data sharing, this is really going to revolutionize that part of healthcare and democratize it. Because even if they had enough machinery to do the scans of everyone who is a moderate- to high-risk candidate, there wouldn't be enough ophthalmologists in the world to look at all those scans. So this is truly one of the cases where machine learning can democratize, if you will, that aspect of medicine, for a condition that causes blindness in so many people internationally.

Now I want to look at using machine learning for what's called precision medicine, and how that's going to transform healthcare and pharma together. Precision medicine really refers to medical solutions that are tailored to the individual. A lot of times, if you're not in healthcare or pharma, what you may see is that people in business are excited about machine learning because it allows us to create very customized marketing messages. You go on Facebook and as you're reading through, you get an ad and you think, wow, how did they know I was interested in a Filson hat from Seattle? Well, they know because they've put all this data together. But if we can apply the same type of hyper-personalization, using contextual data, to provide medical and pharmacological solutions that are just right for just you at just the right time, that's really going to transform medicine. And that's where a lot of this is going. This slide happens to be from the MIT precision medicine group, but the point is that it's changing the way we approach medical care, making it personalized and making it data-driven. And the way you do that is with machine learning.

So once again, I want to highlight the work being done by some of the larger companies. There are some smaller ones involved in this too, but in the interest of time I'm just going to mention a couple of the bigger ones. Here's Microsoft with Project Hanover, which is AI focused on precision medicine. They're looking at machine learning for cancer decision support and chronic disease management. It's all of the use cases I mentioned, but trying to do them at the individual level, taking into account more personalized data.
The other item is that Microsoft is working in partnership with some universities to integrate their cloud and AI research with the medical community's data sets, once again to democratize, if you will, healthcare improvements. So that's Microsoft. For Google, just as we saw in the specific example of retinopathy, Google acquired a company called DeepMind not long ago, and DeepMind is involved in some clinical work here. DeepMind was actually a UK-based company before it was acquired by Google, and they're working with the National Health Service in the UK to try to improve quality of care, access, and diagnosis using DeepMind's machine learning solutions. That's more of a regional focus, although obviously it's something they would expect to roll out beyond that. And finally, on transforming healthcare, IBM. In general, when you look at IBM today, there are several different classes of problems that Watson is being used for in healthcare and pharma. Genomics, drug discovery, and oncology are some of the ones that have been well publicized, but they also have Watson Care Manager and patient engagement. They're using the Watson umbrella, which fortunately or unfortunately now includes a lot of things beyond the original cognitive system; it was a deep question answering system, but it now includes a number of machine learning algorithms and predictive analytics that are being applied throughout healthcare and pharma. The last one I'm going to mention in this transformation section is the genomic sequencing service, and there's really not much that's more personalized than having your own genome mapped. This is a project between the IBM Watson Group and Quest Diagnostics to use data from Memorial Sloan Kettering to create data sets that would be useful for personalized medicine.

As we near the end of the hour, what I wanted to do was come back to insurance and say: now that we've seen a specific example, one way of using a 70-dimension vector to do pricing, how could we integrate the types of data we have, from everything that's being collected about us, frankly, and change the way we do insurance? Because insurance is related to healthcare, even if you're talking about automobile insurance. So I wanted to leave you with some thoughts on how that fits together. I will say that some of these examples I've actually taken from a course I taught on analytics and IT at NYU Stern back in the year 2000. So some of this goes back a very long time, and we're just now getting to the point where it's fairly practical.

If we take the different data sources, and again we're looking at the insurance case, the historical and the streaming, how can we enable new business models where the pricing reflects actual behavior at a given point in time? Here are my three examples. For auto: what if you're driving, you put your destination into your GPS, and Waze communicates with your insurance company? You get a message that says, you know, on the route you're planning to take from Connecticut to Boston there's a storm, there's a lot going on, and we're going to price your policy based on the fact that you're taking a risk we don't think is prudent. Then it asks you whether or not you want to proceed. All of that is feasible today. We have that information. We have the information about your car.
We have the information about your driving. And now if we say, well, you're going to take a route that we think increases your risk above your profile, we could do dynamic pricing right then. You could make the decision; you would know that making that trip is going to change your rate. And that could be tied into your guidance system; it could be that you have to click on it and approve.

For life insurance: you get a message. You're in the store; it knows where you are, because we have geospatial tracking, and there's an RFID tag or other identifier on the ice cream. Now you're in the ice cream section, and you get a message: hey, if you take that ice cream, it's going to change the glucose monitor reading the next time we check it, which is opt-in perhaps, but based on what we know about your biometrics, we know we're going to be checking that. And when that reading goes up, your life insurance policy is going to be adjusted. Now you get the option: do you want to see what the true cost of the ice cream is? Because it's not just the price of the ice cream; that ice cream is going to change your blood sugar level, and that's going to change your life insurance policy.

Finally, property and casualty. We can have a system that looks at the news and categorizes different types of news events. Let's say we're in California, where everything seems to go on the ballot. Proposition 21 (I'm just making that up; I don't know what Proposition 21 is this year), and let's say you get a note from your insurance company saying they've determined that if this proposition passes, your risk of flooding is going to rise, and based on that, your policy is going to be adjusted. So you're getting advice from your insurance company about politics. It may sound far-fetched, but it really isn't. All of this is feasible, if not necessarily practical yet, and we're only talking a matter of months to a couple of years before each of these scenarios could be instituted within the insurance industry. That would change the way we do everything.

Now, a lot of people are going to be thinking, well, I don't really want the industry to change. But the fact is, as businesses gain the ability to offer these types of services based on behavior, it's going to happen. I was speaking to a company this week, one of our clients, who was looking at how to give more accurate information to their users, to present them with the right content at the right time to do something. And I said, well, one of the things you can do today, if you can get your customer's customer to opt in, is monitor the person's reaction while they're looking at the content you're showing them. We can have a camera on the laptop or the tablet, if you will, that records the emotion: is somebody reacting positively or negatively? You could, of course, have the person using the application give feedback, but it's usually more accurate if it's done passively rather than actively. It's the kind of thing we can have in your car. We're already using cameras in some cases to see if somebody is nodding off, based on the movement of their head. Now you have your system looking around, bringing in information about the weather, about traffic, from your route guidance system, and then it checks and sees that the sentiment you're exhibiting, based on your facial expression, is anger. Maybe your insurance company is going to tell you to calm down.
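To show how simple the mechanics of such dynamic pricing could be once the data is flowing, here's a hedged sketch. Every signal name and multiplier below is invented for illustration; in practice, the adjustments would be learned from historical claims data rather than hard-coded.

```python
# Hedged sketch of the dynamic-pricing idea from these scenarios. All signal
# names and multipliers are invented; a real insurer would learn these
# adjustments from historical claims data rather than hard-coding them.
from dataclasses import dataclass

@dataclass
class TripContext:
    severe_weather_on_route: bool  # from a weather feed for the planned route
    heavy_traffic: bool            # from a traffic or telematics feed
    driver_appears_angry: bool     # e.g., inferred from an in-car camera

def quote_trip_premium(base_rate: float, ctx: TripContext) -> float:
    """Return a per-trip premium adjusted for streaming risk signals."""
    multiplier = 1.0
    if ctx.severe_weather_on_route:
        multiplier *= 1.25
    if ctx.heavy_traffic:
        multiplier *= 1.10
    if ctx.driver_appears_angry:
        multiplier *= 1.15
    return round(base_rate * multiplier, 2)

# The route guidance system would surface this quote and ask the driver to
# approve before the trip starts.
print(quote_trip_premium(2.00, TripContext(True, False, True)))  # ~2.87
```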
A lot of these things are very close right now, enabled by the availability of data and the availability of machine learning. And with that, I think we'll turn it back to Shannon; we have just a couple of minutes for questions. But as always, I'd be happy to talk to anybody offline after the webinar, and here's how to reach me. I would just point out, as I try to do every time, that if you want to connect on LinkedIn, just send me a message saying this is how you heard about me, because I tend to ignore requests otherwise. Shannon, take it away.

Adrienne, thank you so much for another great presentation. Just a reminder to everybody, to answer the most commonly asked questions: I will be sending a follow-up email by end of day Monday with links to the recording and to the slides. If you have any questions, feel free to send them in via the Q&A section. And I hope you can all join us for Adrienne's next webinar, on August 10th, Organizing Data and Knowledge. Excuse me, in July... I'm already planning way ahead. You know what month we're in. So next month, in July: Advances in Natural Language Processing. But everyone's so quiet. No questions today? That's unusual. Maybe they're all off looking at new insurance policies. Indeed. Well, thanks as always to all of our attendees, and Adrienne, thank you so much as always. We hope to see you all next month. I hope everyone has a great day. Thanks. Take care, folks.