Good morning. My name is John Ortega. I will be covering how big data works in the medical domain and how companies and others use patient encounters with machine learning to perform tasks like predicting patient outcomes and reporting insurance billing codes. First, I'm going to tell you a little bit about myself. For the past 15 years, I've been working in software development and research. Currently, I'm working on what's called clinical language understanding in a professional setting. However, I also work on various projects in both machine translation and natural language processing. I work as both a researcher and an adjunct instructor at New York University in New York, New York. I also work for the biggest natural language processing employer in the world, Nuance Communications. My days are generally filled with topics ranging from how to diagnose a patient with a particular disease to how to translate from English to Spanish without the help of a human. I changed my career path about eight years ago, from software architect and consultant to full-time researcher dedicated to computational linguistics. An important note: the material I'm presenting today is not in any way affiliated with the company I represent, so I'm basically going to give a general overview of medical report coding and big data. Today, I will be talking about how big data and machine learning can be used to achieve various objectives in the medical domain. I will first talk about how medical report coding is used as a billing technique for the hospital and the patient. Then I will discuss specifics about how projects combine big data with complex algorithms to provide state-of-the-art services that may seem futuristic. Lastly, I will talk about how all of this is used to improve the medical world and what you would normally experience when you go to the doctor. Medical report coding generally involves a coder who oftentimes works from home.
The coder is a person well-versed in clinical language who is able to judge from the text written in a clinical report which codes belong where. Insurance companies almost always bill using the coder's disposition. Nowadays, software recommends candidate codes to help coders do their job more efficiently. Codes in the United States are part of a larger medical terminology system that is agreed upon in advance. The examples in the slides come from the most widely used coding system, called ICD-10. To better understand how coding works, let's go through a typical medical coding pipeline. First, a patient enters a clinical facility via an ER or an outpatient facility. Once a doctor sees them, the doctor produces a diagnosis of some sort and either orders a procedure or sends the patient home. Each of the steps in the pipeline is considered a patient encounter, or report. In this pipeline, if we assume that this is the patient's first visit, four reports are created by the hospital. Each of the four reports, or encounters, is coded accordingly by the medical report coder. Once the patient is at home, the part we all hate happens: a bill arrives showing all the codes that occurred in each encounter. In other systems, like the European systems, the patient wouldn't receive this summary. Rather, another billing agency (let's call it the government) would. Reports from the Centers for Disease Control and Prevention on how many visits are made are quite alarming. Billions of visits are made by patients in the U.S. alone. Most visits are done on an outpatient basis, at the hospital or the physician's office. However, emergency room visits also make up millions of visits. It's interesting to note that, for the most part, United States medical facilities use some type of software to manage their patient records. A lot of those medical records end up in the hands of a coder.
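The coding pipeline just described can be sketched as a small data structure. This is a minimal illustration, not any vendor's system: the four encounter names and the codes (`CODE-A` and so on) are hypothetical placeholders rather than real ICD-10 codes, and `Encounter` and `build_bill` are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Encounter:
    """One patient encounter (report) in the pipeline."""
    name: str
    codes: List[str] = field(default_factory=list)  # filled in by the coder


def build_bill(encounters: List[Encounter]) -> List[Tuple[str, str]]:
    """Return one bill line per (encounter, code) pair, in pipeline order."""
    return [(enc.name, code) for enc in encounters for code in enc.codes]


# Hypothetical first visit: four reports created by the hospital.
visit = [
    Encounter("ER intake"),
    Encounter("Physician exam"),
    Encounter("Procedure"),
    Encounter("Discharge"),
]

# The coder reads each report and assigns codes (placeholders, not real ICD-10).
assignments = {
    "ER intake": ["CODE-A"],
    "Physician exam": ["CODE-B", "CODE-C"],
    "Procedure": ["CODE-D"],
    "Discharge": ["CODE-E"],
}
for enc in visit:
    enc.codes.extend(assignments[enc.name])

bill = build_bill(visit)
for line in bill:
    print(line)
```

In a European-style system, the same bill lines would simply be routed to the other billing agency instead of the patient.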
However, coders, much like doctors and everyone else in the U.S., are quite busy. So many companies are attempting to aid them by providing computer-assisted coding tools, also called CAC tools, to increase their productivity and thus let coders handle more reports in a shorter amount of time. Enterprise-level software is needed to cover a facility's needs. As a whole, millions upon millions of records are available. As long as the software company is compliant with industry privacy standards like HIPAA, it can take advantage of several facilities' reports to provide predictive services like diagnosis, coding, and more. A facility's tools typically integrate easily with APIs provided by the software company. Commonly offered APIs and tools from U.S. companies include speech recognition, devices, predictive services, security, and, finally, online tools. All of these APIs help a facility take advantage of the big data processing that the software company can provide. Software backed by millions of records can be used to increase traditional coders' performance. Coders complete more documents in less time using CAC tools, which let them select from a ranked list of codes instead of having to originate the codes themselves. The ranking of codes is done using machine learning algorithms based on deep learning and other statistical methods. Clinical data is noisier than other data, such as movie data, so it's quite an achievement when enterprises create CAC tools that produce accurate customer reports and outcomes showing that productivity can be increased. Disagreement among trained coding teams is generally negligible, which makes the entire process a whole lot easier for software researchers and developers.
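The ranked list that a CAC tool shows the coder can be sketched as follows. This is a minimal sketch assuming the model has already produced one raw score per candidate code; the softmax step and the `rank_codes` helper are illustrative choices, and the code names and scores are hypothetical placeholders.

```python
import math
from typing import Dict, List, Tuple


def rank_codes(scores: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Convert raw model scores into probabilities (softmax) and keep the top_k codes."""
    shift = max(scores.values())  # subtract the max for numerical stability
    exps = {code: math.exp(s - shift) for code, s in scores.items()}
    total = sum(exps.values())
    probs = {code: e / total for code, e in exps.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical raw scores from an upstream model for one report.
raw_scores = {"CODE-A": 2.1, "CODE-B": 0.3, "CODE-C": 1.4, "CODE-D": -0.5}
ranked = rank_codes(raw_scores)
for code, prob in ranked:
    print(f"{code}: {prob:.2f}")
```

The coder then picks from this short list instead of searching the full code set, which is where the productivity gain comes from.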
To better understand the advantage of using big data for prediction in an electronic health record, or EHR, system, I present a simple, fictitious clinical record. The patient, John Doe, appears to have severe malnutrition and needs a feeding tube because he refuses to eat anything. Apart from his nutrition, he seems to be doing fine. The doctor recommends a procedure to insert a feeding tube. In an older system that doesn't take advantage of big data and predictive machine learning, a coder would have to find the qualifying text for ICD-10, look up the codes for that text, and then manually enter the codes in the system using a form-like structure. In more state-of-the-art systems, big data is used along with artificial intelligence algorithms to highlight text and provide suggestions to the coder. The coder then only has to select the best code and check that the document was coded correctly. To accurately present these codes to a coder in an EHR, millions of clinical documents (patient encounters) are gathered and fed into a predictive model. The model is trained using a special algorithm, typically based on the latest statistical techniques available. The EHR can then load the model, and new, unseen data is presented to it in a document format. The text is sent to the back-end modeling software, which in turn decodes textual phrases and marks each phrase with several codes, each with a probability. Those codes are returned to the EHR and consumed by the medical coder in a click-to-select process that helps speed up the coder's workflow. In the next few slides, I will present an example of how this can be done using specific text labels for word groups such as acronyms. I'm not sure if everyone knows what an acronym is. It's basically an abbreviation of a term, and in the medical domain, acronyms are actually one of the more active research problems.
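The round trip described above (the EHR sends text, the back end marks phrases with several codes and probabilities, and the coder clicks one) can be sketched like this for the fictitious John Doe record. The `MODEL` table is a hypothetical stand-in for a trained model, the codes are placeholders, and `decode` and `select` are invented names; a real back end would compute the probabilities rather than look them up.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical phrase -> [(code, probability)] table standing in for a trained model.
MODEL: Dict[str, List[Tuple[str, float]]] = {
    "severe malnutrition": [("CODE-MAL-1", 0.82), ("CODE-MAL-2", 0.11)],
    "feeding tube": [("CODE-PROC-1", 0.74), ("CODE-PROC-2", 0.19)],
}


def decode(text: str) -> List[dict]:
    """Mark each known phrase in the text with its span and ranked candidate codes."""
    results = []
    for phrase, candidates in MODEL.items():
        for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
            results.append({
                "phrase": match.group(0),
                "span": match.span(),
                "candidates": sorted(candidates, key=lambda c: c[1], reverse=True),
            })
    return sorted(results, key=lambda r: r["span"])  # document order


def select(result: dict, index: int = 0) -> str:
    """Simulate the coder's click: accept the candidate at `index` (default: top one)."""
    return result["candidates"][index][0]


note = ("John Doe presents with severe malnutrition and refuses to eat. "
        "Recommend insertion of a feeding tube.")
for r in decode(note):
    print(r["phrase"], r["span"], r["candidates"], "->", select(r))
```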
One of the common tasks that can improve coding quality in an EHR system is the expansion of acronyms in a clinical report. When enough data is available, acronyms can be expanded to provide more information to the model that will decode an unseen report. Typical acronyms that most of us have heard are things like BMI, body mass index; EHR, which I guess we've now heard; and ER, emergency room. However, in the medical domain, there are thousands of acronyms that most professional doctors know. Resolving and expanding an acronym, such as RA to rheumatoid arthritis, can help decode the written text. The expansion can be done using a statistical model as well. One of the latest machine learning techniques uses what are called word embeddings and a convolutional deep learning model to break down and classify acronyms, such as MG as milligrams. I won't bore you with the details because this is the business track of the event. However, it's important to note that a lot of research in the past few years has led to breakthrough discoveries that are replacing traditional statistical models such as support vector machines and naive Bayes. In this slide, we see how various embeddings are used along with specific layers in a neural network, like pooling and softmax, to produce three good candidates for the MG acronym from the clinical sentence. This image is not meant to scare anyone. It's only big data and artificial intelligence. Okay. Now that I've let my research side come out, I'm going to quickly go over what word embeddings are and how they are transforming big data projects, to help you better understand the previous slide. Word embeddings are a newer way of representing words as vectors that serve as input to neural network models. They allow models to interpret data in several ways that are not easy for humans to read because of the number of dimensions.
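To make the slide's pipeline (embeddings, a convolution, pooling, softmax, three candidate expansions for MG) more concrete, here is a toy forward pass in plain Python. It is only a sketch: the embeddings and weights are random and untrained, so the probabilities it prints are meaningless; a real system would learn them from large amounts of clinical text. The three candidate expansions are assumptions chosen for illustration, not necessarily the ones on the slide, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def make_embeddings(vocab: List[str], dim: int, rng: random.Random) -> Dict[str, Vector]:
    """Random vectors stand in for pretrained word embeddings (an assumption)."""
    return {word: [rng.uniform(-1, 1) for _ in range(dim)] for word in vocab}


def conv_maxpool(seq: List[Vector], filters: List[Tuple[Vector, float]], width: int) -> Vector:
    """1-D convolution over token windows, ReLU, then max-pooling over time."""
    pooled = []
    for weights, bias in filters:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for i in range(len(seq) - width + 1):
            window = [x for vec in seq[i:i + width] for x in vec]  # concatenate vectors
            activation = max(0.0, sum(w * x for w, x in zip(weights, window)) + bias)
            best = max(best, activation)
        pooled.append(best)
    return pooled


def softmax(logits: Vector) -> Vector:
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_acronym(tokens: List[str], labels: List[str], dim: int = 8,
                     width: int = 3, n_filters: int = 4,
                     seed: int = 0) -> List[Tuple[str, float]]:
    """Untrained forward pass: embed -> conv -> max-pool -> dense -> softmax."""
    rng = random.Random(seed)
    emb = make_embeddings(sorted(set(tokens)), dim, rng)  # sorted for determinism
    seq = [emb[t] for t in tokens]
    filters = [([rng.uniform(-1, 1) for _ in range(width * dim)], 0.0)
               for _ in range(n_filters)]
    features = conv_maxpool(seq, filters, width)
    dense = [[rng.uniform(-1, 1) for _ in range(n_filters)] for _ in labels]
    logits = [sum(w * f for w, f in zip(row, features)) for row in dense]
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)


sentence = "patient was given 5 mg of morphine for pain".split()
candidates = ["milligrams", "myasthenia gravis", "magnesium"]
for label, prob in classify_acronym(sentence, candidates):
    print(f"{label}: {prob:.3f}")
```

Training would adjust the embeddings, filters, and dense weights so that, in a sentence like this one, "milligrams" receives most of the probability mass.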
Matrices are created using techniques such as bag of words, which give higher weights to specific tokens that deserve more importance. Here, I added a simple example using Chris Tucker and Jackie Chan, but one can imagine how clinical text could be represented the same way, with acronyms or other words given higher importance in a weighted matrix. We will see in the next few slides how a word embedding representation can be used to actually produce what we're looking for in a report. Here are sample results from using machine learning with big data to predict the best expansion for the four most common acronyms in clinical text. Most of us know the ER acronym as emergency room. It generally fares well against its competitor, external rotation. We can see from the results that models generally do better with more data. It is my belief that we've only seen the tip of the iceberg with deep learning and neural networks. This is a small example of one of the techniques that can help enterprise software exceed expectations when solving hard problems like classifying text with codes or predicting patient outcomes. Now that we've seen how big data can be used to classify text in medical reports, let's go over some of the main ways clinical language understanding, or CLU, can help doctors and patients reach a better outcome. Here are four ways that big data can help an institution that uses an EHR. First, patient outcomes become more predictable. Second, decisions by doctors become easier due to the plethora of information. Third, big data can be used to find unusual patterns that humans normally don't see. And fourth, an EHR can raise alarms for events that aren't traditionally monitored. Let's dive deeper into these four important points. I don't think I've ever met a patient who didn't want to know what's going to happen to them next. Patients, especially those in life-threatening situations, are often victims of a lack of information.
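The weighted bag-of-words matrix mentioned at the start of this part can be illustrated with TF-IDF, one common way of weighting a bag of words; whether the slide used exactly this scheme is an assumption. The three toy sentences below are made up for illustration. A word that appears in every document (here, "patient") gets a weight of zero, while rarer tokens such as the acronym "mg" are weighted up.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_matrix(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Build a document-by-term matrix weighted by term frequency x inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # missing tokens count as 0
        rows.append([counts[tok] / len(toks) * idf[tok] for tok in vocab])
    return vocab, rows


docs = [
    "patient given 5 mg of morphine",
    "patient denies chest pain",
    "patient taken to the er",
]
vocab, matrix = tfidf_matrix(docs)
for term in ("patient", "mg", "er"):
    col = vocab.index(term)
    print(term, [round(row[col], 3) for row in matrix])
```

Word embeddings go a step further by replacing these sparse, hand-weighted columns with dense vectors learned from data.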
It's not the doctor's fault, though. The doctor may simply be too busy to spend hours and hours looking at all of the patient's information and reports. Some state-of-the-art techniques have been shown to help. For example, at NYU, we found ways to help diagnose cancer ahead of time. Another technique that has quickly gained fame in the bioinformatics community is the ability to use big data to predict a patient's death. This could be useful for hospitals and doctors, especially when it makes a difference of weeks or months. And last but not least, sepsis outcomes. One of the hardest problems that doctors currently face in the medical world is trying to predict when a patient will develop sepsis. Most of us believe that the doctor knows best and, for the most part, that's true. However, since doctors are so backed up these days, they need help making decisions, too. Big data inclusion in an EHR can assist the doctor in several ways. One of those ways is by extracting clinical facts from reports so that doctors get the gist of a patient's lifetime record immediately. At the same time, the EHR can pay more attention to past reports and find things that were missed. Lastly, big data can help determine whether a patient is compliant in general by reviewing the patient's past visit records and the gaps in treatment. Patients typically display patterns of entry into and exit from a hospital. When those visits become unusual, the pattern can be used to predict what may be bringing the patient to the hospital. Other patterns that can be found include underlying diseases that are not typically detected, or drug abuse and smoking patterns that typically affect a patient greatly. As big data and machine learning scientists, we've spent years developing algorithms to detect these patterns. A hospital can take advantage of all of this work simply by using an EHR system with big data.
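The idea of flagging unusual visit patterns can be sketched with a simple statistical rule. This is a minimal sketch, not a published method: it compares the most recent gap between visits with the patient's earlier gaps and flags it when it lies more than `k` standard deviations away. The threshold `k=2.0`, the dates, and the function names are assumptions for illustration.

```python
from datetime import date
from statistics import mean, stdev
from typing import List


def visit_intervals(visits: List[date]) -> List[int]:
    """Days between consecutive visits, in chronological order."""
    ordered = sorted(visits)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def latest_interval_is_unusual(visits: List[date], k: float = 2.0) -> bool:
    """Flag the latest gap if it is more than k standard deviations from earlier gaps."""
    intervals = visit_intervals(visits)
    if len(intervals) < 3:
        return False  # need at least two earlier gaps to estimate a spread
    *history, latest = intervals
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly regular history: any change is unusual
    return abs(latest - mu) / sigma > k


quarterly = [date(2023, 1, 1), date(2023, 4, 1), date(2023, 7, 1), date(2023, 10, 1)]
regular = quarterly + [date(2024, 1, 1)]
sudden_return = quarterly + [date(2023, 10, 8)]
print(visit_intervals(sudden_return))             # [90, 91, 92, 7]
print(latest_interval_is_unusual(regular))        # False
print(latest_interval_is_unusual(sudden_return))  # True
```

A flagged gap does not say why the patient came back early; it only tells the EHR which records deserve a closer look.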
Many of us have heard those beeping noises in a hospital room when we go to the hospital. Those beeps typically come from monitors tracking vital signs like blood pressure and oxygen levels. Most of us don't imagine how big data can be used to do a similar task using text only. Laboratory values and their trends can be tracked over long periods of time, and alarms (maybe not beeps) can be sent via text message or email to the nurse on call. Other odd events that aren't typical for a patient's chart can be detected as well, for example, blood loss, malnutrition, or anemia. Whether it is used to alert doctors about patients or to assist them with making decisions, integrating predictive outcomes into their system gives a hospital some high-level gains that most don't realize. An EHR backed by big data turns the traditional system into an all-knowing assistant that produces faster response times and better decision-making, which in turn makes everyone happier. Facilities that use EHR systems backed by big data knowledge can immediately become smarter. Better decision-making makes patients happier and sets a higher standard to achieve. Software integration that uses AI is a quick way to improve the quality of healthcare in any facility and should be considered the next step for facilities that haven't already taken it. Here's the doctor with smoke coming out of his ears. He's backed up with information from nurses, patients, colleagues, and family. Doctors are seeing more patients than they used to. Patients expect to be treated with respect and generally want to be part of the healing. However, if the doctor can't provide an optimal experience, everyone suffers. Adding an EHR with big data can give the doctor the tools needed to cover more patients in a shorter amount of time, thus increasing workflow productivity.
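The lab-trend alarm can be sketched with an ordinary least-squares slope. This is a minimal sketch under stated assumptions: the hemoglobin values, the 30-day window, and the `max_drop_per_30d` threshold are made up for illustration and are not clinical guidance, and the alert is returned as a string rather than actually sent by text message or email.

```python
from typing import List


def slope_per_day(days: List[float], values: List[float]) -> float:
    """Ordinary least-squares slope of lab values against time (days)."""
    n = len(days)
    mean_x, mean_y = sum(days) / n, sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    den = sum((x - mean_x) ** 2 for x in days)
    return num / den  # requires at least two distinct days


def trend_alerts(test_name: str, days: List[float], values: List[float],
                 max_drop_per_30d: float = 0.5) -> List[str]:
    """Return alert messages when a lab value is falling faster than the threshold."""
    change_30d = slope_per_day(days, values) * 30
    if change_30d < -max_drop_per_30d:
        return [f"{test_name} falling by {abs(change_30d):.2f} per 30 days; notify nurse on call"]
    return []


days = [0, 30, 60, 90]
print(trend_alerts("hemoglobin", days, [13.5, 12.8, 11.9, 11.0]))  # one alert
print(trend_alerts("hemoglobin", days, [13.5, 13.4, 13.6, 13.5]))  # []
```

Using a fitted slope rather than comparing only the last two readings makes the alarm less sensitive to a single noisy lab value.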
To extend the alarm idea: in the health domain, an EHR integrated with a big data back-end engine immediately becomes useful in other ways too. It can be configured to send a warning when the big data API fails and even contact all of the parties involved, such as the IT team. Scheduling can also become part of the system: big data can be used to create more accurate schedules for an EHR when certain trends are found. To sum up the idea of big data in a clinical setting, let's discuss how medical management can benefit from it. First, the well-being of a patient becomes part of the software's responsibility. That sets higher standards of automation and quality from day one when a software system uses big data and is integrated into an EHR. Institutions become safer as a result of alarms and scheduling based on statistical truths found by big data. Lastly, management becomes quicker by helping doctors make decisions and by increasing knowledge of patient outcomes. As we embark on a new era rooted in big data, everything converges toward a health goal that combines the traditional EHR with new services of many types that touch every part of the medical tree. All branches of medicine can take advantage of a digital future, and connections can be made between them to ensure a healthier outcome. We have only begun to see the immediate savings from using an EHR system. A recent study shows a 9% difference in cost per patient between hospitals that use an EHR system and hospitals that don't. The next step, of course, is to include big data in every EHR system, which should increase the savings and allow us to continue using AI across the board. This concludes the presentation on medical report coding. Thanks to the organizers for setting this up and to the audience for listening to my presentation today. Good day. Thank you very much. We have some time for questions.
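Configuring the EHR to warn people when the big data API fails could look roughly like this. It is a sketch only: `predict` and `notify` are hypothetical callables supplied by the caller (for example, an HTTP client call and an email or paging hook), and the retry count and contact names are arbitrary assumptions.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def predict_with_alerting(predict: Callable[[str], dict], text: str,
                          notify: Callable[[str, str], None],
                          contacts: Iterable[str], retries: int = 2) -> Optional[dict]:
    """Call the prediction API; after repeated failures, alert every contact and return None."""
    last_error: Optional[Exception] = None
    for _ in range(retries + 1):
        try:
            return predict(text)
        except Exception as exc:  # broad on purpose: any failure should reach a human
            last_error = exc
    for contact in contacts:
        notify(contact, f"Prediction API failed after {retries + 1} attempts: {last_error}")
    return None  # the caller falls back to manual coding


sent: List[Tuple[str, str]] = []


def failing_predict(text: str) -> dict:
    raise ConnectionError("back end unreachable")


def record(contact: str, message: str) -> None:
    sent.append((contact, message))


result = predict_with_alerting(failing_predict, "note text", record, ["it-team", "coding-lead"])
print(result, sent)
```

Returning `None` instead of raising keeps the coder's workflow alive: the EHR can fall back to manual coding while the IT team investigates.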
Hello. Thank you for your talk. Do you have any kind of knowledge that could be used for clinical essays or trends that doctors could use? Do you infer it out of the way? No one is training or doing anything to the data if you just provide it right away? Yes. For the software that I've worked on and the papers that I've written, for the most part, it's been data from historical records. Also, since data in the US is very protected, you have HIPAA regulations that don't allow you to touch anything without previous knowledge. So typically it's six months or so until you actually get the data and start working with it, and, of course, to build a model, you need more than six months of data. So it takes time. So as far as fearing any of the data, I don't think there's any problem there. The biggest fear is that someone actually breaks security and gets hold of the data. If a person has been in an ER or something like that, they don't want others knowing what happened to them in the hospital. That's the biggest fear. You don't use anonymized data? Everyone does. You must use anonymized data. It's part of the HIPAA regulation. So when you receive this data and start to process it, the first thing you have to do is anonymize it. That doesn't necessarily mean that it can't be de-anonymized, right? The idea is that the algorithms and so forth take care of that, but there are a lot of different ways of getting around that, I think. So that's the biggest fear right now. Other than that, since the field is pretty green, there aren't a lot of people doing this. Google just began doing it. We've been doing it for probably about eight or nine years now. But other than that, it's pretty green, so there's not a lot of fear. People want this, in both B2B and B2C situations; most of us would love to know when we're going to die, or maybe not. Thank you. Hi. Thanks for the talk.
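The first step mentioned in the answer, anonymizing the data before processing it, can be illustrated with a toy pattern-based scrubber. This is only a sketch: the regular expressions, the medical record number (MRN) format, and the `deidentify` helper are assumptions, and a handful of patterns like these falls far short of what HIPAA de-identification actually requires.

```python
import re
from typing import Iterable

# Hypothetical patterns; real de-identification needs far broader coverage.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),
]


def deidentify(text: str, known_names: Iterable[str] = ()) -> str:
    """Replace known patient names and simple identifier patterns with placeholder tags."""
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text


note = "John Doe (MRN: 123456) was admitted on 03/14/2019. Call 212-555-0100."
print(deidentify(note, known_names=["John Doe"]))
# [NAME] ([MRN]) was admitted on [DATE]. Call [PHONE].
```

As the answer points out, masking identifiers like this does not guarantee that a record cannot be re-identified from the remaining clinical details.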
I was wondering, since I saw the deep learning model, whether you have any personal preference for machine learning libraries. What do you personally use, or is it the company that decides this? What do you like the most? I've done research with lots of different things. When the frameworks first came out, they were more rooted in Java and C++; nowadays Python is the big craze. I mean, this is why you have so many people at this conference. Python makes it easy, right? So I would just go and use Python for the general stuff. But depending on how much data you want to process and the amount of time you have, you may need to use something that is not necessarily rooted in a framework. But my personal preference is, you know, I just start off with anything that's TensorFlow-based. Thanks. Hi, thanks for the talk. Have you felt any backlash from doctors who fear that you're going to somehow change their job? The doctors don't. Generally, the coders do. Yeah, the coders are the ones who have to assign the codes to the reports. This happens in machine translation too. When translators get a translation from Google or wherever, all they have to do is modify a few words. It's the same idea in the hospital or medical coding sector. So the coders do, but the doctors are looking for anything that can help them, because they're seeing 80 to 100 patients a day, especially in some hospitals in New York. I know a few; I've been in there. I've seen them. These doctors are so backed up that anything they can get to help them, they use. Well, since you mentioned that the data is very critical, I mean, who gets to know and use that data? I don't know if you have any insight. Now there are a lot of people talking about using blockchain technology to solve this. I don't know if you could provide us with some insight.
Yeah, as far as blockchain is concerned, I think that's more related to Bitcoin and things like that. On the hospital side, the biggest problem we're currently running into, or anyone would currently run into, is trying to secure the data in the cloud. Most companies want to get their data and their apps into the cloud because it gives them faster processing and more access to GPUs and so forth. The problem is that you can't put patient data in the cloud without knowing that it's secure. I think that's the biggest problem we have. Now, as far as how Bitcoin, I'm sorry, blockchain could be used to solve that, that's a whole new front, and I don't think there's much information out there right now, at least on the healthcare side. In our case, we do everything in-house. We still use bare-metal servers for a lot of things. Sorry, just following up. My question came up because I have seen a call for proposals for European projects on providing public services, and one of them was health services. They mentioned using some disruptive technologies, and they mentioned blockchain, which I understand is not only about Bitcoin and money but about trust: who gets in and who you are going to be sharing the data with. You mentioned that one of the problems is putting that data in the cloud, and that's a big concern. It could very well be that that's the next disruptive technology. I don't know. That's way out of my domain, I think. Okay, thank you. Any other questions? Okay, thank you very much for the talk. Thank you.