From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. Well, hello, welcome to theCUBE. I'm James Kobielus, lead analyst for the Wikibon team within SiliconANGLE Media, and I'm your host today here at DataWorks Summit 2018 in Berlin, Germany. We have one of Hortonworks' customers in South America with us: Fernando Lopez of Quanam. He's based in Montevideo, Uruguay, and here at the conference he and his company have won an award, a data science award. So what I'd like to do is ask Fernando Lopez to introduce himself, give us his job description, describe the project for which he won the award, and take it from there. Fernando!

Hello, and thanks for the chance.

Great to have you.

I work for Quanam, as you already explained. There are about 400 of us in the whole company, spread across Latin America. I come from the headquarters, so to speak, which is in Montevideo, Uruguay. There we have a business analytics unit of about 70 people, and within that, a big data, artificial intelligence, and cognitive computing group, which I lead. And yes, we also implement Hortonworks; we are actually partnering with Hortonworks.

When you say you lead the group, are you a data scientist yourself, or do you manage a group of data scientists, or a bit of both?

Well, a bit of both. You have to do different things in this life. So yes, I lead the implementation groups. Sometimes a project is more about big data, sometimes more about data science; there are different flavors. But within this group, we try to cover the different areas that are in some sense related to big data: artificial intelligence, cognitive computing, and so on.

Yes. So describe how you're using Hortonworks, and describe the project for which you won the award here at this conference. I assume it's one project.
All right. Yes, we are running several projects, but this one, the one that won the prize, is one I like very much because I'm actually a bioinformatics student, so I have a special interest in it. It's good to clarify that this was a joint effort between Quanam and Gene Lives.

Gene Labs?

Gene Lives.

Gene Lives.

Yes. It's a genetics and bioinformatics company.

Is that a Montevideo-based company?

Yes. In short, they are a startup that was born from the Institut Pasteur de Montevideo, and they have a lot of specialists in bioinformatics and genetics with long careers in the field. We come from the other side, from big data, and I was kind of in the middle because of my interest in bioinformatics. Our two companies met something like a year and a half ago. There is a research and innovation center, ICT4V (you can visit ICT4V.org), which is a non-profit organization created by an agreement between the governments of Uruguay and France.

Okay.

It makes it possible for different private and public organizations to collaborate. We hold brainstorming sessions and so on, and this project was born from one of those sessions. After that, we started to discuss ideas for bringing tools to the medical geneticist in order to streamline his work: to put on top of his desktop different tools that could make that work easier and more productive.

Looking for genetic diseases? What are they looking for in the data specifically?

Correct. I'm not a geneticist, but I'll try to explain as well as I can.

Okay, that's good.

If I am the doctor, I will spend a lot of hours researching the literature. Bear in mind that nearly 300 papers come out in PubMed each day that could be related to genetics.

That's a lot. Are these papers in Spanish, published in South America?

No, I'm just talking about PubMed, from the NIH; these are papers published in English.

Okay.
PubMed, or MEDLINE. Different languages, different countries, different sources?

Yes, but most of it, in fact everything in PubMed, is in English. There is another PubMed in Europe, and we also have SciELO in Latin America. But just to give you an idea, from that one source alone there are 300 papers each day that could be related to genetics. So, speaking only of literature, there's a huge amount of information, and if I am the doctor, it's difficult to process all of it. That's one part of the issue. But at the core of the solution, what we want to provide is this: starting from the sequenced genome of one patient, what can we assert? What can we say about the different variations? It is believed that each one of us has about four million mutations. A mutation doesn't mean disease; a mutation leads to variation, and variation is not necessarily negative. It can mean a different eye color, or more or less hair. Or it could represent some disease, something we need to pay attention to as doctors. So this part of the solution implements heuristics on what comes out of the sequencing process. In short, these heuristics give each variant a score for how likely it is to be pathogenic. If I am the doctor, part of the work is done there. Then I have to decide on my diagnosis: is this disease present or not? This can be used in two ways. It can be used for prevention, to predict that something could happen: you have this genetic risk. Or it can be used to explain a disease and find a treatment. So that's the more bioinformatics part.

Yes.

On the other hand, we have the literature. What we do with the literature is ingest these 300 daily papers.

Yes.

Well, abstracts, not papers. Actually, we have about 3 million abstracts.

Just text, or graphics too? All of it?

No, only the abstract, which is a few hundred words. So just text.

Yes. Okay.
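The interview does not say which heuristics the project uses to score variants, so the following is only an illustrative sketch of the general idea of ranking a patient's variants by a pathogenicity score. Everything in it is an assumption made for the example: the weights, the 1% rarity cutoff, the placeholder gene names, and the fictional variants. It is not the method used by Quanam and Gene Lives. The consequence labels are Sequence Ontology terms.

```python
from dataclasses import dataclass

# Hypothetical weights per predicted variant consequence (Sequence Ontology terms).
# Illustrative only: not calibrated and not the project's actual heuristics.
CONSEQUENCE_WEIGHT = {
    "stop_gained": 1.0,
    "frameshift_variant": 1.0,
    "missense_variant": 0.5,
    "synonymous_variant": 0.0,
}
RARE_THRESHOLD = 0.01  # hypothetical: population frequency >= 1% counts as common


@dataclass
class Variant:
    gene: str
    consequence: str         # e.g. "missense_variant"
    allele_frequency: float  # frequency in a reference population, 0.0 to 1.0


def pathogenicity_score(v: Variant, disease_genes: set) -> float:
    """Toy score in [0, 1]; higher means the variant deserves more attention."""
    # Rarer variants score higher; anything at or above the threshold gets 0.
    rarity = 1.0 - min(v.allele_frequency / RARE_THRESHOLD, 1.0)
    # Consequence types not in the table get a small default weight.
    severity = CONSEQUENCE_WEIGHT.get(v.consequence, 0.1)
    # Bonus if the gene is on a (hypothetical) list of known disease genes.
    known_gene = 1.0 if v.gene in disease_genes else 0.0
    return round(0.4 * rarity + 0.4 * severity + 0.2 * known_gene, 3)


def rank_variants(variants, disease_genes):
    """Sort variants so the highest-scoring (most suspicious) come first."""
    return sorted(variants, key=lambda v: pathogenicity_score(v, disease_genes), reverse=True)


if __name__ == "__main__":
    # Fictional variants with placeholder gene names.
    patient = [
        Variant("GENE_A", "synonymous_variant", 0.30),
        Variant("GENE_B", "missense_variant", 0.002),
        Variant("GENE_C", "stop_gained", 0.0001),
    ]
    for v in rank_variants(patient, {"GENE_C"}):
        print(v.gene, pathogenicity_score(v, {"GENE_C"}))
```

A common, synonymous variant lands at the bottom of the list, while a rare loss-of-function variant in a known disease gene rises to the top; the final diagnostic call, as Fernando notes, still rests with the doctor.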
But from there, we try to identify relevant entities: proteins, diseases, phenotypes, things like that. Then we try to infer valid relationships: this phenotype or this disease can be caused by this protein, or by the expression of that gene, which is another entity. So this builds up a kind of ontology. We call it the mini ontology because it's specific to this domain. We end up with a kind of mini semantic network, with millions of nodes and edges, which is quite easy to interrogate. The point is that there you have more than just text. You have something that is already enriched: a series of nodes and arrows that you can query in terms of reasoning. What leads to what?

So the analytical tools you're using: Hortonworks doesn't make those tools. Are they coming from another partner in South America, or from another partner of Hortonworks, like IBM? Where do they come from?

That's a nice question. We have an architecture, and the core of the architecture is Hortonworks, because we have scalability concerns. We have HDFS, Hive on Tez, and Spark, a number of components that need to scale out easily, because when we talk about genomes, it's easy to reach one terabyte of working data per patient. So that's one thing, regarding storage and compute. On the other hand, we use a graph database; we use Neo4j for that.

Okay, Neo4j for graph. Neo4j, and you have Hortonworks.

Yes. And for the natural language processing, we use KNIME, which is actually based here in Berlin. So we do part of the machine learning with KNIME. Then we have Neo4j for the graph, for building this semantic network. And for the overall processing, we have Hortonworks, for running the analysis and heuristics and scoring the variants. We also use Solr for enterprise search on top of the documents, or on top of the conclusions from the documents that come out of the ontology.
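To make the "what leads to what?" question concrete, here is a minimal sketch of querying a small semantic network built from (subject, relation, object) triples. In the architecture described, this network lives in Neo4j and would be queried there; this sketch swaps in a plain Python adjacency list and a breadth-first search instead. The entity names, relation names, and triples are hypothetical stand-ins for what a text-mining step might extract from abstracts, not data from the project.

```python
from collections import defaultdict, deque

# Hypothetical triples, standing in for relationships a text-mining step
# might extract from PubMed abstracts. Names are placeholders.
TRIPLES = [
    ("GENE_A", "expresses", "PROTEIN_A"),
    ("PROTEIN_A", "causes", "PHENOTYPE_X"),
    ("PHENOTYPE_X", "associated_with", "DISEASE_Y"),
    ("GENE_B", "expresses", "PROTEIN_B"),
]


def build_graph(triples):
    """Directed adjacency list: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def explain(graph, start, goal):
    """Breadth-first search for the shortest chain of edges from start to goal.

    Returns a list of (subject, relation, object) hops, [] if start == goal,
    or None if goal is unreachable from start.
    """
    parent = {start: None}  # node -> (previous node, relation used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to start, then reverse the hops.
            path = []
            while parent[node] is not None:
                prev, rel = parent[node]
                path.append((prev, rel, node))
                node = prev
            return list(reversed(path))
        for rel, nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, rel)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    g = build_graph(TRIPLES)
    for hop in explain(g, "GENE_A", "DISEASE_Y"):
        print(*hop)
```

The answer is a chain of typed edges (gene expresses protein, protein causes phenotype, phenotype is associated with disease) rather than a list of matching documents, which is the "enriched" view the mini ontology is meant to give over plain text search.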
Wow, that's a very complex and intricate deployment. So great. In terms of takeaways from this event, we only have a little more time. Of all the discussions, the breakouts, and the keynotes, what did you find most interesting so far at this show? Data stewardship was a theme from Scott Gnau, with that new solution. You're describing an operational application: have you built something that can be deployed, and is being deployed, by your customers on an ongoing basis? It wasn't a one-time project, right? This is an ongoing application they can use internally. Is there a need in Uruguay, or among your customers, for privacy protections on this data? Will you be using solutions like Data Steward Studio to provide a degree of privacy protection equivalent to what, say, GDPR requires in Europe? Is that something you're considering?

Yes. Actually, we are running other projects in Uruguay. Together with other companies, we are helping the national telecommunications company, so there are security and privacy topics there. We are also starting, these days, a new project, again with ICT4V and other partner companies, in which we are in charge of the big data part of an education program based on the One Laptop per Child initiative from the time of Nicholas Negroponte.

Oh yeah, from MIT, yes.

Yes, from MIT, right. That initiative is already 10 years old in Uruguay, and now it has also been extended to retired people. So it's a move toward a digital society.

Excellent. I have to wrap it up for now. That's great; you have a lot of follow-on work. So clearly, a lot of very advanced research is being done all over the world. I had a previous guest from South Africa, and you're from Uruguay, so really, south of the equator there's far more activity in big data than we here in the Northern Hemisphere, in Europe and North America, realize.
So I'm very impressed, and I look forward to hearing more from Quanam and through your partner, or your provider, Hortonworks.

Well, thank you very much.

Thank you, and thanks for the chance.

It's been great to have you here on theCUBE.

Thank you.

We're here at DataWorks Summit in Berlin, and we'll be talking to another guest fairly soon.