from Berlin, Germany. It's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks.

Well, hello, welcome to theCUBE. I'm James Kobielus, the lead analyst for big data analytics on the Wikibon team of SiliconANGLE Media. We are here at DataWorks Summit 2018 in Berlin, hosted, of course, by Hortonworks. We are in day one of two days of interviews with executives, developers, and customers. In this morning's opening keynote, one of the speakers was a customer of Hortonworks from Munich Re, the reinsurance company based, of course, in Munich, Germany: Andreas Kohlmaier, head of data engineering. It was an excellent discussion. You've built out a data lake, and the first thing I'd like to ask you, Andreas, is this: it's now five weeks until GDPR, the General Data Protection Regulation, goes into full force on May 25th. It applies to the EU, and anybody who does business in the EU, including companies based elsewhere, such as in the US, needs to comply with GDPR in terms of protecting personal data. Give us a sense of how Munich Re is approaching the deadline, your level of readiness to comply with GDPR, and how your investment in your data lake serves as a foundation for that compliance.

Thanks for the question. GDPR, of course, is the hot topic across all European organizations, and we are actually pretty well prepared. We have compiled all the processes and the necessary regulations, and in fact we are now selling this as a service product to our customers. That has been an interesting side effect: we work with lots of other insurance companies, so we started to think, why not offer this as a service to help them prepare for GDPR as well? That is actually proving to be one of the exciting, interesting things to come out of GDPR.

Maybe that will be your new line of business. You'll make more money doing that than...
I'm not sure, but it's been an interesting discussion.

Well, that's excellent. So you've learned a lot of lessons, and you're ready for May 25th. Okay, that's great. You're probably far ahead of a lot of US-based firms. In our country and in other countries, we're still getting our heads around all the steps that are needed, so many companies outside the EU may call on you for some consulting support. That's great. So give us a sense of your data lake. You discussed it this morning, but can you give us a sense of the business justification for building it out, how you've rolled it out, what stage it's in, and who's using it for what?

Absolutely. One of the key things for us at Munich Re is the issue of complexity, or data diversity, as it was also called this morning. We do business in so many different areas, and we have lots of experts in those different areas who are very knowledgeable in their fields. Now those people are also getting access to new sources of information. To give you a sense: we have people who are really familiar with weather and climate change, or with satellites; we have captains for ships and pilots for aircraft. We have expertise in all these different areas. Why? Because we are taking those risks on our books.

Those are big risks, too. You're a reinsurance company, so yeah.

And these are actually complex risks where we really have people who are experts in their field. We sometimes have people with 20-plus years of experience in an area who then move to the reinsurer to bring their expertise to the risk management side. All those people now get an additional source of input, which is the data that is now more or less readily available everywhere.
First of all, we are getting new data with the submissions and the risks that we are taking on. There are also interesting open data sources to connect to, so those experts can bring their knowledge and their analytics to a new level by adding a layer of data and analytics to their existing expertise. This allows us, first of all, to understand the risks even better and put a better price tag on them, and also to take on new risks that have not been possible to cover before. One thing that has also been in the media is that we will be covering the hyperloop once it is built; those kinds of new things are only possible with data and analytics.

So you're a Hortonworks customer. Give us a sense of how you're using or deploying Hortonworks Data Platform, or DataPlane Service, or whatnot inside your data lake. It sounds like it's a big data catalog. Is that a correct characterization?

One of the things that is key for us is actually finding the right information and connecting those different experts to each other. This is why the data catalog plays a central role. We have selected Alation as the catalog tool, also to connect the different experts across the group. The data lake at the moment is an on-prem installation, and we are thinking about moving parts of that workload to the cloud to save operations costs.

On top of Hortonworks, on top of HDP. Yeah, and Alation, as far as I know, is technically a separate server that indexes the Hive tables on HDP. So essentially the catalog provides visualization and correlation across the disparate data sources that Hadoop is managing.

The catalog is actually a great way of connecting the experts together.
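As an editor's aside: the catalog role described here, mapping data sources to the experts who own them, can be sketched in a few lines. This is purely an illustration of the idea, not Alation's actual API or Munich Re's implementation; all names and the email addresses are hypothetical.

```python
# Illustrative sketch only -- models a data catalog that maps data sources
# (e.g. Hive tables on HDP) to descriptions and to the expert who owns them.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str           # e.g. a Hive table name
    description: str
    expert: str         # who to call up or ask a question in the tool
    tags: list = field(default_factory=list)

class DataCatalog:
    def __init__(self):
        self._entries = []

    def register(self, entry: CatalogEntry):
        self._entries.append(entry)

    def search(self, keyword: str):
        """Find entries whose name, description, or tags mention the keyword."""
        kw = keyword.lower()
        return [e for e in self._entries
                if kw in e.name.lower()
                or kw in e.description.lower()
                or any(kw in t.lower() for t in e.tags)]

catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="weather.daily_observations",
    description="Historical weather observations, global grid",
    expert="weather-team@example.com",
    tags=["weather", "climate"]))
catalog.register(CatalogEntry(
    name="crop.india_yields",
    description="Crop yield data for India, by district and season",
    expert="crop-insurance@example.com",
    tags=["crop", "India"]))

# A crop-insurance analyst searching for weather data finds both the
# data source and the expert to contact.
for hit in catalog.search("weather"):
    print(hit.name, "->", hit.expert)
```

The point of the sketch is the pairing of dataset and expert in one searchable place, which is what lets the weather team and the crop-insurance team find each other.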
So, for example, if we have people in one part of the group who are very knowledgeable about weather and have great weather data, we'd like to connect them to the people doing crop insurance for India, so they can use the weather data to improve their models for crop insurance in Asia. The data catalog helps us connect those experts, because you can, first of all, find the data sources, and you can also see who the expert on the data is. You can then call them up or ask them a question in the tool. It's essentially a great way to share knowledge and connect the different experts across the group.

Okay, so it's also surfacing human expertise. Is it also serving as a way to find training data sets, possibly to use to build machine learning models for more complex analyses? Is that something you're doing now or plan to do in the future?

Yes, we are doing some machine learning and deep learning projects. We have also started a Center of Excellence for Artificial Intelligence to see how we can use deep learning and machine learning to find different ways of pricing insurance risks, for example. For all those cases, data is key, and we really need people to get access to the right data.

I have to ask you: you mentioned the Center of Excellence for AI, and I'm seeing more companies consider, maybe not do it, but consider establishing an office of a chief AI officer reporting to the CEO. I'm not sure that's a great idea for a lot of businesses, but since an insurance company lives and dies by data and calculations, is that something Munich Re is doing or considering: a C-suite-level officer responsible for this AI competency, or no?
Could be in the future. We have just started with the AI Center of Excellence, which now reports to our chief data officer, so it's not yet at the C-suite level.

Is the Center of Excellence for AI simply a training institute providing basic skill building, or is there something more there? Do you do development?

They are actually trying out and developing ways we can use AI and deep learning for insurance. One of the core things, of course, is understanding natural language, to structure the information that we are getting in PDFs and documents. But it is really also about using deep learning as a new way to build tariffs for the insurance industry; that's one of the core goals, to find and create new tariffs. We're experimenting there, we haven't found the product yet, to see whether we can use deep learning to create better tariffs. That could then also be one of the services we provide to our customers, the insurance companies, and they would build it into their products: something like, yeah, "the algorithm is powered by Munich Re."

Now, the users of your data lake are expert quantitative analysts, right, for the most part? You mentioned natural language understanding AI capabilities. Is that something you need to do at high volume as a reinsurance company: take lots of source documents and, as it were, identify the content at high volume and import it? Not OCR, but actually build a graph, a semantic graph, of what's going on inside the document?
Let me give you an example of the things we are doing with natural language processing. This one is about the energy business in the US. We are actually taking on, or seeing, most of the risks related to oil and gas in the US, so all the refineries, the larger stations, and the petroleum tanks are in our books. For each and every one of them, we get a nice risk report of a couple of hundred pages, and inside these reports there is also a paragraph describing where the refinery or plant gets its supplies from and where it ships its products to. Since we are seeing all those documents, on the scale of a couple of thousand, so not really huge, but altogether a couple of hundred thousand pages, we use NLP and AI on those documents to extract the supply chain information. In that way we can stitch together a more or less complete picture of the supply chain for oil and gas in the US, which again helps us better understand the risk, because supply chain breakdown is one of the major risks in the world nowadays.

Andreas, this has been great. We could keep going; I'm totally fascinated by your use of AI, but also your use of a data lake, and I'm impressed by your company's ability to get your, as we say in the US, GDPR ducks in a row. It's been great to have you on theCUBE. We are here at DataWorks Summit in Berlin.
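As a closing editor's aside: the supply-chain extraction Andreas describes can be sketched in miniature. Munich Re's actual pipeline applies NLP/AI to hundreds of thousands of PDF pages; here a simple regular expression stands in for the entity extraction step, and all facility names, report text, and patterns are hypothetical.

```python
# Illustrative sketch only -- extract "gets supplies from" / "ships products to"
# relations from per-facility risk reports, then stitch the edges into one
# picture of the supply chain.
import re
from collections import defaultdict

SUPPLIER_RE = re.compile(r"receives (?:crude|supplies) from ([A-Z][\w ]+?)(?:[,.]|$)")
CUSTOMER_RE = re.compile(r"ships (?:its )?products to ([A-Z][\w ]+?)(?:[,.]|$)")

def extract_links(facility, report_text):
    """Return (upstream, facility) and (facility, downstream) supply-chain edges."""
    edges = []
    for m in SUPPLIER_RE.finditer(report_text):
        edges.append((m.group(1).strip(), facility))   # supplier -> this facility
    for m in CUSTOMER_RE.finditer(report_text):
        edges.append((facility, m.group(1).strip()))   # this facility -> customer
    return edges

# One hypothetical report; the real corpus is thousands of multi-hundred-page PDFs.
reports = {
    "Gulf Refinery": "The plant receives crude from Permian Terminal. "
                     "It ships products to Midwest Distribution.",
}

# Stitch the per-facility edges into a single directed graph.
supply_chain = defaultdict(set)
for facility, text in reports.items():
    for src, dst in extract_links(facility, text):
        supply_chain[src].add(dst)

print(dict(supply_chain))
```

Because every facility's report contributes its local upstream and downstream links, merging the edges yields the end-to-end chain, which is exactly what makes a supply-chain-breakdown risk visible across facilities.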