Live from New York, extracting the signal from the noise. It's theCUBE, covering RapidMiner Wisdom 2016, brought to you by RapidMiner. Now, your hosts, Dave Vellante and Jeff Frick.
Welcome back to New York City, everybody. We're here at the RapidMiner Wisdom Conference. This is theCUBE. We go out to the events. We extract the signal from the noise. Elian Carsenat is here. He's the founder of NamSor. Elian, welcome to theCUBE. Welcome to America.
Well, thank you, Dave. Hi, Jeff.
Welcome.
Pleasure to be here. Pleasure seeing you.
So, tell us about NamSor. What does the company do? What are you guys all about? Linguistics, all kinds of interesting things you're doing. Tell us.
Yeah, simple answer: NamSor sorts names. We look at personal names, and we try to infer information such as gender or likely origin. In a city like New York, which is so diverse, you can use this information, for example, to visualize the patterns of segregation in the city, or to analyze the gender gap, for example, in data science. How many men and women are here today at RapidMiner? All this information you can typically infer from the names.
So, you take a corpus of data into your system, and then it presents you with all kinds of information outputs about that data. Is that right?
Exactly, Dave. In its simplest form, you can import data that has a name but doesn't have, for example, any information about gender or origin, and you can add those variables. So, the way we've integrated with RapidMiner, for example, was to add an extension that adds operators like parsing names and extracting gender and origin. It takes a data set as input and adds columns with those new variables, which you can then correlate with other elements, feed into a decision tree, or, of course, combine with other variables like geography, job type, lots of different things.
So, let's get to it.
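Editor's illustration (not part of the interview): the enrichment step described above, where a data set with a name column gains gender and origin columns, might be sketched roughly as follows. The lookup tables and the helper names `parse_name` and `enrich` are hypothetical stand-ins chosen for this sketch; the source does not show NamSor's actual models or the RapidMiner operators.

```python
# Toy lookup tables: illustrative assumptions only, not NamSor's data or model.
TOY_GENDER_BY_FIRST_NAME = {"elena": "female", "dave": "male", "jeff": "male"}
TOY_ORIGIN_BY_LAST_NAME = {"rossini": "Italy", "murphy": "Ireland"}


def parse_name(full_name):
    """Naively split 'First Last' into (first, last); real name parsing is harder."""
    parts = full_name.strip().split()
    if not parts:
        return "", ""
    first = parts[0]
    last = parts[-1] if len(parts) > 1 else ""
    return first, last


def enrich(rows, name_column="name"):
    """Return copies of the rows with 'gender' and 'origin' columns added."""
    enriched = []
    for row in rows:
        first, last = parse_name(row[name_column])
        new_row = dict(row)  # leave the input rows untouched
        new_row["gender"] = TOY_GENDER_BY_FIRST_NAME.get(first.lower(), "unknown")
        new_row["origin"] = TOY_ORIGIN_BY_LAST_NAME.get(last.lower(), "unknown")
        enriched.append(new_row)
    return enriched


rows = [
    {"name": "Elena Rossini", "industry": "film"},
    {"name": "Sam Nguyen", "industry": "aviation"},
]
for row in enrich(rows):
    print(row)
```

Names missing from the toy tables come back as "unknown", so the added columns can be correlated with existing ones (here, `industry`) without silently guessing.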
How are people using your product? Take us through an example or two.
Well, the simplest example is that we've partnered with Elena Rossini to launch a website called GenderGapGrader, where we analyze the gender gap in a wide range of industries, for example the film industry, airline pilots, or VCs and business angels, and we produce research on what the gender gap is now compared to, for example, 10 years ago. This is a very practical example of how data mining can also help resolve some social issues. But if you take marketing or sentiment analysis, the same technology can be used, for example, to analyze sentiment on Twitter or LinkedIn, social networks that have a lot of content but do not contain any information about gender. So you have to infer it some way, for example from the name. And many large US companies in the digital sector don't actually have gender information on their clients, because to optimize the acquisition of new customers and new leads, they minimize the amount of information that users have to input when they register. So usually users would just register with a first name and a last name, perhaps an email address, and initially that's all the company has. As for the topic of origin, or additional sociodemographics about the origin of names, we've done a few projects in the field of economic development, for example, helping countries reconnect with their diaspora to attract foreign direct investment. Actually, the best example of a country doing that is Ireland.
They've attracted billions of dollars through their diaspora, using connections and family links, as early as the 80s, to attract big companies to Ireland. Even today, of course, there are also tax incentives for US companies in the digital sector to establish their headquarters in Ireland, but IDA Ireland is also very proactive at connecting with people of Irish heritage who hold a strong position in a US company. Some other countries have started to do that. We've actually worked, for example, with Invest Lithuania in the field of foreign direct investment, and also with my own country, France. And it's important for countries to reconnect with their diaspora not only for investment, but more and more for talent, innovation, and education. Of course, for developing countries, this is even more important. If we think of some African countries, they sometimes even have health problems because they don't have enough doctors; the doctors went away. So the more the world is open, the more people move around the planet, the more it creates opportunities to reconnect these communities, advanced scientists and innovators, with their home country so they can give back, in a way, through talent, through investment, and so on. So this is another example.
So how does the product work? Take us through how it goes from name sorting to what you just described as attracting people to Ireland. It feels like a big leap.
Yeah, there is a leap. But what happens is that usually those countries have lost track of their diaspora. So big organizations like the IOM, the International Organization for Migration, or even organizations like USAID or the French AFD, would typically launch a diaspora mapping exercise to try to find who those people are, where they are, and how they can help their home country, for example, overcome a big crisis.
So what we can do, for example, is connect, let's say through RapidMiner, to a very large database of senior executives in Western Europe or the US and filter for names that are very likely to originate from those countries. That way we accelerate the diaspora mapping exercise. Then, of course, the investment promotion agency can reconnect with those people in a very proactive way, for example by inviting them to an event to promote investment in the home country. That is how home countries can benefit from their diaspora through investment or skills. Another example: more recently we participated in a conference in Canada called the Canada Science Policy Conference. Canada, conversely, is really interested in attracting more talent to Canada in science, but also in creating more international collaboration with universities in emerging countries. In that case it's really win-win, because people originally from those countries can of course contribute. They don't necessarily have to go back to their home country to contribute. They can hold a position in the university. They can participate in conferences. They can tutor students remotely. So there's a wide range of collaboration that can happen.
Talk about the product. So you have a platform, obviously. You've got APIs, I'm sure.
Exactly. The product really is an API that takes a name as input and gives gender or origin as output. All of that is integrated nicely within RapidMiner, so that you have these operators that can be part of a more integrated business process, from data acquisition to data enrichment, maybe also predictive analytics, and so on. We've had a few projects in the US. Actually, one of our clients is here. Today we also presented a pilot project with the City of Boston to analyze the geodemographics of the city and to prepare a very interesting map of the different regions of the city.
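Editor's illustration (not part of the interview): the diaspora mapping filter described earlier in this answer, a name-in, origin-out call applied to a list of executives and thresholded on how likely the origin is, could look something like the sketch below. The surname table, the probabilities, the `OriginGuess` type, and the functions `infer_origin` and `shortlist` are all hypothetical; this is not NamSor's API, whose endpoints and response format the source does not describe.

```python
from dataclasses import dataclass

# Hypothetical toy table: surname -> (likely country of origin, probability).
# The values are made up for illustration and carry no statistical meaning.
TOY_SURNAME_ORIGINS = {
    "murphy": ("Ireland", 0.92),
    "kazlauskas": ("Lithuania", 0.95),
    "martin": ("France", 0.40),
}


@dataclass
class OriginGuess:
    country: str
    probability: float


def infer_origin(full_name: str) -> OriginGuess:
    """Name in, likely origin out: a stand-in for the API call described above."""
    parts = full_name.strip().split()
    surname = parts[-1].lower() if parts else ""
    country, probability = TOY_SURNAME_ORIGINS.get(surname, ("unknown", 0.0))
    return OriginGuess(country, probability)


def shortlist(executives, country, min_probability=0.8):
    """Keep only people whose name is very likely to originate from `country`."""
    selected = []
    for person in executives:
        guess = infer_origin(person["name"])
        if guess.country == country and guess.probability >= min_probability:
            selected.append(person)
    return selected


executives = [
    {"name": "Sean Murphy", "title": "CFO"},
    {"name": "Ona Kazlauskas", "title": "CTO"},
    {"name": "Claire Martin", "title": "CEO"},
]
print([p["name"] for p in shortlist(executives, "Ireland")])  # ['Sean Murphy']
```

The threshold reflects the "very likely" wording in the interview: ambiguous surnames (here, the toy "martin" entry at 0.40) are left out rather than guessed, which matters given how sensitive the topic of origin is.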
And for today's presentation, we've done that for New York as well. We've looked at a database called ACRIS, which is a real estate database, to see who owns New York in the different districts, and we've looked at that from the data mining perspective. We are of course extremely excited to work with organizations that already use RapidMiner, typically for customer segmentation and predictive analytics, especially in sectors like banking, typically remittances, where money flows cross-border between the very mature countries and the emerging countries; also the travel industry; and also geodemographics, which is about understanding territories and segregation patterns.
So I was going to say, I'm curious. Did you have a solution and then have to go find problems, of which you found a lot that you can apply it to? Or did you come at it from the problem set, that they needed a better way to get this demographic information, and then build the ontology to figure that out? Because it sounds like the variety of applications is huge for what is a relatively specialized application.
Yes, exactly. It's a bit of both. We started to build this product three years ago. We didn't have a clear business plan or understanding of what problem this technology would solve. But we were sure we would bring a new angle of analysis that would be interesting. And then it was a bit by chance that we started to understand how it could be useful. For example, for the gender gap, it was really meeting Elena at an OECD conference that triggered this application of measuring the gender gap and promoting gender equality in organizations. We did an initial project, which at the time was to look at the gender gap in the film industry, during the Cannes Festival. The little infographics we did went kind of viral, and that triggered interest. As for the topic of origin, this is a topic that is very sensitive in France.
So we were always extremely careful about what type of projects we could do. Initially we decided to work with governments and universities, and gradually they came with ideas of how that could be useful. As for the City of Boston, for example, they found us through some of the blog posts we had written about the Irish diaspora and so on. And we are very excited about doing more interesting projects with them, because, of course, those who really know what the use cases are are, in the end, the cities, the companies, the organizations that have a problem to solve.
And I imagine if you had the time element too, where the data is probably even easier to analyze: take Boston, not only at this moment today, but roll it back every 10 years for the last 200. A whole other way to see changes, migrations. But I'm curious to know, how far do you think you can go? How many columns of data do you think you can extract value for beyond gender and origin?
Well, we've started by really focusing on personal names, and of course there is a limit to how much information we can extract from that. But the first thing is that it varies a lot by country. Just to give an example, India is a country where you have maybe six or seven scripts to write the names, because there are different states and different languages, and out of 1.5 billion people you can already allocate the names to about 30 regions and different social groups. So there is a wealth of information there that is totally different from what you could extract from, for example, names in Mexico. Same thing with regions like Africa, where country borders are absolutely arbitrary in terms of geography, so the human population, the culture, the language go beyond the borders. There is a lot of work that we can still do with anthropologists and so on to understand all that. And we also plan to expand what we've done on personal names to other types of proper names, like company names, branding, and so on.
We're out of time, but last question: the company. Self-funded, venture-backed, bootstrapped? Give us the update, and where are you headed?
So we're self-funded, we're profitable, and we work a lot through networks: a network for sales and a network of collaboration in academia to improve the software for different regions. We're a small private company, and we are quite excited to have this opportunity to be part of a larger platform like RapidMiner to expand worldwide in a wide range of markets.
Yeah, well, it's a global community with some really smart data scientists, and that's great. Congratulations on your progress, and thanks for coming in and sharing your story with theCUBE.
Thank you, Dave. Thank you, Jeff.
Great to meet you. All right, keep right there, everybody. This is theCUBE. We'll be right back. RapidMiner Wisdom 16, New York City, right back.