Live from Boston, Massachusetts. Extracting the signal from the noise. It's theCUBE, covering HP Big Data Conference 2015. Brought to you by HP Software. Now, your host, Dave Vellante. Welcome back to the C4 in Boston, Massachusetts, everybody. This is theCUBE, I'm Dave Vellante. And this is HP's Big Data Conference 2015. This is the third year that HP has run this event, which spawned out of the initial Vertica users group. Really excited to have folks from Conservation International back. Jorge Ahumada is the executive director of the organization. And Eric Fegraus is the senior director of information systems. Gentlemen, welcome back to theCUBE. It's good to see you again. Good to see you, David. So Conservation International, we heard the story last year, but for those in our audience who aren't familiar, Jorge, maybe set it up and tell us a little bit about the organization. Yeah, so Conservation International is a nonprofit organization. We work towards trying to find sustainable solutions that are good both for people and for nature. And I work in a program called the TEAM Network, and we are all about gathering data about the environment, and particularly gathering data about biodiversity, wildlife and other important features in tropical forests. So a couple of years ago we developed a really good partnership with Hewlett-Packard, where we had a big data challenge and they really helped us develop some analytics and other tools to help us analyze the data and bring it back to our partners and customers in the field so that they could make use of this data, basically analyzing millions of camera trap images and getting useful information from those images in terms of indices that tell you how effective you are being in conserving biodiversity. Yeah, and that's one of the unique things about what you guys do.
I mean, normally you think of big data, you think, well, we've got a data warehouse and it has structured data and you can combine it with unstructured data and social data. I'm going to bring in other data from other APIs and weather data, and you may do a lot of that as well, but the really unique thing is you're actually taking pictures, right? You've got film of the environment and you're trying to understand what's happening in the physical location, right? That's right, and it's not only that. I mean, as you mentioned, big data is all about unstructured and different kinds of data, but as Michael Stonebraker was mentioning yesterday, it's also about analytics, and we are at the forefront of big data. We use the images as educational materials, but they're data for us, and we are applying fairly complex Bayesian models, the kinds of things he was talking about yesterday, where the industry still has to leverage the products that we have now to deliver better analytics. So we're right at the edge of that and really doing this in the environment. And Eric, we talked last year about some of the technical challenges that that brings, and maybe we can sort of summarize that and get into what's new, but I mean, essentially you've got a big metadata challenge. I've got all these images, I get this video. What is it that I'm looking at? So maybe you could describe how you approach this problem. Yeah, it's definitely a big data challenge in the fact that we're getting images from all over the world. We have to physically get them to a central server that we have. In some places where we work, the internet connectivity is very, very low, if it exists at all. So sometimes they have to go to a big city to be able to upload the data to us.
But then once we get the data, it becomes part of a normal kind of data transformation process where we extract information from the image, we identify the species, and then it gets fed into this larger analytical pipeline that Jorge was just talking about. Using Bayesian statistics, we use R, and then we present that in very clear, meaningful ways so that our protected area managers understand what's happening with species populations in the protected areas that they're in charge of managing. So it's really insightful information that they can use, they can understand, and they can also look to see what's potentially driving the trends of the species that they're seeing. So it's a really fantastic tool. And at the start of the project you must have faced, and must continue to face, sort of a data quality issue where, again, you have to have that base metadata. How does that actually get populated? Do you have a means of automating that, or is it sort of the Mechanical Turk? Well, that's interesting. So right now what we've done in our software is we've developed ways to rapidly identify a lot of the images and the species in the images. We're going down the path of looking at image automation and image recognition. And it's probably going to end up being kind of a hybrid approach of using humans as well as machine learning techniques to rapidly go through lots of images at a time to get the species information. Yeah, so we all know if you're on Facebook you see yourself get tagged instantly and you think, wow, facial recognition certainly has come a long way, and you think it's there, but it's not that simple, is it? Actually, the human face is easy. There are a lot of key areas you can look at, but when you look at animals and their body shapes, it gets a lot more challenging.
Particularly, you have black and white, you have color images, you have lots of similarities in the species structures, so it's actually a little bit more difficult a problem than human faces. So Jorge, what's changed since we last talked? What's new with you guys? Well, it's been a really exciting year for us. Now that we have the analytics system in place, we're exploring other uses of the system. For example, at this conference we presented a case study on Distributed R. We've been using Distributed R to speed up the processing of image data, not animal image data but satellite image data that we use to try to quantify deforestation at these places. And so we had basically a tenfold increase in the processing speed of satellite images when you compare traditional R with Distributed R, which is running on Vertica, and that's a fabulous increase in speed and in performance. I mean, that's one of the breakthroughs. We've continued to work with our analytics system, the wildlife picture analytics system, to start engaging governments now and trying to bring this up to scale. So it's not just our network of 20 sites or so; we're trying to get into national-level networks so that countries can actually use these tools to inform their policy at that level. So of course R is, let's call it, the modern tool of statistical analysis. Old guys like myself were trained in SPSS; many of you, I'm sure, are familiar with SAS. So R is sort of the open source version of those, but Distributed R is relatively new. How did it come about, what is Distributed R? Distributed R is a package that was written by folks at HP Vertica, and they took many of the functions that run natively in R and made them run in parallel, to take full advantage of parallel processing and memory management in Vertica. So with that we can run our normal functions in R, but we can take advantage of the power of Vertica, essentially.
And not just in managing the data but also in the analytics, which was kind of a limiting factor for us. So in this world of scale-out, it's sort of a better fit for the momentum of the market. And HP put that code back into the open source community. That's right. So Distributed R is a completely open source package. And now Eric, you gave a talk at this event on Big Data for Good. I was commenting that last year, when we were up on stage, myself and some colleagues, one of the questions that I got from the audience, and there was a little tension, was: yeah, we talk about big data and people putting ads in front of our face and trying to get us to swipe our credit card. When are we going to see big data applied for good? That was the essence of your talk. Give us some color. It is. And that's exactly what we've been working on with this solution: using these kinds of big data tools and software to be able to inform conservation decisions. And the nice thing, as Jorge was saying earlier, is we've built a system that satisfies the needs of our network. And not only have we done that, we've made a system that's replicable and scalable. So as we think about new ways to use the tools with new organizations, governments and nonprofits, we can scale the system and it can work with essentially more customers, new organizations, new partners, et cetera. So it's really like this nonprofit, entrepreneurial software product all being merged together, and it's an exciting, exciting thing for us. And so how did this all come about? Conservation International, you know, where's the funding come from? How do you guys sustain, you know, your mission? Well, Conservation International is a little over 25 years old. It's an organization that was born out of another nonprofit organization that was doing some basically domestic conservation work in the United States, and it was spun out of the international program of that organization.
And we work with governments, with, you know, big agents of change, you know, foundations and banks and multilaterals, to try to effect change at a large scale. Funding comes mostly from many of these organizations as well as, you know, private foundations and donors. I mean, we don't have a huge donor base, but we have a very influential board and, you know, people that are really, really committed to changing the environment for the good and doing it in a sustainable way that is good both for people and for nature. I was struck by Ken Rudin's comments today; we saw the keynotes, and he was emphasizing that it's not just about actionable insights, it's actually about taking action and effecting change. So I'm curious, I'm presuming you agree with that. We have, for instance, a lot of data on climate change. We haven't really acted as much as many would like. So talk about the changes that you've been able to effect, which is ultimately the mission of big data analytics. Yeah, I mean, this is a great point and, you know, we can have all the analytics we want, but if we don't take action, it's not going to do anything, and we're starting to see that in the TEAM Network now. Now that we have this tool, people are starting to look at the data. For example, in Uganda, we've seen that, you know, the wildlife managers in our partner site there have detected that a couple of cat species are declining, and this actually spawned a series of research around that: why is this happening? Is this happening because of food shortages in the reserve? Is this happening because of people? And there were actually a lot of really good insights coming out of that, and one of the actions that was taken was to try to zone out some of the tourist access to the park, because this seems to have an effect on where the cats are, and, you know, they would never have done this if they'd never seen the analytics.
They, you know, they would still have been in the dark and they wouldn't even know the cat was declining. So this has really actually taken park managers into a realm they've never been in before, because now they have the tools, they have the data, they have the analytics to be able to take action. So presumably you're able to directly affect public policy. I mean, that's kind of what your goal is, right? Because without the data, you know, presumably those that are against protecting the environment can make a, you know, stronger case, or at least a more muddled case. But have you seen a direct relationship between the data and the ability of those who are making decisions? Because you're not dictating public policy, you're just providing the data. Have you seen a direct correlation between access to that data and the ability to affect public policy? Well, the interesting thing about public policy is that many of these countries where we work already have commitments to conserve biodiversity. So it is a policy that they already have in place, but they don't have the tools or the infrastructure, as Eric was saying, to basically test whether their policies are working or not. So we don't need to convince them that this is important. They already know it's important, and now we're just enabling them to monitor the biodiversity and be able to, you know, meet the commitments that they have pledged to, you know, in front of international governments. A key kind of gap that wasn't filled is they didn't have the tools to be able to see what was actually happening in the protected areas and come up with quantifiable indicators. And so now with the software systems and our methodologies, we have a package that will enable these government agencies to do that.
Well, presumably that creates some kind of flywheel effect, or at least, you know, a circular effect, where you're providing better data that's able to demonstrate the positive impact of public policy, which hopefully will get you more funding to get more data. So, from a technology standpoint, what are the sort of limitations that you face, Eric, the, you know, barriers that you'd like to break through? Is it funding? Is it other technological barriers? Is it people and process? That's a good question. You know, technology was a huge hurdle, but now, you know, with our partnership with HP, we've kind of solved that. And we now have scalable software that, you know, can run in the cloud; we can spin up as many instances as we need depending on our customers' needs. Now it's really, you know, just a question of really rolling it out and implementing it. And so that's the phase we're in. And that, you know, comes with some funding challenges, but yeah, I'd say, you know, we're poised to really go for it. So you mentioned the cloud. I mean, you've got a lot of distributed locations, so you're in the cloud a lot. But what is your cloud strategy? Well, it's going to be interesting. We were talking about that yesterday. You know, a lot of the countries where we work may have different requirements about where they can store their data. And if we can work with some of the big cloud providers, like HP and some of the other big ones here, and the solution will work there, that's great. But we may have to get creative if they're being really restrictive and they want to use a cloud provider that's only available in that country. So there may be some other work that we have to do there. We do have on-premise solutions as well that will work. So depending on where we are, we may have to tweak our approach a little bit. And how does that work?
I mean, are you a provider of those solutions, in a manner where you just go to technology partners like HP? So we're providing the specific solution that the countries or the organizations need to manage their camera trap data and the environmental data. We'll partner with organizations like HP, and potentially other IT contractors, to actually implement it and make sure that we have the hardware and the backend working as needed. So the local countries are essentially, it's a self-funding model now. Is that right, or is it not necessarily? It depends, it may be a hybrid approach. Again, Jorge was talking about a lot of the donors that we work with, and so it'll be on a case-by-case basis depending on the country. So other stories, other things that you want to highlight, personal stories, things that you want to share with our audience? Well, we're putting together a new effort called Wildlife Insights. I want Eric to talk a little bit more about it, but essentially it's a place where we are actually calling for the community of camera trappers in the world to start providing their data and aggregating the camera trap data. And it's going to be called Wildlife Insights. And Eric, if you want to give some more details. So it's interesting, we kind of have two approaches. We have our kind of enterprise solution that's meant for people and organizations that really want to do standardized monitoring. They want to look to see exactly what's happening with species populations in their protected areas. And then we're having this kind of open, crowdsourced solution for people doing camera trapping for a whole variety of different reasons. Traditionally, they've been collecting this data and it's been sitting on their laptops and hard drives, isolated, and the data die as they move on in their careers.
And so this new approach called Wildlife Insights will be this cloud-based repository where anyone can share their data, and we're going to have analytics involved in it as well. These two systems that we're talking about will be able to communicate, but we kind of have two approaches, and we're trying to provide solutions for the entire community as well. What's your sort of social strategy, if you will? I mean, you've got a growing and vibrant community. How active are they? I mean, do you have a deliberate social media strategy? We have a Facebook page and we have Twitter accounts and Instagram. I mean, we use a lot of Conservation International's and other TEAM partners' social strategies. I think, because we're working with governments and local partners, it has to take hold there first. And then I think it's not a matter of just disseminating globally and to individuals. So it's also working with key partners, in our case governments, that have the... How many people in the organization? About 1,000 people. Yeah, substantial. And what are your goals for the next 12, 18 months? What should we be watching? Well, we hope to roll out the first few analytics systems in three countries and maybe some other places. So I think it's going to be an exciting year. You know, we're in a transition mode where we're taking our network and leveraging all the resources that we have in our network and trying to spin out these national-level networks. In new countries or... No, in countries where we're already working, but we don't have that presence yet. Much larger scale. Much larger scale, to cover more of the country right now. Bigger footprint. Yeah, and more influence, yeah. And you've got more proof points now. Yeah, yeah. So it's a really exciting year, and hopefully we'll be here next year telling you a lot about these tools. Well, hopefully we'll be here interviewing you next year as well.
So, Eric and Jorge, thanks very much for coming back on theCUBE. It was really great to see you again. It's great to see you, David. All right, keep it right there, everybody. We'll be back with our next guest. We're live. This is theCUBE from HP Big Data 2015 in Boston. We'll be right back.