Live from Boston, Massachusetts, it's The Cube at the HP Vertica Big Data Conference 2014. Brought to you by HP with your hosts, John Furrier and Dave Vellante. Welcome back to Boston everybody. This is Dave Vellante and this is The Cube. We're here, Chowda, Lobster and Big Data. The Cube is SiliconANGLE's flagship program. We go out to the events. We extract the signal from the noise. We've been doing wall-to-wall coverage here at HP's Big Data Conference, Vertica's Big Data Conference. This is our second year here. A lot of customers that we've been interviewing, particularly over the last two years. Conservation International is here; Jorge Ahumada and Eric Fegraus are here. And we're going to talk about what Conservation International does, what they're doing with data. We had a question this morning in the audience when we were up on stage and somebody asked, well, everybody talks about big data driving revenue and getting people to click on ads. What are people doing with big data to affect the world? Change the human condition. So Jorge, thanks very much. Eric, thanks for coming on The Cube. Really appreciate you guys. Thanks Dave. So tell us first about Conservation International. On that question this morning, we used you guys as an example of how you're helping to change the world. So set it up. Tell us about Conservation International and what you do there. Yeah, Conservation International is a nonprofit organization. We're 25 years old. And we work in trying to marry human well-being with nature. So rather than live at the expense of nature, we want to live with nature and demonstrate how nature is important to people. So basically we work with governments and partners all over the world to try to achieve this mission and make the world more livable. And how are you funded? We're funded through a variety of private donations and government grants and things like that. And talk a little bit about the history of the organization.
When did you start and what's your journey been like? Conservation International started about 25 years ago. It came out of another large organization, The Nature Conservancy, as a kind of international program for that organization. And then it grew up and now we're working in over 25 countries all over the world. And Eric, you're the IT guy, right? Driving all this data. So how long have you been with the firm? So let's see, I've been with Conservation International, it's coming up on eight years now. I call it the firm; it's really an organization. Eight years, okay. So you've predated the big data meme, although according to Tom Davenport, it started in 2005. I wasn't talking big data in 2005, but nonetheless you predated the general consensus around big data. So you've seen quite a few changes. Yeah, I think one of the key things that's really exciting about our partnership with HP and HP Earth Insights is that we've taken technology out of the traditional IT role of supporting laptops, hardware, telecommunications and put technology in the forefront of programs within the organization that are trying to carry out mission-level type work, right? So in our case, we work on a long-term biodiversity monitoring network. And so we're able to take a lot of the cutting-edge technologies in terms of sensors and different types of devices, collect information, develop the IT systems that go from the field all over the world, bring them into servers and databases and then get that information out into the public. And then as well as have some analytics on top of that. So, Jorge, you talked in your keynote yesterday about the importance, the relationship between nature and the human condition. I wonder if you could address that a little bit. What is that relationship? Well, we take a lot of things for granted. We breathe air, we consume food, we have all these things and we rarely think about where they come from. And a lot of these things come from nature.
The food you eat comes from soils that are built up from thousands and thousands of years of plants decomposing. And the air you breathe comes from forests and plants producing oxygen. And the water is also regulated via forests and natural lands. So all these things we take for granted, but they all come from nature. And we're disconnected a little bit from all those sources. So we're trying to build a new model of sustainable development where rather than develop at the expense of nature, we develop in conjunction with nature. And then we have a kind of sustainable solutions that favor both nature, which we need, and us as societies. I was struck by your keynote yesterday. I would say, I don't want to say soft-pedal the message, but you weren't in our face about the message, which you could be. Because when you look at spikes in consumption and CO2 levels and population, I mean, they're quite alarming since the dawn of the Industrial Revolution. But you didn't, as I say, jam that down our throat. Rather, you focused on the things that your organization is doing to improve that. So my question is, why the soft-handed approach? You weren't heavy-handed with that. Have you found that that's more effective? Because it seems like many people, particularly in this country, don't care or don't want to listen. Maybe they care, but maybe there's a minority who care. The average individual does take it for granted. So have you found that when you're presenting to audiences and talking to people that it's just, there's no point in trying to beat them over the head with this data? If they read God's Last Offer and they didn't believe it, then who were you to change their mind? So I was struck by that. Talk about that a little bit. Well, that's very interesting. I mean, I think that there's several approaches you could take on this, and one of these is scaring people, of course. But I think we should rather focus on the solutions rather than trying to scare people.
And I think data-driven solutions are a way to go because data is telling you things, and it's not really taking a view of anything. It's just looking at the world in that way and trying to understand why it's happening and how can we solve it. And so I think we should just focus on that and focus on the positive and focus on what can we do moving forward rather than trying to focus on the problems and how to change people's minds. I mean, ultimately it's about bringing people into conservation and making people understand that it's actually a matter of survival. And that's what we're trying to do for our species to be aware of this because otherwise we're going to be gone and we're going to be gone at the expense of nature, right? So when you think about your organization and the data and interpreting the data, what's the data telling you? In our particular program and team, what we're finding is that a lot of these species that are living in tropical forests are actually doing pretty well. So it's good to know. I mean, we don't want to be beating down the message of everything is going down to hell. No, we're happy to see that. There are some problems. Of course, there are some species that are declining and most of those species are not species that we're paying attention to from the traditional conservation point of view. But that's the whole idea for an early warning system that we have, a data-driven early warning system. Now we have the data, we can show what's happening and people can make informed decisions that way. And what are the data sources? The data sources are camera traps. We put camera traps all throughout our 17 sites in tropical forests and these camera traps just, you know, they don't have an agenda. They sit there, an animal passes by, you know, it takes a picture and we use that information as a way to assess abundance of these species and then fit trends to this data so we can know whether the species is increasing, decreasing, et cetera.
Eric, so we do a lot of video on The Cube obviously and sometimes it's hard to interpret that data so you have a challenge. So how do you do that? Do you have a metadata taxonomy? Is it just, you know, people watching videos until their eyes bleed? Talk about that a little bit. Yeah, well, yes and yes. It's a lot of work. So right now we're still mostly dealing with stills, so single, you know, digital photography, digital images. Camera traps are moving towards video, something we're going to have to really dig into. We've developed tools. When we started this project, there was no way we could have actually collected the data that we wanted given the current suite of tools that was available for our community. So we developed our own software solution to be able to manage and basically tag and make all that metadata that we need for each image. But we, yeah, I mean, we create a whole suite of metadata around the image. We examine all the EXIF data within the image itself and we store it in the database and then it turns into the data that we use in the analytics. You make it sound so simple. So I'm interested in this innovation you had to develop; you couldn't buy anything off the shelf is what I'm hearing. And video, I obviously got that wrong, video would be just too much data and not really worth it at this point. At this point. And probably too expensive to store still and probably too hard to find what you're looking for. So how's it work? So the cameras are triggered by motion? Yeah, Jorge was saying the animals walk by the sensors. There are heat and motion sensors on these. A series of images are taken. The animal's captured. Hopefully it's not running too fast and we don't miss it. Then what happens is the site managers and the technicians collect the SD cards out of the cameras. They go back to the lab. They use our software to process these data and all the images, excuse me.
And then all the data and images get sent to our servers and then it goes into the pipeline and the analytics process that we've developed. So the software that you develop, it auto classifies, auto tags? It's not fully automated. What it does is it facilitates the image annotation. So it groups images based on time and where they were collected. And so it really, you know, they can go through it pretty quickly. And then presumably you want some information in there about the species. That's a human task? Absolutely. We look at that. We look at the image. We want to identify the species and how many individuals of that species are in that image. And that data gets in there, just to clarify, by a human. So it's site based and it's sort of a Mechanical Turk. Yeah. What kind of volume are we talking about here? Right now we have a couple million images. So it's not huge in terms of what something like Facebook is going to deal with. But where we work, if a site manager has to go through 15,000 or 30,000 images, that's a lot of work. You can imagine scrolling through these even in groups. And then that goes into your database. So then what happens? Who has access to it? How do you use it? How is it organized? Do you ever delete data? We never delete data. It gets curated. It gets turned into the Wildlife Picture Index analytics. It goes into our analytics system. And yeah, we have various users that can come in and curate the data. Okay. So there's a curation there. And so that's what the primary use is, the curation and then the rendering. Right? Correct. Yeah. So that anybody can come to your website, see the pictures. But you're also analyzing the data. So talk about that a little bit. So yeah. So we developed this Wildlife Picture Index and analytics system. And this is what we did with HP that really shows the trends of the species in the tropical forest over time.
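The grouping step mentioned above, where images are batched by time and by where they were collected so an annotator can tag them quickly, could be sketched roughly as follows. This is a minimal illustration in Python, not the team's actual software: the function name `group_into_events`, the tuple layout, and the 60-second gap are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from itertools import groupby


def group_into_events(images, max_gap=timedelta(seconds=60)):
    """Batch images into events for annotation.

    images: iterable of (camera_id, timestamp, filename) tuples.
    Images from the same camera whose timestamps are within max_gap
    of the previous image are placed in the same event.
    (Tuple layout and the 60-second default are assumptions.)
    """
    events = []
    # Sort by camera first, then by time, so groupby sees each camera once.
    ordered = sorted(images, key=lambda img: (img[0], img[1]))
    for _camera_id, shots in groupby(ordered, key=lambda img: img[0]):
        current = []
        for shot in shots:
            # Start a new event when the gap since the last image is too long.
            if current and shot[1] - current[-1][1] > max_gap:
                events.append(current)
                current = []
            current.append(shot)
        if current:
            events.append(current)
    return events
```

A person would still identify the species and count individuals for each event; the grouping only reduces how many separate items they have to look at.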
And the other key point is that all the data we produce, whether it's primary data or synthetic data, is all publicly available. So we look at ourselves as a global public resource. Okay. All right. So talk a little bit about how the data kind of turns into both insights and action. So you have all this data. So you see some species are doing well. Some species not so well. So what do I do with that information? Yeah. So this is kind of the key question. And a lot of people that work in protected areas throughout the world, and especially in the tropical areas, they don't have access to any information or very little access to information about what's happening at their park. Right? They might collect data, but it'd be a little bit of data here and there, but not enough to tell them the big picture. So this work that we do really goes back to them. And through this Wildlife Picture Index system that we developed with HP using Vertica technology, we can now go down and tell them, these are the species in your park that you should be concerned about. And so they can then use that information to talk to their bosses or their managers and implement ways to find out, okay, maybe we need to patrol more here because this species seems to be declining and maybe this is due to poaching or maybe this species is moving up the mountain because of climate changes. So maybe we need to start thinking about how we're going to protect that area up there. So it really starts a conversation about how to effectively manage wildlife, which is a big problem because people don't have data on this. Well, and I would imagine, too, there's concerns about, okay, if we solve one problem, we may be creating another problem. How do you sort of de-risk that? Is that just a matter of working with experts? Your job is to provide the data and it's their job to sort out the domino effects? I wonder if you could talk about that a little bit.
Yeah, I mean, there could be some trade-offs that you have to make when you protect one species, you might not be able to protect all the species. But we are kind of at the beginning stages of this network where just having some data, at least three or four years of data, is gold because, really, the other option is zero. It's like the Hubble Telescope. Wow! Exactly. So I don't think they're too worried about trade-offs at this point. I think there's just a lot of these, of our site managers and people that work in these areas are really happy to have data that they can show whether protecting this park is really working or not. And that's good enough. And the data source is primarily or exclusively the camera snaps? So we use camera trap data as well as climate data and we also use a handful of other covariate data to try and look at what could be impacting the trends that we're seeing. So it could be like human presence, deforestation or potentially climate. I was just going to echo what Jorge was saying. One of the really exciting things is what we're doing is we're kind of shining the light in areas of the world and part of our natural resources that we don't know what's happening. So that's really, we're finding out new things and it's data-driven and it's unbiased and that's exciting. Yeah, but the primary source of the species data are the camera traps. Yes, absolutely. How do you baseline it? What do you start from? Are you doing your own baselining based on frequency of snaps? And how do you ensure statistical validity and all that other good stuff? How do you baseline it? So the first year we start, that's our baseline, right? That's our year of reference. And actually the Wildlife Picture Index is anchored at that year and we're measuring changes relative to that first year. How do we ensure statistical validity? Well, we've consulted with over 200 scientists before we developed this protocol. So it's not that we made it up.
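The idea of an index anchored at the first year, with later years measured relative to it, can be illustrated with a short sketch. This is a simplified illustration only: the transcript does not spell out the formula, so combining species with a geometric mean, the function name `baseline_index`, and the input layout are assumptions for the example rather than the team's actual method.

```python
import math


def baseline_index(yearly_values):
    """Compute a composite index anchored at the first year.

    yearly_values: dict mapping species name to a list of positive
    per-year values (for example, an abundance or occupancy estimate),
    one entry per year, with year 0 as the baseline.
    Returns one index value per year; year 0 is 1.0 by construction.
    (The geometric-mean aggregation is an assumption of this sketch.)
    """
    n_years = len(next(iter(yearly_values.values())))
    index = []
    for t in range(n_years):
        # Each species is scaled by its own first-year value.
        logs = [math.log(vals[t] / vals[0]) for vals in yearly_values.values()]
        # Geometric mean across species, computed in log space.
        index.append(math.exp(sum(logs) / len(logs)))
    return index
```

Under these assumptions, a value below 1.0 in a later year means the species set is, on average, doing worse than in the baseline year, and a value above 1.0 means it is doing better.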
We actually talked to people that know about this and we designed a protocol that actually has enough power to detect change at the 5% level, for example, annual trends of 5%. So we're pretty sure that our protocol and our data are sensitive enough if we want to detect change. And what were the skill sets that you had to bring on board to accomplish this? I presume they didn't just fall out of the sky and into your lap. So you had to think about this problem starting with a blank piece of paper, essentially. And obviously you've got history, a 25-year history. But now you're attacking the problem differently. Did you have to bring in statisticians, new types of experts? I wonder if you could talk about the skilling. Yeah, I mean, as I was saying, you know, when we designed this, we designed it with experts. And, you know, we had to talk to statisticians. We had to talk to people doing camera trapping on the ground that knew about how to do it. We had to consult with many, many people that work in the field to verify whether we could do this at the scale that we were proposing to do. Nobody had done it at that scale. You know, people had gone out. We put 60 camera traps, one camera trap every two square kilometers. Initially, when we said, let's do it that way, everybody said, no, that's impossible. We'll never be able to do that. And then we found out that some people were doing it in Asia and other places. So, you know, it's something that we had to actually say, no, that's the way we're going to go. So they said you couldn't do it. No. People said, oh, you can't do it. And you did it. And we did it. Yeah. And so, you know, it was a scale. People were putting camera traps. They used to put, like, a couple of camera traps, you know, within a couple of miles, and that's it, right? But we were talking about, you know, 150 square miles covered by camera traps. It's an entirely new scale that we're looking at. So it just kind of changed the conversation.
And then that produced a big data set, which we were not prepared to deal with, right? So we were saying, okay, let's do this. And then we started getting all these images, right? Thousands and thousands of images. I was like, wow, what are we going to do with all this? How are we going to process this information? And that's how, you know, we came, we fortunately, you know, kind of came into our partnership with HP. And that's how we, you know, now we bring this, all this whole other kind of data science and, you know, business and engineering and software developers into the mix to really help us accomplish, you know, kind of what looked like a simple kind of data product in the beginning. Right. So you had to engineer this system to sort of ingest, and a metadata management and storage system that didn't exist. You mentioned climate data. So you're connecting to public climate data. You got some APIs that you can funnel into. We actually have meteorological stations that we set up. So it's kind of your traditional meteorological stations with data logger, temperature, relative humidity, and solar radiation. So that's your infrastructure. Yeah, because guess what? There's no climate data in tropical forests. I mean, so we had to put our own climate stations with our own, you know, with world-accepted standards. And we also kind of fetched some data from satellites and other publicly available sources. But you really need that ground data to be able to ground it to the truth because, you know, the other data is usually remote data. What are the politics of doing this? Do you have to cajole local governments? Are they receptive to this? Yeah, that's a good question. We had to work individually with each government where we have a site and negotiate with them what are the terms of our involvement.
And especially in the beginning, it was fairly difficult to convince people to release the data because we're talking about, you know, 2001, 2002, nobody was talking about publicly available data or very little publicly available data then. So that was a big challenge, but people are now starting to see the benefits because it's creating a whole new way of doing conservation. And I apologize if I missed this, but can you again quantify the scale for us? I mean, in terms of, however you look at it, number of cameras, amount of data. Yeah, so, you know, we're probably receiving between, you know, 400,000 and 450,000 images per year. We're working in 16 countries in 17 sites throughout the tropics. We have over a thousand camera traps. And, you know, we're probably monitoring, I don't know, millions of football fields every year. And do you know how much data? I mean, off-hand, Eric? So, total data, I mean, we're doing backups. We probably have about nine terabytes or so. Nine terabytes. Ten terabytes, yeah. So, yeah. It's decent. Yeah, it's decent size. It's a lot for us. Yeah, but it's not a ridiculous amount of information. No, it's not. No. But, you know, everything's relative, right? And so, for us, it was challenging. And in particular, some of the modeling that we do produces a lot of simulations and outputs. So, we have to be able to capture and store that. Right. Excellent. All right, we're getting the hook here. Appreciate you guys coming on and sharing your story. Okay. All right, keep it right there. We'll be back with our next guest. This is The Cube. We're live from Boston. We'll be right back.