Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015.

Welcome back, everybody, this is theCUBE. We're live here at Big Data SV 2015 in San Jose, California. I'm Jeff Kelly with Wikibon, and I'm joined by my co-host, Jeff Frick. In this segment we're joined by Kiyoto Tamura, who is the Director of Marketing at Treasure Data. Welcome.

Thanks for having me.

Well, thanks for coming on theCUBE, first-time guest. So tell us a little bit about Treasure Data. We've spoken with you guys before, but I think it would be valuable for the audience if you gave a good overview. What's Treasure Data all about?

Sure. Treasure Data was started with a vision of making data accessible to everyone by building a truly end-to-end, cloud-based analytics infrastructure for all kinds of data. We started the company in December 2011, we just raised our Series B last month, and I've been with the company since May 2012. So it's been quite a journey for me.

And, as we were talking about off camera, you've worn quite a few different hats inside Treasure Data.

I've done everything from customer support to sales engineering, and my pitch to people is that you can ask me anything about the company other than fundraising and R&D. Because if I wrote the code, we wouldn't be around. We wouldn't have raised a Series B.

Okay, got it. So tell us a little bit more about the platform. You mentioned cloud-based, which brings together two of the biggest things we talk about on theCUBE all the time: cloud and big data. Walk us through some of the use cases and what the platform is really designed to handle.

We see three pain points when you try to kick off big data projects or data analytics initiatives. The first is collecting data from various sources. A lot of times when people talk about big data, they talk about volume. Volume is a very important concern, and I'm going to touch on that.
But there's also the variety of data you need to collect, right? If you're an e-commerce company, there are mobile apps, there's in-store POS data, and there are big web servers running multiple properties. A lot of times the pain point is that those live in different data silos, and data analysts and data scientists can't access them in a way that is productive. We solve this problem with our data collection agent as well as mobile SDKs and web SDKs, which ensure that all kinds of data flow into Treasure Data in a reliable, streaming manner. For the storage layer, we built our own highly fault-tolerant, highly compressed columnar storage. Internally we call it Plazma. We run it in the cloud today, and that's what stores our customers' data very, very efficiently. And finally, a lot of times people buy all sorts of big data storage systems that do exactly that and nothing more: store the data. One number I've heard is that only about 15 to 20% of that data is effectively utilized. We solve that problem by providing a uniform SQL interface. So if you're a SQL-savvy data analyst, or a data scientist who wants to pre-process data before getting to the meat of it, that interface is there. We also understand that a lot of BI people want connectivity to the tools they're familiar with, so we provide the ability to integrate with existing BI tools as well, things like Tableau and others.

So tell us a little bit about what your customers are doing with your platform. One of the themes we've talked about all week, Jeff, is the move from talking about the technology and the framework to, hey, what are we actually going to do with all this data? So let me back up: is that the theme you're hearing at the show this week as well? And talk a little bit about how your customers are actually executing on that.

Sure.
One big customer, who is big now but started out very small with us, is an e-commerce platform called Wish.com. They're the world's largest mobile shopping mall. What's interesting about Wish is that their co-founder, Danny Zhang, is a Hadoop veteran. He worked at Yahoo, where he built a big Hadoop cluster, and he fully understood that all that work is only valuable if you can actually mine insights from your data. If you're at a big company with a lot of engineering resources, build versus buy is an interesting proposition. But it's a no-brainer when you're starting a new e-commerce platform. Wish started with us almost three years ago, when they were a small company, right? Now that they're the world's largest mobile shopping mall, they do a lot of number crunching, and a lot of that happens on us. It really validated our value prop that they could start small with Treasure Data and grow as they needed. It really made my day when Danny told me that.

If we could drill down a little bit, right? Because the world would probably say we don't need another e-commerce platform; e-commerce has been going on forever. So what was the vision he had, or what did he realize once he got to work and had the data, that let him use data to build an application in a space most would consider mature, and actually take a market-leading position?

There are a bunch of tools specifically tailored for e-commerce people to do analytics, but a lot of these solutions don't give them raw access to the data. Wish has its own homegrown A/B testing system, and they know a lot about how to run A/B tests with multiple hypotheses at the same time. But to actually make that happen, they need a very, very robust data infrastructure powering it, right?
The analogy I give is an iceberg: what people see is the end result, which is only about 5% of what's actually happening underneath. We try to be the really robust part that not many people actually see, so that people can truly shine doing the 5% that others can see.

So put Treasure Data in context. We're here at Hadoop World, and there are a lot of vendors out there on the floor. It sounds like you're trying to abstract away a lot of the complexity that's inherent in deploying some of these technologies in your own data center, cobbling together different pieces from different vendors, et cetera. When you go out into the field and you're fighting tooth and nail for customers, I'm sure, with all the other players out there, who do you find yourself going up against? Help us understand, for lack of a better term, what bucket you tend to fall in. Or are you in your own bucket, perhaps?

Yeah. One group is people who have the perception that the cloud is too new, or that they can do it themselves using what I call Lego blocks of various data services. What we notice is that it does work in many cases, but the opportunity cost of not being able to use those engineering resources elsewhere is actually a huge loss for them, because engineering talent, as we all know by now, is very, very scarce in this ecosystem today. By far, the customers who become true champions are people who have seen that in one way or another. They came to the conclusion that if their main business is data, maybe building it themselves is worth the investment, and even then it's a maybe. If their main business is outside of data, where data empowers the product and business decisions, it's better to have something already built that suits their purposes.

Mm-hmm.
Well, it's interesting, because we talk a lot about the born-data-driven startups, where big data is in their DNA. But then you've got this whole wide swath of companies in the enterprise and beyond. The Fortune 1000 are kicking the tires with Hadoop and doing some things with it, but there's a huge market opportunity in all these other companies that don't necessarily have the skills internally and don't want to be in the business of putting these technologies together. It sounds like that's the market you're trying to address.

Yeah, and even within the Fortune 1000, what we're seeing is new projects where they really want to test out ideas quickly, but mobilizing their internal resources is going to take a lot of time. For example, we work with Pioneer today on their new IoT initiatives. They want to go from just selling onboard diagnostic devices to effectively collecting the data those devices already generate and providing it in a consumable format to various third parties and to themselves. It was really eye-opening for me, because I don't come from a telematics background, and it's really amazing what kind of analysis can be done once you can actually collect all the data being generated inside a car today.

And opening that up to data scientists and other analysts.

Exactly.

Talk about using the cloud as a data collection repository for disparate and distributed sets of data, because I'm sure that might scare a lot of people, A, for security and those types of reasons, but also potentially the scale. Talk about the trade-offs, and what using the cloud as your repository enables you to do with these disparate sources and data sets.

Yeah. We do understand that different jurisdictions have different rules governing how data should be treated.
And this is actually where our open source project, Fluentd, has been really helpful. You can think of Fluentd as logging middleware that lets you stream data from point A to Z. Baked into its feature set is a filtering mechanism that lets people simply drop certain fields, or encrypt or hash them so that it's very hard to trace them back to their original values. This gives our customers the confidence that, okay, if we go with Treasure Data, we can always keep the master copy on our premises and do the filtering, so we can ensure programmatically that only the data allowed on the public cloud goes to Treasure Data.

Interesting. So I'm curious to get your perspective. As we talked about earlier, you've held a lot of different roles within Treasure Data, you're a fairly young company, and a lot of our viewers are entrepreneurs in the startup world as well. What are some of the lessons you've learned, maybe some the hard way, in terms of building a company in this space?

Talk to the customers. Talk to the customers. Our CTO, Kaz, comes from a deep Hadoop background. As a matter of fact, he's Japanese, and if you type in his name, autocomplete sometimes suggests Hadoop. So we thought we knew our crowd, right? But then we started asking customers and prospects: why are you buying Treasure Data? What terms did you enter into Google? And it's really eye-opening every time. Sometimes they weren't even looking for big data; they were looking for ways to store their logs more effectively, ran into a Wikipedia page, and that's how they landed on us. Sometimes we didn't even know they had tried us before; they'd moved to a different company where management was more receptive to the idea of using the cloud, and so finally they're signing up and giving us money. So: talk to the customers. That's lesson number one.
I think it's a good lesson, Jeff.

Yeah, yeah. The other big trend we know about, another wave that's coming, is the Internet of Things and the industrial internet, with all these connected devices throwing off tons more data. Talk a little bit about your point of view on it. I'm sure you're very excited about the Internet of Things. What are you doing to take advantage of that opportunity, which is just around the corner?

Yeah. The big challenge in the industrial Internet of Things is making sure you have a mechanism to collect the data being generated, and it's actually two things. One is that the platforms are still pretty fragmented. It's not like Linux servers, where you can write for one platform and cover pretty much half the market; that's not how it works. The other is that a lot of companies are still trying to figure out how they can use that data. There's a lot of high-level discussion, but what really convinced Pioneer to go with us is that they have a very clear vision themselves of how they can use the data being generated. I think that's what 2015 is really going to be about. And my somewhat far-fetched hypothesis is that they can also learn from the other Internet of Things crowd, like wearables and the newer companies coming out of the entrepreneurial space.

You said they had a pretty clearly defined vision. Did that pan out? I'm just curious, because, as they say, every plan is perfect until the first shot is fired. Did it work out the way they thought?

I don't know if I can talk about the timeline, but it's definitely going well. There are commits in our private repo happening every day.

We've just got time for one more question. What's on your roadmap for the next year, to the extent you can share? What's top of mind, your major focus going forward?
I'm sure you've already noticed this, but there's been a lot more interest in what to do with the data you're collecting. Collecting big data and building the platform to store and process it was really last year's story. Once we have all the infrastructure pieces and services that let people focus on analysis, it's very important that the analysis data scientists and data analysts do actually makes it back into production, by operationalizing a lot of their findings. That's where we're really focusing in 2015 and 2016, because at the end of the day, it's one thing to look at the data and say, "Aha, we found something new." The next really important step is to actually implement something that uses those insights and automate it. I'm really hopeful that's going to happen this year.

I agree. For me, that's where it starts to get exciting: you're actually doing stuff with all this data and impacting business, impacting society. We heard some use cases earlier around climate change. So there are a lot of things I think are going to happen this year, and it's going to be pretty exciting. Kiyoto, thanks so much for joining us on theCUBE. We appreciate it.

Everybody, thanks for watching. Stick around; we'll be right back with our next segment here, live at Big Data SV, after this.