You wouldn't ask me anything specific about Twitter, how many employees they have, business model, revenue, new CEO, all the stuff. It's funny we're going to talk about the future of the world. We've already covered all that. Today it's only Hadoop and Hadoop World. So you gave a talk, and I heard it was pretty well attended. Yeah, hopefully. How'd it go? I think it went well. It was a short format here. They have so many people, and they're doing it all in one day. But I think it's been amazing to see the turnout, honestly. So we were broadcasting live during your talk, and we didn't have an opportunity to hear it. So can you give us the bumper sticker version? Highlights? Yeah, sure. The basic premise was everyone's using Hadoop. And I think the fact that there are 1,000 people here today attending the Hadoop conference shows how powerful Hadoop can be. But a lot of the ecosystem around Hadoop is where there still needs to be a lot of innovation. So how do people get data into Hadoop? How do they store data in Hadoop? And how do they basically integrate Hadoop into their larger ecosystem, into their workflow? Because generally, you have an online site that you're managing as well. And Hadoop is just a part of what you do. And so what I was talking about was some of the tools that we either use that are open source or that we've developed ourselves and open sourced, how those fit into our Hadoop cluster, why that's useful, and how other people can use them. So those tools you developed yourself, you put back into the open source community? Yeah. We try and do that as much as we possibly can at Twitter. The company believes in that. All the engineers believe in that. And it actually makes it sort of a great place to work. You guys do keep some, though, in-house? Some stuff we can't, yeah.
So the two big things we're hearing about Hadoop that are really beneficial are the notion of analytics, getting answers to questions that previously were impossible, and then really real-time data. Those seem to be the two common things we're hearing. How do you view Hadoop from that perspective? You need analytics to drive your business, but you also have real-time, obviously Twitter is real-time, but I mean, Hadoop has to address this. What makes Hadoop so special that it hits those two hard problems, analytics and real-time access to data? What makes Hadoop so special? I mean, most of our uses for Hadoop are big data, batch-oriented. They're not real-time. Hadoop itself wasn't built to do real-time, and so it doesn't specialize in real-time data. If you're okay with your data taking three or four minutes to run a job, Hadoop is amazing. But as far as an actual real-time system, you've got HBase, which was built on top of Hadoop to be the real-time component, to be a Bigtable clone. We're starting to use that a little bit, but by and large, our real-time stuff is done outside of Hadoop. So when you build a timeline on Twitter, when I show you the most recent tweets from you or from you, that's not done on Hadoop. It's done in MySQL or other systems that are built to be low latency. But you guys have to do a lot of analysis. Three minutes is not a long time to wait. No, it's not a long time. That's a great response time. For a lot of data, yeah. For heavy lifting data? Exactly. That's pretty big. How much data do you guys use in your Hadoop cluster? Can you share in terms of size and how big that Hadoop cluster is? The number I can share is that we have 12 terabytes of data coming in every day. Every day? Every day. So, I mean, Hadoop is a big part of our ecosystem. And it's great that you guys are contributing back. What tools did you mention in your talk that you guys are using? We actually use a pretty broad range of tools.
So we use Scribe, which was open sourced by Facebook, to get data into Hadoop. We've open sourced a tool called Elephant Bird, which is what we use to store data and to read data in and out. And then we also are heavy users of Pig and are beginning to get into HBase more and more and even a little bit of Hive. So we're kind of across the spectrum. And we're also fortunate to employ committers on all of those projects. So we have a Hadoop committer. We have a Hive committer. We have a Pig committer, which means we can actually contribute back effectively to all of the projects as we use them. So if I wanted to measure whether there's a positive sentiment around my brand, would I do that through some kind of Hadoop platform, or would that be some of the real-time stuff you mentioned in MySQL? Can you help us understand that? I mean, is that even a legitimate question to be asking today, or is that still in the future? I mean, sentiment work is challenging in general. I think everyone will admit that. It's certainly a case where Hadoop would be applicable. It's also something that we've seen: Twitter has an API and a platform, and we have a lot of third-party developers out using our API. That's one of the big things that we've seen crop up around the API. We've seen third-party developers building semantic analysis platforms and tools to work with businesses and with brands. Right, so you might go in and see if somebody's popular or if my brand is trending. So Hadoop, you're saying, may or may not play? I would imagine a bunch of those companies use Hadoop. Maybe not all of them. It depends on sort of what they're trying to do, how big their data is. Kevin, what's your background? And after telling us a little bit about your background, talk about the kind of people that Hadoop's attracting. I mean, it seems to be what I would call a class A entrepreneur, engineer type, but not all super geeks.
I mean, a lot of statistical guys, a lot of math, a lot of computer science, some AI. We're seeing a cross-section of disciplines. It's not the classic network guys. So it just seems to be a new breed of engineer scientists. Yeah, my background's actually in science. I was in physics and in math before. Dropped out of a PhD in physics to join the startup world. Just like Michael Olson twice, he dropped out. Yeah, right, exactly. I think there's a whole subculture of it. I'm dropping out of whatever I'm doing now. There's a whole subculture of PhD dropouts, I think. Actually, I'm going to go to Stanford, get a PhD, and then drop out the first day. Just to say I dropped out. I dropped out of Stanford. Exactly. But I think a lot of people started like me. I was working at a company called Tropos Networks, analyzing municipal network data. That's the mesh networking, right? Yeah, mesh networking, exactly. So we had gigabytes of data, and I was writing Perl scripts to analyze it. And when I moved to my next company, Cooliris, we suddenly had terabytes of data. And those same Perl script-based tools were not working anymore. And so I think the reason you're seeing a lot of statisticians and mathematicians get involved is these people have worked on smaller data sets. They know how to do that. Suddenly, they're faced with larger and larger data sets. And Hadoop is one of the best tools out there to do it. I mean, it's probably the best. So you're seeing computing guys, software guys, and gals. I mean, is there a pattern? I mean, anecdotally, are you seeing any kind of pattern in terms of the kind of profile of person? I think it starts with being super smart and math oriented. I think it starts with analysts, people who are in the data space.
But then as Hadoop gets more broadly used, as Cloudera becomes more and more successful at making Hadoop easier to use, because they're doing a great job of taking away the hard edges, making it easier for enterprises to use, for random developers to pick up and start playing with, you'll see more and more software engineers get involved. We actually have product marketing people at Twitter who use Hadoop every day. People who can hardly write code, who we've taught to use the cluster. Did you do a front end for it? Did you just build a front end? They're actually in the terminal, over SSH. They're not coding, but they're running queries. Yeah, which has been awesome to see, right? That's a great culture. Directly into the Hadoop platform, not through some kind of BigSheets interface. Yeah, no, no, they're real. So everyone picks up a weapon and works. I mean, literally, you guys are growing so fast, you had product marketing people basically in terminal mode. How good is that? You don't see that every day. They're up to their elbows. It's like you guys don't have a parking lot full of cars at 12 midnight, you have guys on terminals. So we were just talking with James Phillips, co-founder of Membase, which was NorthScale. And they are powering FarmVille. I see Zynga was just reported as overtaking Twitter in terms of the most real estate space in San Francisco, and Zynga's growing like crazy. So he was talking about the scale of operations. And then obviously you're dealing with that every day. You're talking about going from Perl scripts to massive data pools. What's it like, what mindset do people need to have, and what advice can you share with people? Because you guys are the leader in terms of drinking from the fire hose, so to speak, no pun intended. Like, you're operating at scale and you're growing and you're having to re-architect and do all kinds of new stuff at massive scale. And that's so serious. I mean, you know, mistakes, you can't make a mistake.
And FarmVille, same thing. One little mistake on latency, they lose millions of dollars. Yeah, I mean, you guys, you can't go down. Yeah, when you're growing so fast, one of the things that I try and talk about when I am up there is what I think is actually one of the great things that Hadoop promises. There are numerous challenges to growing this fast. One of them is actually serving the online site at low latency. That's one challenge. I deal more with the analytics side of the challenges, which are that your data is growing at immense rates and you've got to be able to analyze it, get useful things out of it that you can then turn around and use to help the business, to help the product teams. And one of the things that Hadoop really brings to that is its ability to be scale free. So if your data goes up by a factor of 10, as long as you're able to scale out your hardware, your code doesn't need to change, which means you don't need to go back and rewrite and redo all the work that you've done. I think it's one of the first times that that's really been true. Suddenly you can literally throw hardware at the problem. And it's meant that we've been able to continue innovating on the analytics side rather than having to go back and redo stuff that we already did because we have 10 times more data than we used to. We sit in the Cloudera office. SiliconANGLE, that's our office space and our studio in Palo Alto, so we see the guys all the time. And one of the hallway conversations that is always going on amongst the supergeeks there is this notion of scaling out. I mean, scaling up and scaling deep, you throw hardware at it, you've got great tools, but essentially if you're spreading data around, say in low cost clusters, you essentially have new issues, right? Scaling out is not a trivial thing. Do you have any opinions on the scaling out challenge and points of light that are positive for people out there?
Because there are some benefits, but as you get farther out, you've got round-trip time, you've got both types of issues. Do you have an opinion or an angle on that? I guess in some sense it is. I'm not on Twitter, I'm just a personal angle. I think the situation is probably much better than it used to be. If you had the same problem 10 years ago, you would have been building all of your own stuff from scratch. I mean, Google built all of their own stuff from scratch. Now, with the open source community growing, there are more solutions out there, and you've got people like Google who did it, and even if they didn't open source a lot, they at least talked about how they did it, which led to people like Doug Cutting starting to build things and open source them and make it possible for everyone else. You read a paper on the Google File System and all. Yeah, exactly, right? As long as you've got brilliant guys like Doug around who will build these things based off of a paper from Google. And then, you know, Facebook has done a pretty good job of open sourcing some of the tools that they've built that have helped them scale. We've tried to, as we've grown, do the same and open source as much as we possibly can so that companies that come behind us don't have to deal with some of this stuff anymore. Like, I think in the ideal world, all of the basic backend scaling kind of commonalities are open sourced and companies get to innovate in their particular domain, but they don't have to reinvent the entire stack every single time. It'll make everybody work faster. We're here at siliconangle.com and siliconangle.tv, live from the Hilton in New York City with Kevin Weil from Twitter talking about Hadoop. Great scene here. Just describe it for the folks out there who are watching who aren't here. They're seeing a lot of tweets. It's been a heavy Twitter stream day today out there and a lot of the comments are positive. Share with people the vibe here.
What's your take? How would you share with them the scene here? I mean, the movement is happening. What's it like? It's pretty fantastic, to be honest. I'm friends with a lot of the Cloudera folks, so I've watched the company grow, and to imagine that they have 1,000 people here in New York for a single day, it's really impressive. And the conference is well run. It's efficient. People are sticking to their talks. I've heard a lot of positive vibes. It's been fantastic. All right, all right. Good, well, listen, thanks very much for coming on theCUBE and sharing some of your thoughts on Hadoop. We'd love to have you back in Palo Alto here in the Cloudera area. Yeah, great. You guys do have a studio in the back. Appreciate everything you're doing for the open source community. It's a great philosophy, paving the way for future entrepreneurs. We love it. Thank you guys for having me on. Thanks for having me. Thanks for coming on. We kept our promise. No Twitter. As soon as you leave, I'll start talking about it.