Okay, we're back live at Strata, day three for us. Eight hours live on the first day, eight hours live on the second day, eight hours today. We could go even longer. The Strata conference is kind of ending today, but we are here live with all the coverage. This is SiliconANGLE.com, SiliconANGLE.tv's flagship telecast. We go out to the top tech events and extract the signal from the noise. We go deep, we talk to the smartest people we can find who have something to say, extract that signal from the noise, and share it with you. And obviously data is big, and Twitter data, social data, is huge. We're going to talk about that in this segment. I'm John Furrier, the founder of SiliconANGLE.com, and I'm here with my co-host. I'm Dave Vellante of Wikibon.org, and we're here with Nathan Marz, who's a lead engineer at Twitter. Welcome, Nathan. Thank you guys for having me. Thank you for coming on. You're formerly of BackType, which got acquired recently, so congratulations on that. Thank you, yeah. Fantastic. And we're going to talk a little about Storm, right? Yeah, let's chat. So you developed Storm? Yeah, so basically when I was at BackType, we were doing real-time analytics on Twitter, actually, and it's a pretty hard problem. So we developed this infrastructure, Storm, to solve this problem for us. It was in development for about seven months, then we got acquired, and now we're doing it at Twitter. Awesome. Well, why don't you tell us a little bit about Storm, and then we can get into it. Sure. I like to say that Storm does for real-time processing what Hadoop did for batch processing. It exposes a set of general primitives for real-time processing and lets you compose them together to build very sophisticated, fully fault-tolerant real-time data processing workflows. Awesome. So you said it was in development for seven months? Seven months before we got acquired, and since then I've continued to develop it.
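
To make "general primitives composed together" a little more concrete, here is a toy, single-process Python sketch. Storm's own vocabulary calls stream sources spouts and processing steps bolts, wired together into a topology; the functions below (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical stand-ins written only to illustrate the composition idea. They are not Storm's API (which is JVM-based), and they ignore distribution, parallelism, and fault tolerance entirely.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable, Iterator


# Hypothetical spout: the source of a stream (here, a few fixed sentences).
def sentence_spout() -> Iterator[str]:
    yield "storm does real-time processing"
    yield "hadoop does batch processing"


# Hypothetical bolt: splits each incoming sentence into words.
def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    for sentence in stream:
        yield from sentence.split()


# Hypothetical bolt: keeps a running count per word and emits each update.
def count_bolt(stream: Iterable[str]) -> Iterator[tuple[str, int]]:
    counts: Counter[str] = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(spout: Callable[[], Iterable], *bolts: Callable) -> list:
    """Feed the spout's stream through each bolt in order (a linear 'topology')."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


if __name__ == "__main__":
    for word, count in run_topology(sentence_spout, split_bolt, count_bolt):
        print(word, count)
```

Each step only knows about the stream it consumes, which is what lets the same small pieces be recombined into different workflows.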
Yeah, but prior to getting acquired, where were you in terms of actually deploying the platform, and how was it getting adopted, and by whom? Well, we didn't open source it until we got to Twitter. Okay. But it's actually gone through multiple iterations. We had a first iteration that took a few months and was useful for us for various things, and then we saw that it needed more features for other use cases. It has expanded since then, and it's continued to expand after the acquisition, even now. I just released a really cool feature recently called transactional topologies, which enables fully fault-tolerant, exactly-once processing semantics. That's really powerful, and I'm really excited to see what other people do with it. Awesome. It's funny, before we went on we were talking about Kevin Weil. I don't know if you know Kevin; he was on a couple of years ago, talking about some of the limitations of Hadoop around real time, and obviously you're bringing that to the platform. Yes. So talk about that a little bit. Yeah, I actually see Storm and Hadoop as very complementary. Hadoop is about batch processing and extracting insight from all of your data at once, whereas Storm is about looking at data as it comes in and trying to extract as much value as you can in real time. So the applications we build actually use a combination of Hadoop workflows and Storm processing. What that lets us do is this: as soon as we get a piece of data, we can extract as much value as we can through Storm, right? But once we've had the data for, let's say, an hour, we can extract much more value out of it through our Hadoop workflows. So Twitter data has been a fantastic developer playground over the years. You've got a lot of data; obviously Twitter is massive. Twitter has a real-time impact that's changing the world.
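
Nathan doesn't explain here how transactional topologies get exactly-once semantics. The core trick, as Storm's documentation for transactional topologies describes it, is to give each batch a strongly ordered transaction ID and store the ID of the last applied batch right next to the state it updated, so a batch that is replayed after a failure is recognized and skipped instead of being counted twice. Below is a minimal sketch of that idea; `TxCountStore` and its methods are hypothetical, and it assumes batches are applied in increasing transaction-ID order and that a replayed batch has exactly the same contents as the original.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TxCountStore:
    """Hypothetical key-value store holding (count, last_txid) for each key."""

    data: dict[str, tuple[int, int]] = field(default_factory=dict)

    def apply_batch(self, txid: int, batch: list[str]) -> None:
        # Aggregate the batch first so each key is updated at most once per
        # txid; otherwise a key appearing twice in one batch would have its
        # second occurrence mistaken for a replay.
        partial: dict[str, int] = {}
        for key in batch:
            partial[key] = partial.get(key, 0) + 1

        for key, delta in partial.items():
            count, last_txid = self.data.get(key, (0, -1))
            if last_txid == txid:
                # Already applied (this is a replay after a failure): skip it.
                continue
            # Store the new count together with the txid that produced it.
            self.data[key] = (count + delta, txid)

    def count(self, key: str) -> int:
        return self.data.get(key, (0, -1))[0]


if __name__ == "__main__":
    store = TxCountStore()
    store.apply_batch(1, ["#storm", "#hadoop", "#storm"])
    store.apply_batch(1, ["#storm", "#hadoop", "#storm"])  # replayed batch
    store.apply_batch(2, ["#storm"])
    print(store.count("#storm"), store.count("#hadoop"))
```

Running it prints `3 1`: the replayed batch 1 leaves the counts untouched, which is exactly the behavior that plain at-least-once replay would not give you.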
We've seen all the great things you guys have done at Twitter, and the metadata is phenomenal. So what are the challenges in real time? The story here is predictive analytics and real-time analytics; cloud, mobile, and social are the top themes everywhere. What are the biggest challenges you're finding in rolling out Storm within Twitter? I mean, you guys are constantly scaling every day. We talked with Scott Dietzen from Pure Storage, and obviously they want to go all flash. He was talking about how his old colleagues from WebLogic are all kind of over at Twitter now. It's a systems company, but you've got to bring the modern era of systems computing here: operating systems for the CS jocks out there, but a new breed of technical chops. So can you share with us the mindset of this new breed of systems guys, plus what the modern era of real time does for developers and the comp-sci folks? So I think the biggest challenge with data processing, which actually gets even harder with real time, is fault tolerance. How do you make sure things keep running if a machine dies, or there's some exception, or whatever? How do you make sure your results stay accurate and correct, given that anything can fail at any time? That's the biggest challenge, and I think the breakthrough with Storm was figuring out how to do fault tolerance in real time in a way that isn't complex and is reasonable. But with respect to fault tolerance, there are actually two kinds that I see. There's the typical fault tolerance people talk about, where there's some technical failure: you lose a machine, or something goes wrong. But there's another kind, which I actually think is more important and which people don't talk about enough, and that's what I call human fault tolerance. If there's one thing we've learned from software development over the years, it's that humans make mistakes.
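
For the first, "lose a machine" kind of fault tolerance, Storm's documentation on guaranteeing message processing describes tracking each spout tuple's whole tree of downstream tuples with a single 64-bit value. Every tuple ID is XORed into that value once when the tuple is emitted and once when it is acked; because `x ^ x == 0`, the value returns to zero exactly when every tuple in the tree has been acked (with very high probability, since IDs are random 64-bit numbers). If it doesn't reach zero before a timeout, the spout replays the original tuple. The toy `Acker` below illustrates only that checksum idea; its names are hypothetical and it is not Storm's API.

```python
import random


class Acker:
    """Toy tracker: one 64-bit XOR checksum per root (spout) tuple."""

    def __init__(self) -> None:
        self.ack_vals: dict[int, int] = {}

    def emit(self, root_id: int, tuple_id: int) -> None:
        # A new tuple joins the root's tree: XOR its ID into the checksum.
        self.ack_vals[root_id] = self.ack_vals.get(root_id, 0) ^ tuple_id

    def ack(self, root_id: int, tuple_id: int) -> None:
        # The tuple has been fully handled: XOR its ID in a second time,
        # which cancels the XOR done when it was emitted.
        self.ack_vals[root_id] ^= tuple_id

    def is_complete(self, root_id: int) -> bool:
        # Zero means every emitted tuple in this tree has also been acked.
        return self.ack_vals[root_id] == 0


def new_id() -> int:
    """Random 64-bit tuple ID, so accidental cancellation is very unlikely."""
    return random.getrandbits(64)


if __name__ == "__main__":
    acker = Acker()
    root = new_id()
    acker.emit(root, root)           # the spout emits the root tuple
    child_a, child_b = new_id(), new_id()
    acker.emit(root, child_a)        # a bolt emits two children...
    acker.emit(root, child_b)
    acker.ack(root, root)            # ...and only then acks its input
    print(acker.is_complete(root))   # children still pending
    acker.ack(root, child_a)
    acker.ack(root, child_b)
    print(acker.is_complete(root))   # whole tree processed
```

Note the ordering in the usage: children are registered before the parent is acked, otherwise the checksum could briefly hit zero while work is still outstanding.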
Sometimes you actually have buggy code deployed to production, which can cause vast damage, right? The way I like to build systems is to assume that these mistakes will be made, and then engineer the systems so that you can limit the impact and recover from it. That's been a major focus for us in making our systems reliable. So my friend Andy Kessler just wrote an op-ed piece for the Wall Street Journal about when social media will elect the next president. Obviously Twitter is a big part of social media sentiment and opinion forming, the rapid open sourcing of data through individuals. How much pressure does that put on the team, knowing that identity and trust are big issues? You guys deal with spam all the time, I know; Twitter constantly struggles to get the spam bots out. How is your work on Storm handling the tsunami of false data or spam data, given the pressure from society to apply this to things like elections? We saw the Egypt situation be really game-changing globally. So can you talk about that dynamic? Yeah, I mean, traditionally these problems were very hard because you didn't really have infrastructure that could do this, so everyone was hand-rolling their own infrastructure. And actually, when we got to Twitter, what we found is that Twitter essentially had ten half-implemented, barely working versions of Storm internally, one for every specific use case they had. So one thing Storm lets you do is unify these problems: let your infrastructure take care of the hard stuff, like fault tolerance and parallelizing things, and let you focus on the higher-level stuff, your applications, your algorithms, your spam detection, your machine learning, instead of constantly worrying about the infrastructure, right?
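
The interview doesn't say how Twitter limits the damage from buggy code. One common way to get this kind of "human fault tolerance", assumed here purely for illustration, is to keep the raw data immutable and append-only and to treat every derived view as a function that can be rerun over it: if a bug produces a bad view, you fix the code, throw the view away, and recompute, and no raw data has been lost. All names below (`Event`, `MasterDataset`, the two view functions) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """An immutable raw record; the master dataset is only ever appended to."""

    user: str
    action: str


class MasterDataset:
    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self) -> tuple[Event, ...]:
        # Hand out an immutable snapshot so view code cannot alter raw data.
        return tuple(self._events)


def buggy_follow_counts(events: Iterable[Event]) -> dict[str, int]:
    # Bug: counts every action, not just follows.
    counts: dict[str, int] = {}
    for e in events:
        counts[e.user] = counts.get(e.user, 0) + 1
    return counts


def fixed_follow_counts(events: Iterable[Event]) -> dict[str, int]:
    # Corrected view, recomputed from the same untouched raw events.
    counts: dict[str, int] = {}
    for e in events:
        if e.action == "follow":
            counts[e.user] = counts.get(e.user, 0) + 1
    return counts


if __name__ == "__main__":
    ds = MasterDataset()
    for user, action in [("alice", "follow"), ("alice", "tweet"), ("bob", "follow")]:
        ds.append(Event(user, action))
    print(buggy_follow_counts(ds.events()))  # wrong view from buggy code
    print(fixed_follow_counts(ds.events()))  # recomputed after the fix
```

The design choice is that a mistake can only ever corrupt a derived view, never the source of truth, so recovery is a recomputation rather than a data-repair project.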
By providing general solutions for infrastructure, Storm makes it much easier to focus on these applications, which is what people actually care about. What's the biggest request your team is getting, internally at Twitter and from the outside? What pressure is causing you guys to innovate faster? Kind of an extra chair question, but I'm trying to get at some of the innovations. Yeah, I mean, certainly scale is one of the biggest drivers of innovation. If we can't scale, we don't have a product, right? And very few companies require the kind of real-time scale that Twitter does, which is what has driven a lot of the innovation in Storm and related projects. So what's next for Storm as you guys evolve further? What's the next chapter? Within Twitter, we're building new applications on Storm, enabling things we couldn't do before because of what Storm provides, as well as migrating existing infrastructure, some of those half-broken things I was talking about, onto Storm so that it's more reliable and more scalable. Otherwise, we take things on a use-case-by-use-case basis. If there's a use case we can't satisfy now, that will drive more feature development in Storm itself. Well, we're going to bring in Tim O'Reilly in a second, but I want to ask one final question. What's it like at Twitter right now? You came from a startup, and startups are fun, engaging, a roller coaster, all those things people say about startups. Twitter is a different kind of roller coaster, a pressure cooker, but with huge technical challenges. What's the vibe like at Twitter? Obviously you guys are always hiring, we know that. What's it like there right now? Chaotic? Actually, it's a lot of fun. It certainly was a transition for us, going from a four-person team to, I don't even know how many people work there, something like an 800- or 900-person team.
But it's been a lot of fun. There's lots of really cool data to work with; I think the Twitter data sets are probably the most interesting data sets in the world. And Twitter has a very open... Yeah. We love Twitter. Twitter has a very open, transparent culture. Twitter is very committed to open source, which really resonates with me personally, and I think with most developers. That makes it really fun to not just work on Twitter's problems but also interact with the community. A very large community has formed around Storm, and it's been a lot of fun to see what people do with our technology. Well, Nathan, we had you guys on, Kevin at Hadoop World 2010. You guys have done a lot of open source work; Twitter was one of the prime examples of Hadoop early on. We're open source content here at SiliconANGLE and Wikibon, our research team. I want to congratulate you and the Twitter team on all your work. Congratulations, keep it up. Twitter is leading the charge and innovating in systems, architecture, and computing with real-time, predictive analytics. It's going to be the future. It's going to change elections. It's going to change public interaction. All this stuff is going to keep moving forward, and good luck with everything. Thanks for coming on theCUBE. Absolutely, thanks for having me. Okay.