We're here with Daniel Abadi, who's an assistant professor at Yale and a co-founder, actually, of Hadapt. Thanks very much. Welcome. Good to see you. Nice to meet you. Welcome to theCUBE. Appreciate it.
So the third year of Hadoop World, it's kind of starting to grow up, isn't it?
Yeah. I think this year the excitement is a whole other order of magnitude higher than it was last year, and last year was already very high. So certainly, you know, it's very clear that Hadoop is taking off, and it's really fun to be part of it.
So I understand, just from looking at your background, that you did a dissertation on columnar databases, which led to the founding of Vertica. So you worked with Stonebraker?
I did, yeah.
He's been on theCUBE, Michael Stonebraker.
Yeah, MIT. I worked with both Sam Madden and Mike Stonebraker on database systems. So actually, when I started at MIT, I was working on streaming database systems. There was a project called Aurora, another project called Borealis, and those actually became a company called StreamBase. And then after that project was sort of winding down, around 2004, a bunch of us at MIT decided to start this project called C-Store. That was really led by Mike Stonebraker. And C-Store, after a while, was commercialized as Vertica, which was sold to HP; I think it was actually February 14th of this year. And then at the end of my time at MIT, I worked on a distributed transactional database called H-Store, and that became VoltDB. And VoltDB is actually here at Hadoop World; they have their own booth. So definitely, being around Stonebraker all those years, I learned so much.
You're like a startup machine. You're like, just get your hands in there.
Yeah, I mean, once I saw Stonebraker start three companies while I was at MIT, you know.
Hey, I can do this.
Absolutely.
All right, so let's talk about it. Did you start when you were 13, by the way?
I know, I look very young, but I'm older than I look.
You look young. It's great, but let's talk about that whole dynamic. There are a lot of tech geeks, a new generation of, you know, MIT, you know, tech weenies, Berkeley, Stanford, Harvard, a lot of math, a lot of computer science. What's the thing that attracts the alpha geeks to this whole area? I mean, it's going mainstream now, so there are obvious reasons now. But back then, when you were doing all this, what was that one thing? Was it because you can solve hard problems? It was fun to play with? What was the kind of thing that got you going?
Yeah, that's a really interesting question. I mean, I think I got into database systems actually when I was still an undergraduate. And, to speak personally, I just found the research very interesting. Ultimately, it's very clear that data is becoming sort of the center of the world. For any sort of IT organization, it's really about how you store and how you process your data. If you control the data, you really control the company, you control the business side of things. You really are in the center of everything. So therefore you naturally want to focus your research on database systems. So I started as an undergraduate and then eventually went to MIT to work with Stonebraker and Madden. And then from there, you know, things just sort of, you know...
A lot of these concepts are like a combination of math, automata theory. So it's a little bit of a blend of CS and math, right?
Yeah, it's true. Yeah. I mean, historically, databases have had a very good theory foundation. From the time of Ted Codd, with the relational model, I mean, that was a theory paper.
But, you know, one thing certainly, and I think to go back to Peter's question, what really attracts alpha geeks is that, yes, there's a bunch of theory, but you can also build something. As part of my PhD, I got to be involved in building three systems, and all three were released open source. And all three were used, to some extent, either by other researchers or by industry, before they were commercialized. So I think the opportunity to build real systems is very attractive to a young student, like myself, who's sort of thinking about what he wants to do. I mean, it's nice to think about theory papers, but ultimately you actually want to build something useful.
And then you add demand for these systems, these new systems, and you have a market. Now you've got a ton of venture capital pouring in. I mean, it's pretty awesome, right?
I'll be honest, I had no idea of the size of the market when I was thinking about going into a PhD program. I thought it was interesting. I thought it was pretty cool, but I didn't know it would be this way. I didn't know it was a multi-billion dollar industry. And I didn't even know what a multi-billion dollar industry meant, actually.
So, you know, it's good to have the blinders on, you know, from the whole outside influence.
Right, absolutely.
The claws of capitalism. You know, just go out and build some good stuff, right?
That's right, that's right.
So what's your advice to folks out there? I mean, obviously, Cloudera just got another 40 million, and Hortonworks is bulking up with financing. Anything that moves that has Hadoop on it is going to get funded. Accel Partners has got 100 million. What's your advice to folks out there? Just ignore the money, just build a good product, and kind of then address it later?
I mean, you've done it a few times.
I mean, it depends who you're advising. If you're advising students, you should definitely ignore the money. You should work on what you find interesting, what you're passionate about. Otherwise, I mean, if you try to make it through a PhD program and you're not interested in it, it just doesn't work. You burn out very quickly. But once you graduate and now you want to start a company, or you want to join a startup, then you have to worry about money. I think you have to be aware of where the money's going and where the excitement is in the industry. So with something like Hadoop, which has, as you mentioned, probably, I would say, about 100 million dollars of venture capital invested just in the last few months, at least since May, I think it's just so obvious that this is the place to be. I mean, if you want to do analytical database systems, you could try to do something outside of Hadoop, but you won't get very far. Right now, Hadoop is where the activity is. So any startup that you want to create really needs to have some sort of Hadoop focus.
So, Daniel, what are you doing at Yale these days? How are you spending your time? You're an assistant professor there, and then you sort of double as chief scientist at Hadapt. So I want to go there, but tell me what's going on at Yale.
Yeah, sure. So at Yale, we're sort of expanding our group. There are two faculty who are part of the database group. There's myself and a guy named Avi Silberschatz. He actually wrote one of the most popular textbooks in the database industry. So he's a real legend in the field. He's been around for a long time. He came from Texas, and he was a VP at Bell Labs before joining Yale. So between the two of us, we have a very large lab. We have four PhD students.
We have something like five or six undergraduates and a couple of other students who are sort of floating in and out. So there's a bunch of projects that we have going on. Certainly HadoopDB, which is what Hadapt was called before it was commercialized, was a major project. We also have several other projects, which I think are pretty interesting. One project that we have is called Calvin, which is looking at how to scale transactions. It doesn't really fit so much in the Hadoop world, which is more focused on data processing and analytics, but there's still one key problem, especially in the NoSQL world: how do you issue transactions across a thousand machines and scale that up? Today, the key-value stores don't really support transactions. If you look at HBase, if you look at Cassandra, really any of the popular NoSQL systems, they allow atomic operations on individual keys, but they don't really scale those operations across thousands of nodes, at least not the transaction itself. You can have individual updates on thousands of nodes. You can't make sure that it all happens atomically. So one project that we have going on at Yale is trying to fix that problem, and we have one paper already on that, and we're sort of in the middle of working on another one. I think that's a pretty cool project. Another project, which I'm actually going to talk about today here at Hadoop World, I think it's at 3:30 p.m., is a project on a graph database system. We've basically figured out the relational model, and I think we sort of know how to build data warehouses, we know how to do data processing and scale relational data, and Hadoop is great for unstructured data. But now we have graph data. Graph data is becoming very popular. We have social networks. We have telecom companies. We have linked data, which is like the semantic web. So there are all kinds of data which are now best represented as graphs.
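The gap described above, atomic operations on individual keys but no atomic transaction spanning nodes, can be illustrated with a small sketch. This is a toy in-memory model, not HBase or Cassandra code; the `Node` class, its `available` flag, and the `transfer` function are all hypothetical names introduced only for illustration.

```python
# Toy model (not HBase or Cassandra code): each Node is an in-memory
# key-value store whose single-key operations are atomic.
import threading


class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.lock = threading.Lock()
        self.available = True  # hypothetical flag used to simulate a failure

    def add(self, key, delta):
        """Atomically add delta to one key on this node."""
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        with self.lock:
            self.data[key] = self.data.get(key, 0) + delta
            return self.data[key]


def transfer(src, src_key, dst, dst_key, amount):
    """Two independent single-key updates; nothing ties them together."""
    src.add(src_key, -amount)  # succeeds on its own
    dst.add(dst_key, amount)   # may fail, leaving the first update in place


node_a, node_b = Node("node-a"), Node("node-b")
node_a.add("alice", 100)
node_b.add("bob", 0)

node_b.available = False  # simulate node-b going down mid-transfer
try:
    transfer(node_a, "alice", node_b, "bob", 30)
except ConnectionError:
    pass

# The debit was applied but the credit was not: the two updates were
# individually atomic, yet the transfer as a whole was not.
total = node_a.data["alice"] + node_b.data["bob"]
print(node_a.data["alice"], node_b.data["bob"], total)
```

Each `add` call is safe in isolation, but the pair of calls in `transfer` has no all-or-nothing guarantee, which is the kind of problem a distributed transaction layer would have to solve.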
So it seems like one key research thrust that's going to have to happen in the database industry is being able to figure out database systems for graph data. So I have a student, Jiewen Huang, who's doing a thesis on that. I'm going to present some of the work that we've done together today at Hadoop World. So that should be pretty cool as well.
So there's a term called sub-graph pattern matching.
That's right.
We know about social graphs, John. Daniel, tell us about sub-graph pattern matching. What is that and why does the world need it?
Sure. Yeah. I mean, I think this is sort of a very common operation in the database.