Welcome back. I'm Jeff Kelly with Wikibon. We're here live at the Cassandra Summit in Santa Clara, winding up what's been a really long but really fascinating day of getting to know the Cassandra community and talking to a lot of really interesting people. And we've got yet another interesting guest with us in this segment: Patrick McFadin, Chief Architect at Hobsons. Welcome. How are you doing?

Good.

Welcome to theCUBE.

I'm the last one.

Well, we saved the best for last. So why don't we start off with a little bit of information about yourself, your background, and what Hobsons does. You're in the education services and software field.

Yeah. Hobsons is an education services company, K-12 and higher education. What we're trying to do is help universities and K-12 institutions find and retain students. We do things like college applications, which is really one of our more important missions, because that's a pretty critical part of a student's life cycle. Then there's retention, trying to keep students in school, which is a big problem right now, and a lot of recruitment. So pretty much anything that has to do with the student life cycle, we're involved with.

So you spoke here today, a couple of different talks. You and I were speaking just before we went on air, and you mentioned a lot of interest in the talk you gave about developing on top of Cassandra. Talk about that. Why do you think the interest is so strong there, and what did you focus on in your talk?

Yeah. Speaking of education, I had to break out my professor hat. I did a talk on developing applications from scratch with Cassandra. So it was really starting from the beginning and taking an application concept through its life cycle: conceptualizing it, doing the data model, and then deploying it out.
It was a 30-minute talk, so of course I was talking fast, and it was more of an overview. But I was really just amazed. It was standing room only. I really wasn't prepared for the amount of interest. I thought this would be a conference with a lot more veterans, but it was clear, with this year's event double the size of last year's, that there was a lot of interest in just "How do I even start with it?" After the talk, a lot of people asked me questions, and I could tell that many of them showed up here trying to figure out how they're going to use Cassandra.

Right. What does that say about the community and the state of Cassandra as a technology, when you're getting so much interest? We've talked to others today about the attendees, and these seem to be serious, smart people who are ready to take some action. They're not here just thinking about big data; they know what they want to do. And now they're listening to folks like yourself talk about developing applications and really deploying them in production. So what are some of the best practices you shared with the group, and what does that tell you about the state of the community?

Well, the community is growing rapidly. It's gone beyond the early adopter phase; we're out of that. There are large customers that can talk about Cassandra and validate it. Netflix, for example; you talked to Adrian earlier. So everyone says, "Well, I guess if they're doing it, it's okay." So you've got the early adopters, like me and others who have been in this community since 0.7 or before.
We're now talking to people who are looking at it for the first time with some sort of appreciation, which is good, because we don't have to argue that it's going to work anymore. Now we're saying, okay, this is how you make it work. The state of the community right now is really cool because we're getting a lot of churn: people are moving from company to company and taking their experience with them, so it's starting to propagate out. I think this is a great place to be right now.

So talk about actually developing applications. What are some of the key challenges specifically around trying to build applications on top of Cassandra?

Well, right now the biggest problem I've had personally, and that I hear about when I talk to developers, is just getting over the relational model. Most of us were educated in classic computer science classes where a data-driven application was always built on something like Oracle, MySQL, or Postgres. So the data model has always been a relational model; very rarely has it been anything else. And I don't think anything else is really even being taught in the curriculum right now. So it's just getting past that hump, and it really isn't that big of a deal. I think the more people see it, the more they say, "Oh, okay, it's not so bad. We're just storing data." There is some nuance and some twists to it, but that comes with what you're getting in the deal. Cassandra is, hopefully, going to be available all the time. And you're accepting eventual consistency, and all the good things that come with it, in exchange for some things you were used to in the relational world, like relational integrity and transactions. Once you get past that, it actually starts going pretty fast.

So talk a little bit about your experience at Hobsons. Did you start building your applications right on top of Cassandra from the start?
Or did you move off another database when you hit scaling or performance issues? Walk us through the life cycle.

Well, it's a problem I think a lot of people have: they run into a performance brick wall. We had a system running on Oracle that collected our performance tracking data and our web log data, and it just wouldn't scale. We hit the brick wall. We were either going to have to call Oracle and buy more of it...

Which is exactly their business model.

Yeah, that's right. Oracle scales great until you run out of money.

If you had unlimited funds, it would be no problem.

Oh, yeah, sure. That's how we got to the moon, right? So we were looking for an alternative. MySQL was an option, but I wanted something that would scale in a more linear fashion. I already knew enough about Cassandra to feel it was a really good choice, especially since we were going to try to scale this thing without any knowledge of how it would grow. It's going to go to X, and what X is, I don't know. So we needed something that would scale horizontally pretty easily. We started in earnest last summer and moved the whole system over, and getting rid of Oracle was a really good idea, because we just don't have a problem with scaling anymore. It's running on Amazon and runs great. So I think it proved the point pretty quickly.

So let's talk a little bit more about use cases and what you're doing in terms of education services. We've talked a bit today about applying big data to some of these societal issues, education being one of them; earlier we talked about applying big data to health care.
Education is another of those areas we hear a lot about in the press, with failing schools and how we need to do better. How are you actually trying to make that happen by applying big data?

Yeah, this is a current problem we're trying to deal with. Data drives so much, but in education it's especially important, because there's so much meaning you can get out of data now about how students are progressing. For instance, retention is a huge problem.

Right.

At universities there's that old line: on freshman day you look to your left and look to your right, and those people won't be there when you leave. That's the problem you have to solve. The big data aspect is that we're collecting a lot of data. How do you find the students who are potentially failing? That's not a bad mission to have. If you have a 50 or 60% retention rate, which is not uncommon, and you can bump that up 2 or 3%, or even 10%, those are kids who are finishing college and getting an education. They still face other challenges, like being strapped with debt. But at its core it's a big data problem, because we're collecting a lot of data about students.

Right, let's dig into that. What are the data sources you're talking about, and how has that changed over the last five or ten years? Before the advent of what we're calling big data, you couldn't do some of these things. So what are the sources, and what are some of the possibilities now that we couldn't pursue in previous generations?

Well, what we know for sure is that students today are really active on social media, and they love to talk about what they're doing. It's amazing how much information is out there. There's a website, collegeconfidential.com, that I'm always amazed by.
The mission of that website is really just to give kids a place to talk about getting into universities of various kinds, say a specific one like Harvard or Stanford. They talk to other people and compare notes. It's a really interesting place because there's so much data in it. Just mining that alone is something we've looked at, and it's really interesting, because you find out that everybody's on the same track and you get some really interesting insights. I just go on there, without even using search tools, look at the forums, and see what people are talking about and what's trending. That really gives you insight into what's on the minds of kids at that time. Then there's traditional data, like grades, that we've always collected but never done any kind of data mining on. We should. I think there are some startups doing that now. That's one of the exciting things: education has become a focal point for a lot of startups, a lot of big data startups. Just look at Strata; there are tons of startups doing education-based stuff.

Yeah, there's a lot of opportunity, isn't there? As you mentioned, there's so much data, social media data especially. The generation coming up now, the millennials, as some call them, are much more comfortable sharing on social media. And there are so many new avenues now to dig into, especially when it comes to education and dealing with this generation.

There are really interesting insights I think we can get in the future, too, if we just ask questions.
One of the things I've always been amazed by is that when a student is in the cycle of getting ready to go to university, you can ask them so many more questions than you could if, say, they were signing up for a Twitter account. You can't ask about religious affiliation on a Twitter account. There's so much data we get. Some of it's very relevant to the university at hand, but there's just a massive amount of data. Think about a college application: that's literally pages of data. And it's not that we're using it for a bad thing. It's used for the students' betterment, and I think that's really the key.

So let's dig a little deeper into your services and what you're delivering. Specifically, how are you leveraging Cassandra to serve up what I'm imagining is some type of real-time application? Talk a little bit about what you're doing.

Yeah, it's not on the student side yet. We haven't started doing that. Really, it's for our own back end, the system we replaced. What it does give us is a real-time look at all of our applications and URLs: how they're doing, how many people are using them, that sort of thing. And that's pretty important stuff, too. We do most of our traffic in the last couple of days of the year, at application time. We had that on a 24-hour batch, and if you do that in batch, you're missing a whole day of data, which is half of the most important days of your entire cycle.

Yeah, if you've only got two critical days like that, you really can't. A 24-hour cycle is too long.

Right. And that's what we had for a long time.
So we needed that real-time view. We have a dashboard set up that pulls the data out of Cassandra in real time, and you can see the fluctuations nicely. Our application deadline is January 1st, and it closes at, I think, 9 p.m. Pacific time. Everybody's watching the computer, and you just see this ramp, ramp, ramp, and then it drops off. Now that's being graphed in real time, and if we were having trouble, we could see it pretty easily. That is so important, because if you think about our mission, what we're trying to provide is a good experience. Think of the parents sitting there watching their kid fill out an application, and bam, they get a 503 error. What's that? It's like their whole life just went poof, and that's the deadline. So we have to make sure it's ready. That's our mission.

So, final question. We'd love to get your impressions of what you're seeing, not just in the Cassandra community but in the whole big data community. There are all these competing NoSQL approaches. There's the Hadoop world, which is more focused on batch, and you've got Cassandra supporting more real-time types of analytics. What's your take on the last couple of years? We've seen this movement ebb and flow, with different horses on the track taking the lead. Cassandra seems to be gaining momentum now, and HBase was maybe a little hotter six months ago. Talk to me about what you're seeing in the community and what that means for people like you and the job you do.

Well, there are some phases we've obviously gotten past. It used to be: how do you collect that much data? Check that box. Done. We can collect a lot of data. Then it was: how do we analyze that much data?
I think that's a solved problem. I believe that's a solved problem. Now it's the nuance of how you do it, and there's still a little bit of work there, plenty of churn. But the transition we're heading into now is: what do you do with it? Last year it was all about how you compress and crunch this much data. Now I'm seeing more visualizations, more dashboard-type stuff. People are talking about the use of data, and that's really exciting, because it's like we got rid of the hard part. The collection, the crunching: okay, got that. Now let's do something with it. I think we're here; the creative part is here. Some of the heavy lifting is over. Let's really start to get creative and see what we can do with all this infrastructure we've built.

Yeah, you hear the phrase "democratization of data," and I don't think that before, even last year really, there was that much democratization. I couldn't imagine everybody having access to data that easily. But now, with BI tools maturing on top of these solutions, you're getting that, in real time.

So as we wrap up, let's talk a little bit about what you're doing in the next year or two. What are you doing around big data? What kinds of innovations are you looking to build? What are the biggest needs you hear from your customers and users, the "this is the kind of thing we'd like to see"?

It's really, again, the use of that data. We're trying to figure out the most effective use of it. We collect a lot of data, and what's really unique about Hobsons is that we cover K-12 all the way through HE, higher education. So we can look at a progression over time, and that's a really interesting insight we have.
We're talking about that now: how are we going to use it for student success? To help our universities get the best students they can, and help them find students who might otherwise not be found, who are lost, so to speak. So that's exciting. We're doing that now. It's very interesting stuff.

Really cool, and a very worthy goal, as you mentioned. I can't imagine a better goal. Well, thanks so much for joining us, Patrick. Really appreciate it. Hope you enjoyed your first time inside theCUBE. Hopefully we'll have you back.

It was great.

All right, fantastic. We'll be right back to wrap up the day here from Cassandra Summit 2012.