Why the Fifth Elephant? Hadoop? Yeah, so let's... What would the fourth elephant be? The first four? Who is Grace? Who else uses an elephant as a logo? I would not... Cassandra? Cassandra? I would not... Do they use an elephant logo? No. Okay, so that's actually the reason. The Fifth Elephant is named after a book by Terry Pratchett. Who's read anything by Terry Pratchett? Yeah, okay. Who's even heard of him? Okay, so a few of you guys have heard... If you've never read Terry Pratchett, you're missing something really major in your life. So go read Terry Pratchett. Terry Pratchett wrote a series of books called Discworld, which is essentially a parody of mythology. He's taken pieces of mythology from countries around the world and essentially turned them into a parody of themselves.

So the fifth elephant is named after the fact that the world is flat and is supported on the backs of four elephants, which stand on the back of a turtle, and this turtle flies through space. This is what mythology has taught you, remember? Churning the ocean and whatnot. It's all about turtles and elephants and whatnot. So the fifth elephant is the missing one. There's this legend of a fifth one that's missing. Nobody knows where it is. Did this elephant fall off the turtle? Did it crash back into the world? Whatever else. So there's a book called The Fifth Elephant which is entirely about this. It's asking: where did this fifth elephant disappear to? Where did it crash?

In our understanding, we just look at data as the fifth elephant. You can't see it. You can't see anything sensible in it if you just look at it in its raw form. But there is something hidden in there. It's about that. There's an elephant sitting somewhere in your data. It's about how you find it. So that's the reason for the name.

So, a quick round of introductions. I'm Kiran. I'm one of the guys from Hasgeek, along with the rest of the crew here. So I'm going to pass the mic around; just quickly introduce yourself.
And if you can describe your project idea in one sentence, do it.

I'm Kamal. I'm taking up the drug side effects project. It's basically a data set of all the adverse side effects reported to the US FDA. So the data is there. The data sets are there. I'll have to start working on that.

I'm Anand. I haven't decided what data set I'll be working on. I've proposed a fair number, I guess. I'll just take up the one that most people need help on.

I'm Arjun. And I have no idea about big data. But I want to work on that company's data set.

Hi, guys. I'm Resan Risoza. I'm a Ruby developer at SourceBits, and I'm planning to tackle the IMDb and Twitter data set problems.

My name is Sajat. I'm part of the Hasgeek crew. The hacknight has been one of our dreams. For the Fifth Elephant, at least, we tried to put the hacknight together in a proper structure. So I'll be around; just shout out. I don't know whether I'll be able to work on one of the projects, but yeah.

Hi. I'm Moves. I'm a researcher. I did my Ph.D. at the University of Zurich. I worked on geographic information retrieval. What you see of that today is Google Earth. My interests are in geosemantic information retrieval. I've been a professor at Triple H.

Hi, I'm Aziz. I work for Thomson Reuters. I've no idea about big data. I've just come to learn something about it.

I'm Nitish. I'm working for Meshrabs, a start-up network. I'm mainly here for visualisation of the data, if we can visualise using that for our dreams, something like that. That will really help us.

Hi, I'm Harina Kupeti. I'm working for IMDb. I'm basically a big developer. I've no idea about the data.

So, I'm Siju. I'm also looking for a project in the next few months. I'm particularly interested in figuring out patterns in, let's say, time series or things like that. So, if anybody has... Yeah.

Hi, this is Uttam. I work as an artist for a start-up called R2.
I came here to work on the visualisation part using some available tools. And I'm interested in ontology and there's in-year schools information part. Thank you.

Hi, I'm Satish here. I'm just a beginner in machine learning right now. So, I've had most of the older big data experience there. So, one of my goals is to do something on machine learning, or looking for a future using some data pattern or detection tool.

Hi, I'm Voldy. I actually run a big data start-up called V-Wall. It currently has a terabyte of data in one. And we have made available one data set of close to 8.5 million reviews, if you guys are interested in machine learning or NLP. That would be the data set I should work on.

Hi, I'm Shashi. I work for Babajob.com. We are also making available a data set of jobs and job seekers. And we want to find out what the ideal profile is that gets a job seeker selected. So, that's the data set I'll be looking at, and then look at some of yours as well. So, what do you guys have available if anybody wants to work on a data set?

Hi, I'm Harsha. I do not know anything about big data. As for the project I want to work on, I was interested in the FDA one. But, you know, there are a lot of ideas coming up. I'll pick one up and just work. Thank you.

Hi, myself Hari. I'm basically a PHP developer and an open source lover also. So, I was looking for the Flipkart data. Let's see what I get.

Who's here from Flipkart?

Hi, I'm Gaurav. The data set I'll be looking at is available on the internet. It's a GigaOM competition which has, I think, millions of blogs and the users who have liked them or not liked them, from WordPress. And the idea is to build a model so that, given a blog, you can predict which user is going to like it or not like it. So, it could either be treated as a recommendation engine or you can even build it as a classifier. Whatever. So, I'll just see what I can do with it.

Hi, I'm Vinayak. I'm here to hack and learn.
Not decided what data set I'll work on. Probably work with Gaurav and something. Thank you.

Hello, this is Mohit from Flipkart. So, basically I work for a team which does inventory planning. So, we do demand forecasting, inventory planning, optimization. So, we have proposed a problem of modeling the spike in the sales data due to events. So, we don't have the million and tens. We just have a small data set having the sales data, so I'm open to either working on this problem or, if anything interesting comes up, I will take that up.

Hi, I'm Siddharth. I am interested in the Twitter data set. I think there are insights you can get out of how people interact with each other, and that's one problem I would like to work on. The other thing is, if there's some other interesting project, I'm okay to play around.

Hi, I'm Pranil. I used to work for this company called Adequity, which was formerly called Guruji.com, and I'm planning to work on this Twitter data set because it's kind of relevant to what I'm currently doing, which is bootstrapping a social network app.

Hi, I'm Raghavendra. Basically, I'm from Red Hat and I'm a developer of GlusterFS. So I have not really worked on the churning of the data, whatever the big data thing has said. It's a kind of cluster file system that I've worked on, which provides an infrastructure to store big data, and we are here just to know what the things are, and that's it, out of interest.

Hi, I'm Shishir, again from Red Hat. Same thing applies for GlusterFS: I'm here to see what the pattern of data storage would be. Just looking into big data as a whole.

Hi, I'm Venki. I'm also from Red Hat. I've actually proposed a project. I've also worked on GlusterFS, and I've proposed a project of using our GlusterFS Hadoop plugin. So looking out for data sets and all. Thanks.

Hi, I'm Prabhu. I work for ABB and one of my current projects is to visualize data. That's why I'm at this event.
And I'm working on the data set of trying to understand what India has been searching for over the last one year. So I guess that's going to be interesting.

Hi, I'm Deepan. I work for a company called HashCube. I have data about games. We do social games. So I have Tableau with me, which uses data and visualizes data, and we try to make meaning out of that. So I have both data and a visualization tool, so we can play around with that. Thanks.

Just to pick out people again. How many of you guys have brought data sets? So I'm guessing that everyone else who's looking for something to work on, who's not yet settled on a project, can talk to one of these guys. I've got the Twitter data set. The GigaOM likes and blogs is with Gaurav. The Flipkart data set is with... The Red Hat data set is with all three of you guys. Sorry, who are we getting from you? Venki. The games data is with... Deepan. Deepan. And the what-India-searches-for data is with me.

Events: this event happened on this day, this event happened on this day, and how those events actually affected the sales. So that pattern has to be figured out. So it's basically open from there.

And today our DevOps architect Shashi has basically put together a whole bunch of things around the beginning of us exposing the data set. So I think we are showing off our jobs, we're showing off the job seekers, and in particular we're showing off the correlation of which employers like which jobs, and so we think there are some interesting machine learning techniques to figure out. Can you predict that? Can you make a recommendation and say, if we add in this data, does this increase the likelihood that, say, a maid or a driver or a cook gets hired? And then we're also thinking about some data aggregation scenarios in the geographic domain that you're actually interested in.