Okay, so it's just past midnight, and we're going to walk around and see what people are doing on the top floor. So let's just head outside.
We'll do a demo first, because they were very interested in doing a demo to show off the data they have gathered from Wikipedia. So maybe you'd just like to say a few words about it?
So what we've done is, we've built the entire graph around this. Hi, I'm Murthy and this is Vikrant. We are from [unclear]. So we've been trying to mine some interesting relationships from the Wikipedia entity relationship graph, and we built this demo of that. So let's say Christian Bale, Gary Oldman and Christopher Nolan are the terms that I gave. They have an interesting set of related entities, and they are sorted with respect to relevance. The topmost element is The Dark Knight. You have Michael Caine, who acted in The Dark Knight and Batman Begins. Natalie Portman is here because Christian Bale and Natalie Portman were nominated for Academy Awards in the same year.
Okay.
And then you have Heath Ledger, Morgan Freeman, and, you know, if you look at all the terms that have been returned here, they have to do only with the movie universe. So we've not moved out of the movies.
Well, I'm just teaching myself MongoDB.
Yeah, oh cool.
And my name is Amit. Hopefully, after teaching myself MongoDB, I'll figure out how to scrape. Once I've done that, I'm going to get some content; by tomorrow that should be over. But if I manage to do that by early morning tomorrow, then I'm going to actually mine that data and do something interesting with it.
So how's the MongoDB coming along?
Oh, it's fairly simple as of now. Let's see once I start coding. I'm just reading about it as of now. I joined late, right? I didn't have dinner, so that doesn't qualify.
Okay. Hey, thanks.
Thank you. Next table.
So this is, I guess, the first seriously multi-language project that we have here, as far as I've found. I've got Java, JavaScript, and Python happening here. So, yeah, can you guys go ahead and talk a bit about what you're doing?
So we actually don't know if we'll be able to show something, but here's the idea. We have about one and a half million blog posts which are rated by thousands of users. We only have likes, no dislikes. The idea is to build a model so that, given an unseen blog post, you can predict which other users are going to like it and to what degree. So that is one thing, but there are other options we're trying. We're using some techniques to extract topics out of the blogs and then plot the trends of these topics across time, what happened on WordPress. If we are able to do that, then we could also correlate it with the external events that happened in the world and see whether some of them affected what the topics were in the WordPress blogosphere at that time.
What sort of tools are you using for your analysis?
Right. So you see, there are a lot of things happening here. We are using just Java and some Python code to clean up the data. There's a lot of garbage, like you wouldn't believe. After doing all the cleanups, we found like five million unique words in this. I mean, I thought, English dictionary, two thousand words. We have five million unique terms. So there's a lot of garbage here. But apart from that, once you have the data, we are using a ready-made, off-the-shelf tool called Vowpal Wabbit. It's developed by John Langford from Yahoo Research. This tool basically does LDA, which is latent Dirichlet allocation, which tries to model the data based on a generative model and tries to detect the topics and the words in each topic. So that is what's going on in the background. This is an off-the-shelf tool.
There's too little time to build something like this right from scratch. If this completes and gives a meaningful output, then we can mash up that output in many ways. And Vinayak has done some good, funky coding for plotting it in JavaScript using a streamgraph. If this gives a good output, then we will be able to show something; otherwise...
Good. Okay, next.
Hey guys. We're going to talk about what you guys are working on. Say a bit about this stuff.
We're continuing to work on the sentiment analysis of tweets. Right now we've got a broken classifier, and we've got somebody collecting data. He was using a third-party API. Basically, we need to train our classifier with some training data. So what we are doing is using third-party APIs: sending text to that API, getting back whether it's a positive or negative sentiment, and then feeding that into our classifier. But it's not really working right now. As in, for both "I am happy" and "I am not happy" it gives a positive sentiment.
Okay. So that's just that one small problem that we need to get to.
Thanks, guys.
So now we've got the Flipkart data. Hi, I just need you to talk about what you're attempting to do, the sort of things you're looking at in terms of the data from Flipkart.
Yeah, I'm trying to figure out the data and how to make use of this particular data in real life.
Okay. Have you seen anything interesting yet in the data?
Yeah, it's like some particular products go from [unclear] to a certain range. There are different series you can see here. On a particular day, I mean, some events are happening, and you want to figure out whether it's due to this particular event that this particular rise in sales is happening.
Okay. Okay.
It's like a comparison. Maybe this product is related to another product.
Okay, cool.
But I am not sure which product this is.
You don't have the names?
No, it doesn't have any names. There's something, some event is happening. Some product is...
So, we've finished with the first one, and we'll move down to the ground floor and see what people are up to. Come down.
First we'll just talk about some stuff that's happening with the Babajob data. So maybe you'd like to fill us in on what you're doing and what you want to do?
So we've provided the API, which is providing the data, and what we're trying to do is basically analyze the job data. You know, every job has an associated set of job seekers who were shortlisted or contacted, or job seekers who applied. So we're analyzing the whole job and job seeker data and trying to find out what attributes of the job lead to more job seekers applying for that particular job, or what attributes of a particular job seeker's profile lead to them being shortlisted or contacted by the employer. So, I mean, that's what... He's using some Python machine learning libraries to, you know...
You opened up some APIs?
Yeah, yeah. So for this purpose, I mean, just for hack night, we built an API. If you go to hacknight.babajob.com, the APIs are available. It's a work-in-progress API. I mean, initially we had provided just two sets of data, but then he said that we need a third set, so I built that also.
So maybe you'd like to talk about the technology you're using for the project?
Right. So it's basically in Python, and the base library is something called scikit. You go to the scikit site. It's based on NumPy and SciPy, so it's, you know, very simple machine learning APIs. So you don't even need to know machine learning. You just need to know how to use the APIs and it'll do the magic, of course.
Okay, thanks.
So, will you talk about your Processing project?
Yeah. I'm trying to visualize the marks of students in Bangalore, like whether they have passed or failed. I want to build up a story around it, like pass is good and fail is bad.
I want to kind of visualize something like a thought bubble or a kernel, saying that, okay, it's kind of an opportunity or something of that sort. Like, I want to build up a story and then make the bubbles kind of grow, and then it kind of wavers and stuff. So it's more of an artistic project than really trying to get some sense out of the data. The sum of passing counts, it's an ever-increasing thing, but it should eventually stop and then it should start moving. I'm trying to fix where I'm going wrong. Okay. So it's kind of a very short demo, but still, this is what I have.
Okay, that's pretty cool. Yeah. This is the only seriously visualization-oriented big data work we've seen so far. It's pretty cool.
Yeah, so we get a small... the Twitter... I forget the word, actually. We are taking a stream from Twitter right now. It's a recently available stream, and using the stream we are trying to run a few analyses. Let's say users, or even a company, can actually see the trend of their users, what's the real-time trend of the users, and accordingly act on their campaigns. Or they can also go with some kind of sentiment analysis and stuff. So we are trying to develop a tool over this. For now we are going with Twitter, meaning stream data: how can that be analyzed for specific reasons in real time? It's like you can go with a specific reason. Let's say a particular restaurant wants to go with a tag to target some kind of audience. So how many kinds of audience are there, and in Bangalore what's happening? Can we target this particular audience at this particular time? Are they on mobiles right now? Do they usually use mobiles or not? All this kind of data will be available. So it's like, with the data which we are getting out...
We have multiple dimensions. So we give this tool in such a way that the guy who is using this tool will be able to configure the dimensions and accordingly filter out the data, so that he can act upon it.
So this is going to have a fairly sophisticated web interface. And what are you using in the back end for this?
In the back end, we have not decided yet, but we are using... Okay, and we take all the data out. Any database, let's say. For now we don't know. It's, like, as it is, yes, and we might use Mongo.
Okay, fine. Okay, thanks. So let's move on to the last group on the floor.
Hi, so what are you guys working on?
Oh, we are basically trying to come up with a Hadoop cluster, a two-machine cluster, and we were facing some issues. We are hopefully getting past that now. So the data we are looking at is review data on restaurants, hotels, and other places, crawled from social media by this company called be a wall. They gave us one gig compressed, actually three and a half gig uncompressed, of data. It's line-oriented records, so we are planning to do aggregations on that. A word count is something we have done; that's a basic Hadoop thing. We'll do more. One thing he suggested is you can look for the word "better", and then you can quickly say it always means something is better than something else in what they've written. So we plan to look for something like that and say, okay, what are the two things they're comparing, and how many records have that. We have about nine million records over there. So we'll come up with more metrics. For right now, we're setting up a cluster. We are mostly done with that.
Okay, great. Thanks.
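
Editor's sketch: the word count and the "better than" search described by the last group could look roughly like the following. This is plain Python standing in for the Hadoop job, not the team's actual code. The sample records, the function names `word_count` and `find_comparisons`, and the regular expression are all assumptions made for illustration, and the pattern is deliberately naive (one word on each side of the comparison).

```python
import re
from collections import Counter

# Hypothetical sample of line-oriented review records (not real data).
reviews = [
    "The biryani is better than the pasta",
    "Honestly the biryani is better than pasta anywhere else",
    "Rooms are far better than the lobby",
    "Nothing special about this place",
]

# Naive pattern (an assumption): "<word> is/was/are/were [much|far|way] better than <word>",
# skipping an optional article before the second word.
COMPARISON = re.compile(
    r"\b(\w+) (?:is|was|are|were) (?:much |far |way )?better than (?:the |a |an )?(\w+)",
    re.IGNORECASE,
)


def word_count(lines):
    """Count lowercase word occurrences across all records."""
    counts = Counter()
    for line in lines:
        counts.update(re.findall(r"[a-z']+", line.lower()))
    return counts


def find_comparisons(lines):
    """Count (preferred, other) pairs found around the phrase 'better than'."""
    pairs = Counter()
    for line in lines:
        for left, right in COMPARISON.findall(line):
            pairs[(left.lower(), right.lower())] += 1
    return pairs


if __name__ == "__main__":
    print(word_count(reviews).most_common(3))
    print(find_comparisons(reviews).most_common())
```

On a cluster, the per-line work would be the map step and the counter updates the reduce step; the single-process version above only shows the shape of the aggregation.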