Hello, welcome back. Mark "Rizzn" Hopkins here at South by Southwest 2011, and I'm here with Nick Ducoff from Infochimps, who I'm pretty familiar with because I'm a Texan and I hear about these guys all the time. You may or may not know who these people are (you probably should), but if you don't: Nick, I was going to have you start off with a little bit of an elevator pitch. Talk about what your company does and acquaint them. Hopefully they can hear you over whatever that is, a keynote or a contest, whatever's going on out there.

Sure, thank you. Infochimps is a marketplace to find, share, and build on data. We have two big customer bases. One is the developer community, where we're really focused on making it super easy for developers to build applications. An application is really two things, right? It's code and it's a database. There are lots of folks out there that help developers get access to code, such as GitHub, but there's really not a centralized repository for structured information, for data, and so that's what we're building, and we're really excited about it. The other part of our business is our marketplace, where we have data sets that are published and can be downloaded as flat files. So if you're a mom-and-pop or a non-technical user, and data for you is something viewable in Microsoft Excel, that's the place for you. The beautiful thing is it's all found at the same place, and that's infochimps.com.

So I was going to talk a little bit about your recent announcement. Michelle, the former contributor to SiliconANGLE (if you're watching this video you probably know who Michelle Greer is), has been excitedly talking in hushed tones: "Don't tell anybody till we announce, but check this out, this is really cool." Your API Explorer and the launch of, is it 1,000 APIs? 1,000, 2,000 data sets. I've never really dug as deep into your data sets as I have in the last couple of weeks while you've
been turning on the API Explorer and uploading these new things. So tell me, first of all, broadly about the data sets and the API Explorer and how all that works, and then we'll dive deeper into a couple of these that are really cool.

Thanks, and sorry to steal Michelle from you, but she's a rock star and we love her. So we recently published 2,000 new API calls, and that's pretty exciting for us. We're trying to make as much data available in one place as there is on the internet, and these 2,000 API calls range from social media data to weather data to stock data. Our key focus here was to think about what the building blocks for an application are: how can we provide data sets that can inspire developers to build applications without ever having to bring data down onto their own server? The API Explorer makes it super easy for anybody to come and see, after they pass through an input, what the output looks like within their web browser. So they don't have to go and start coding to figure out what the output is going to look like; they can get a few samples right there in the browser.

And as someone who is a lightweight developer these days, but was a heavy coder back in my earlier days, the API Explorer is what really makes it real, in my opinion, because you can look at the documentation all day long. We spoke to somebody earlier today that's in the documentation business, and as soon as you hear that, you know, snore, right? Either someone has to write the documentation, which is always a big task, or someone's got to read it, and unless you need it like five minutes ago, you're not going to be hitting the books. But being able to just see a little box and, okay, here's what I put into this box, hit the button, and see what comes out the other end.
That's what makes it real. So that's, I think, something that makes what you guys are doing pretty exciting. Now, one of the ones that Michelle showed me was Qwerly, which is another company that uses you as the platform to publish the data and the API. So talk a little bit about what Qwerly does. I can see a hundred uses for this for applications we're developing, so talk about what that does, in as much depth as you can, and about how they get their data and all that.

So Qwerly is a company run by Max Niederhofer, based in London, UK. He was previously at Atlas Venture. He was a VC, came back to the bright side of things, and started his own company. What Qwerly does is a database across social identities. So, who are you online? Who am I online? I'm nickducoff on Twitter, I'm /ducoff on Facebook, I'm /nick-ducoff on LinkedIn, and it's sometimes hard to find, in a programmatic fashion, all of the identities for a person online. So what Qwerly's done is you can pass through whatever you've got, a Twitter handle or a Facebook account or a LinkedIn account, and it will help map across all of the other social networks and help you find the Flickr account, the YouTube account, the LinkedIn account, so that developers can build any number of applications.

Yeah, we're based out of the Cloudera office; our Palo Alto group is based out of the Cloudera office, so a lot of what we do is using Hadoop to bring structure into unstructured data. And that API right there, I think, saved us probably about three months' worth of development on one aspect. So we're going to be using it.
Just so you know. But I mean, just being able to surface content in a way that... like being able to access the people that are around an event like South by Southwest. You can troll feeds and find people that are there at South by Southwest, but you don't always have access to all the content they're publishing, because they may not have an auto feed going. But with something like Qwerly you can pull all their other feeds, and then just filter it based on location or date range or whatever it is you're doing, and really come up with something useful.

To speak a little bit about what they do (and I'm happy to also introduce you to Max; he's coming into Austin for South by Southwest, but I hope you get it through us and not them): what Max does is they use indicators, strong links across your various profiles, to see, okay, is @nickducoff really the same guy as Facebook /nickducoff, right? Am I linking to my Facebook profile from my Twitter profile, or in my Facebook do I mention back to my Twitter profile, or my about.me profile, or something else, so that they can see, okay, is this person really this person?

Well, and then this kind of links into the other API we were discussing earlier, which is the Twitter profile search. That, combined with maybe the Qwerly search, would be a great way of surfacing authority nodes amongst content providers.
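
The "strong links" check described in this part of the conversation can be sketched in a few lines of Python. This is an illustration only, not Qwerly's actual method: it assumes a strong link means two profiles that each link to the other, and the profile URLs and the `strong_links` helper are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical sample data: each profile URL maps to the set of other
# profile URLs it links to. These are placeholders, not real accounts.
PROFILE_LINKS: Dict[str, Set[str]] = {
    "twitter.com/example_user": {"facebook.com/example.user"},
    "facebook.com/example.user": {
        "twitter.com/example_user",
        "linkedin.com/in/example-user",
    },
    "linkedin.com/in/example-user": set(),
}


def strong_links(links: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return unordered pairs of profiles that link to each other.

    Assumption: a reciprocal link is treated as evidence that both
    profiles belong to the same person. A one-way link is ignored.
    """
    pairs: Set[Tuple[str, str]] = set()
    for source, targets in links.items():
        for target in targets:
            # Keep the pair only if the target also links back to the source.
            if source in links.get(target, set()):
                first, second = sorted((source, target))
                pairs.add((first, second))
    return pairs


if __name__ == "__main__":
    for a, b in sorted(strong_links(PROFILE_LINKS)):
        print(f"{a} <-> {b}")
```

In the sample data the Twitter and Facebook profiles point at each other, so they are paired; the Facebook profile mentions LinkedIn, but LinkedIn does not link back, so that pair is left out. A real system would presumably weigh weaker one-way signals as well.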
So talk about the differences between Twitter's native profile search (we ran it on Batman, Batman comics, my thing) versus the profile search that you guys have.

Sure. So the data store of choice for us is really Elasticsearch. It's an incredibly powerful tool that allows you to do essentially Boolean searches across large data files. For instance, the Twitter profile search is a hunt across a hundred million nodes, and what we've got now is the ability to search across those hundred million users with the keywords that they use in their profile. That can be, obviously, name; it can be how they describe themselves, what they like, even where they're from. Twitter, the way that they do it, based on just a couple of searches that we ran, looks like they have some kind of method of looking both at the tweets themselves as well as potentially other keywords around...

[unclear] Gotham news and all kinds of crazy stuff. None of it had to do with Batman comics per se, other than being loosely associated with Batman. So I guess if you're into that, there you go, but if you want an exact match, this would be the way to go.

So it's not all social data you've got. I know there are some sports-related ones in there. There's the raw word searches; was it the British National Corpus? You've got a couple of other ones that escape me at the moment, well, about 2,000 of them. So, lots of interesting data to be able to search. Excuse me. So let's look a little bit broader. Where did you guys... where was the inspiration for this?
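
The exact-match behavior of the profile search discussed above can be sketched without any search engine at all. This is a toy stand-in, not Infochimps' Elasticsearch setup: the sample profiles, the field names, and the `search` function are assumptions made for illustration, and the matching rule is a plain Boolean AND over the words found in each profile.

```python
import re
from typing import Dict, List, Set

# Hypothetical sample profiles; field names are assumed for illustration.
PROFILES: List[Dict[str, str]] = [
    {"screen_name": "comic_fan", "name": "Alex",
     "description": "Collector of Batman comics", "location": "Austin, TX"},
    {"screen_name": "gotham_news", "name": "Gotham News",
     "description": "City news and weather", "location": "Gotham"},
    {"screen_name": "bat_biologist", "name": "Sam",
     "description": "I study bats, not Batman", "location": "London"},
]

SEARCHED_FIELDS = ("name", "description", "location")


def words(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(profiles: List[Dict[str, str]], query: str) -> List[str]:
    """Return screen names whose profile text contains every query word."""
    wanted = words(query)
    hits = []
    for profile in profiles:
        text = " ".join(profile.get(field, "") for field in SEARCHED_FIELDS)
        # Boolean AND: every query word must appear somewhere in the profile.
        if wanted <= words(text):
            hits.append(profile["screen_name"])
    return hits


if __name__ == "__main__":
    print(search(PROFILES, "Batman comics"))
```

With this rule, a query for "Batman comics" returns only the profile that mentions both words. Profiles that are merely associated with Batman are skipped, which is the contrast with the looser native results drawn in the conversation.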
What was the aha moment? Because big data is a focus for us editorially for the foreseeable future, whatever that ends up being. We covered a couple of conferences recently, Strata, Hadoop. Amazing viewership. We were just talking about the concepts behind big data, and it resonated with both our consumer-oriented audiences and developers, of course, but also enterprise, because big data is something that affects them too. It's not just all about social and mobile and the fun stuff that Mashable and TechCrunch and the Web 2.0 blogs like to talk about; it's crossed over. So what was your aha moment that led you to pursue the path that Infochimps has? Because you're positioned at a good nexus for enterprise and all the consumer-facing data stores. So just talk a little bit about that journey.

Sure. So Flip Kromer, another one of our co-founders and our CTO, was pursuing his PhD in physics at UT, and in the course of his research spent a lot of time finding and munging data. The aha moment for him was: it's a pain in the butt to find data online. Google does a wonderful job of indexing blobs, unstructured information on web pages, but they don't do a great job of indexing structured information. And so Flip set out to solve this problem and asked around his fellow PhD candidates if anybody might be interested in pursuing this mission, and found Dhruv Bansal, who was also pursuing his PhD, to come along and join the Infochimps team. And from there we've built up to 15 chimps trying to democratize access to structured information.

So talk about the process of data sanitization. I know it's a mix of automated and hand washing of the data.
So if you can talk about that (it may be part of your secret sauce), I'd like to learn more.

Sure. So one of our core philosophies is we take data and we publish it in a structured format. We don't necessarily cleanse it. When there's clearly articulated demand for a very high-quality data set, we'll either find it through a third-party supplier or we'll build it ourselves. But unless there's clearly articulated demand, we publish it the same way that we find it. The only change that we make is we identify columns and rows so that it's in a machine-readable format.

Okay. And also part of the role is documentation of that, which is your next big... but you can only do so much with 15 people at one time. So you've got all the data published, and part of that role is actually making it searchable, curated, and findable.

Yeah. So we absolutely want to continue to work on cleaning up the metadata around the data. One of the things that we've been working on is a unified format of metadata, and that's something that we're pretty far along on and really excited about. I think it will really help with scalability, because our data team can ingest data pretty quickly at this point. We're pulling in hundreds of gigabytes a week or more, probably closer to terabytes a week. But we've got to make sure that we keep up with respect to documentation, like you were saying, and making it easily findable, or we end up in the same place that we were before we started Infochimps, right?
And so what we've done is we've loaded all of the metadata into Elasticsearch, as well as some of the data. Obviously our search algorithm is part of our special sauce, but we try to make the data set that's most relevant to you adjacent to the data that you either have or were otherwise looking for.

So search is really becoming... everything old is new again. That's one of the themes: people going back to search and reapplying it to problems that Google doesn't need to work on, right? Everybody thinks Google has solved search, and I think they'll probably be the first to tell you that "we've got 95% of it down." But I think it may be more than that really, because there are so many different aspects of search that haven't been tackled. I mean, you've got the semantic side. You've got different organizations that are trying to patch holes in microsite search, or whitelisted topic-specific search, and you're working on a couple of different approaches to structured data search. So that's one of the things I'm seeing as an emergent theme. Just stepping back: I guess it's been like a day and a half here at South by Southwest, but you've probably been exposed to the prep a little bit longer than I have, being local to Austin. What are some of the themes you're seeing emerge out of the conference here?

So, it's all about location, right? Location, local, and the data that powers that. And with respect to location, one of the important themes is places: where am I standing right now?
And there are a number of folks out there that might even tell you different things about where you're standing. So over the next couple of months, we're pretty excited to announce some partnerships, which we'll save for another story, to really make it easy for developers to build location-based applications. Obviously a big part of that will be retail inventory and other things about where you are, right? Happy hour specials, ratings and reviews, all the kinds of stuff that folks ask for all the time. Can you scrape Citysearch? Can you scrape Yelp? We won't necessarily, but we'll work with a lot of folks who have similar databases, or those companies themselves, to make it available to our developer community.

Yeah, so that's a good positioning to delve into a little bit, because I think the fear with companies that sit in the position you do, where you envelop so much of an ecosystem, is that you will compete with that ecosystem eventually. We see it with Twitter. We see it with Facebook. And the evangelists for those organizations will tell you, okay, we're not really competing, but we know they are. I mean, either they are, or they're just really bad at communicating how they don't want to compete with their own ecosystem. So that you leave the data sanitization, scraping, and otherwise organizing to other people, and you're just organizing the organization of the data, that's an interesting point to elaborate on.

For instance, a good number of those 2,000 data sets are ones where we took Factual's corpus of data sets and published them as APIs, right? So we took what was structured data and published it in an application programming interface. That was something that hadn't been done before, and now it's even easier to build on top of those databases, right?
So they existed in the wild, and we just made them easier to find and easier to access, and that's really what we're trying to do.

Very cool stuff. Big data a theme, search a theme. South by Southwest 2011. I'm Mark "Rizzn" Hopkins. We've been chatting with Infochimps, a company to watch. Keep an eye on these guys. Play with the API Explorer. I'm not being paid by these guys to say this. I just really like it. I played with it and I really like it, so I think you should too. Stay tuned to SiliconANGLE. SiliconANGLE TV will have more coverage coming out of the conference, so don't go away.