Next up, Chief Scientist from Bitly, big data company. So those of you who, everybody knows Bitly, right? The URL shortener takes big, giant URLs and makes them tiny. Hi, Hilary. How are you? I'm Dave Vellante from Wikibon. Hi. I'm John Furrier with SiliconANGLE. Nice to meet you. Hi. We love Bitly. We know about your product. I use it all the time. I'm glad to hear it. It's one of our contributors. Sometimes I feel like I can't live without it. One of our contributors used to work there way back in the day. Rex Dixon. Oh, yeah. This is amazing. Is he still there? He handles all of our support email. When you email support at Bitly, Rex will reply. Yeah, so he, and we've been following you guys for a long time. I've been following the URL shortening business for quite some time, and you guys just have phenomenal growth. Yes, it's, uh... So Bitly, for the folks out there that don't know, has a service that shortens URLs down from these big, long URLs to short ones. So you can put them into Twitter and Facebook, and it essentially creates the redirection to the actual web page, which is essentially an abstraction layer between the DNS URL and the click. So with that, you get massive amounts of cool data. Absolutely. I mean, so you know the kinds of content flowing around the web. Your adoption rate has been pretty high. So talk about the challenges that you're seeing with Bitly right now and with data. How big is it? What are some of the stats? Sure. So we use 301 redirects, which are part of the HTTP standard. That means that when you click on a Bitly link, it goes through this permanent redirect and takes you to the long URL. So we see those events where people share URLs through Bitly, and a lot of those come through our API, about 80%, which means that when you're using a Twitter client like TweetDeck, you're still using Bitly behind the scenes. And then we see all of those clicks on Bitly links.
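[Editor's sketch] The redirect flow described above can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not Bitly's implementation: the in-memory `links` and `click_log` stores, the `shorten` and `resolve` function names, and the SHA-256-based hash scheme are all hypothetical and chosen only for illustration. What it shows is the general idea from the interview: a short hash maps to a long URL, a click is answered with an HTTP 301 and a `Location` header, and each click is recorded as an event.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical in-memory stores; a real service would use a database.
links: Dict[str, str] = {}             # short hash -> long URL
click_log: List[Dict[str, str]] = []   # one record per click event


def shorten(long_url: str) -> str:
    """Return a short hash for long_url (illustrative scheme, an assumption)."""
    short = hashlib.sha256(long_url.encode("utf-8")).hexdigest()[:7]
    links[short] = long_url
    return short


def resolve(short: str, referrer: str = "") -> Tuple[int, Dict[str, str]]:
    """Return the HTTP status and headers a redirect server would send."""
    long_url = links.get(short)
    if long_url is None:
        return 404, {}
    # Record the click before redirecting, so the event is available for analysis.
    click_log.append({"hash": short, "referrer": referrer})
    # 301 is the permanent redirect defined by the HTTP standard.
    return 301, {"Location": long_url}


# Usage: shorten a URL, then simulate one click arriving from Twitter.
long_url = "https://example.com/a/very/long/path?with=query"
h = shorten(long_url)
status, headers = resolve(h, referrer="twitter.com")
```

Keeping `resolve` as a plain function that returns a status and headers keeps the sketch independent of any particular web framework.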
So we're able to see how people share content, how people consume content, and then how they reshare that content. And we look at the content itself as well. So we actually pull all that content and do some analysis of our own. So some stats. We're doing hundreds of millions of clicks a day on probably, again, hundreds of millions of new URLs a day, haven't checked today. And we're able to see the kinds of content people look at. Right now, obviously, the content around the conflicts in Egypt is huge. And if you saw the graph we released last night, you could see that we saw traffic in Egypt go almost to zero over the last few days, only to spring up on this amazing curve just yesterday. And it seems completely amazing to me that we can see those kinds of world events and phenomena reflected in how people are clicking on links. And what you were saying in your keynote, that interest from the outside world went through the roof when the internet shut down? Absolutely. It was that one event that caused people from around the world just to be clicking on all of these news stories about Egypt. So your growth at Bitly, it really has been driven by the whole megatrend of Twitter and Facebook because of the social side of it. And also with cloud computing, you guys could deploy your servers pretty quickly when you guys were a startup, and then obviously mobility. Screen real estate is at a premium. So having short URLs is key. So we were just talking yesterday and today we coined the phrase, data is the heartbeat of cloud, social and mobile. And you guys are a real living example of that. So what is the direction for Bitly? Because the market's really in your favor. The business of Bitly, you guys do all this service. How do you guys operate your business? Do you do distribution deals with people? How does Bitly's business work? So the business side of Bitly, first a small disclaimer that's not my area of specialty, I focus mainly on the data and the math.
But the business side of Bitly, our current product is Bitly Pro, which is a white-labeled solution for people who want to understand how their brands distribute socially on the internet. And that gives people their own short URL that's powered through Bitly and they get to see analytics about how the content that they share spreads but also how other people on the internet are sharing their content without them in the middle at all. And so people can learn some really interesting stuff from that. But we're also building some new products on our data set and I really do believe that it is our opportunity to take the data we see from other people and then return it to them in a way that'll really help them explore, discover and learn things faster. So one product that we're releasing soon is called News.me, which is an iPad app for reading the news. And we're hoping to build more of those types of things in the near future. As you get all the data, you can see trending information at any level, not just what's popular. You can see down to the micro level in sections, like a newspaper. I mean, Rupert Murdoch launched The Daily newspaper. So are you guys going to do that kind of thing with a newspaper app? Is that what you're saying? It's not exactly a newspaper app and you'll have to wait and see exactly what it'll be. Okay, we'll go back to the data. So on the data side, you guys have a massive amount of data. What things, first of all, we're big fans of what you guys are doing, so obviously the data. I'm glad to hear it. The good thing is that it's a great user experience. You guys can explore all this gestural data and/or real data clicks. So the user experience. So talk about the user experience that you guys see enabling. And then we're going to talk about the downside, which is, you know, spam. Twitter is littered with a lot of spam, as you know. How do you detect the bad guys, phishing, spam? Because there's a lot of that going on.
And the communities are trying to self-police that. So talk about the user experience that you guys are enabling with the data. And then talk about the dark side of the data. So we see, I love the Bitly user experience. I think our product design and front-end team is amazing. And we've created a site where you can really easily share the content you want to share. And you can push it out through other networks as well. And we recently released Bitly Bundles, where you can take several pieces of content, have them at one short link, and share them together, which is pretty cool. And we're focusing on providing that kind of easy, frictionless user experience in order to help people share the data they want to share and get the data back. And one thing I should mention is that you can take any Bitly link, whether it's yours or someone else's, add a plus sign to the end of it. And you can see all of the global statistics for that link. That's completely public. You can see how many people clicked on it, where in the world they came from, which websites they were referred from. So after the Bitly link, put space plus or a plus appended to the... Just a plus right after that. So no space. Bitly slash... Add a plus to the URL, and you'll see all the global statistics. Oh, slash, slash plus? No, bitly slash some letters and numbers, that's the Bitly hash. And then a plus sign at the end, hit enter. That's simple, right? Very simple. Add a plus sign, boom. So you guys have a rich data set, and the data world's all about how big the data has to get some corpus to work off a data set. And so some people will have small data sets, might not see the big picture, but as you guys amass these massive amounts of user data and obviously transactional data and distribution data, you're seeing, you probably see the spam, you probably see some of the patterns emerging on the dark side. So can you comment about, you know, what's going on there? I mean, that's a big problem on Twitter.
We all know spam's out there. One gets these phishing hacks. So you guys have a good spot to go after that and look at that. Can you share? Yes, and that's a project that we work on. We spend a lot of resources on highlighting, finding the spam and malware in the Bitly data and preventing people from clicking through. So you might have had the experience where you click through a Bitly link. Yeah, win an iPad. I think that was going around for a long time, right? Right, so you'll get this page with a big stop sign on it and I think a little angry puffer fish and it says, you should be really careful about clicking through this. But we do show you where that URL goes so that you can make your own, we trust humans more than any sort of automated system. So you can make your own judgment as to whether you want to click through that. But the way that system works is that whenever a link comes into our system, we actually, we pull the content of that link. We analyze the traffic patterns around that link and we make sure that it doesn't resemble anything we know to be malicious content. Now, of course, there are things that are really on the line, especially shared socially. Somebody might just be a little bit too excited about a marketing campaign. And so we do have a human in the mix to make sure that we don't have. And Twitter has done that where they've actually too many files from actually a human. They've shut down your account. And we do work closely with social networks that rely on Bitly to make sure that nothing damaging. So this is a top priority for you guys. You guys are all over it. And you guys have to harness the data to kind of pull that out. Yeah, so our spam detection system has two main parts. The first part is that we partner with a lot of security companies. We use Google Safe Browsing. So anything they know to be spam or phishing will be blocked automatically. The second part is a homegrown technology.
So we found that at Bitly, we see the data a little bit before anybody else, by about six hours. And a lot of clicks can happen in that initial six-hour window. And so we take the things we understand to be malicious and use them to train a statistical classifier that makes a judgment to say, we believe there's an 85% chance that this new thing might be spam. And if the probability is above a high enough threshold, we'll just block it automatically. Yeah. So you talked about in your keynote, the state of the data union is very strong. And things are good, right? Your peeps, right? You call yourself a self-proclaimed nerd. You're not nerdy, by the way. You're really good in front of the camera. I'm very nerdy. Great, great. You're terrific. Look at your chance regularly. Basically, your math friends are doing well, right? The wind is at their back and just cheap infrastructure. So startups can do the thing. And so it's all good. But then you talked about, well, it's not all good. We got some challenges here. And one of them is real time. And so I wanted to talk about that a little bit and help us understand, is that a math problem? Is it a physics problem? I mean, how do we solve that? And what do you mean by sort of real time? And how do we get there? That's a great question. It is both a math problem and an infrastructure problem and a problem of the applications that we want to address. So it's a product problem. So in data analysis, historically, our conceit has been that you have all of your data in a nice little package and you can look at it as many times as you want. You can iterate through it. You can try different schemes in your algorithms and see which one comes out best. And that could take hours to run or days or sometimes longer. But when you work in a real-time environment, you have to be able to make those same high-quality calculations immediately. That is, with milliseconds of latency.
And that means that we have to make some compromises, both mathematically and in the kind of infrastructure that we use. What does that do for you guys on the security side? I mean, you mentioned spam. This is a big challenge. I mean, real time has been a very big challenge. So how does that relate to the user experience? Can you give some examples around that latency specifically? So instead of using the word real time, we often use the phrase relatively recent time because real time means different things to different applications. If you're a hedge fund trading in microseconds, that's a little bit different than if you're just shortening a link that you want to spread on Twitter. So our goal is to prevent things like spam from getting through the system in about 30 seconds. And we do this through the way our infrastructure is set up. So every time a new content item comes in, we put it on a queue and that queue is processed as quickly as possible. And because we use a lot of cloud machines, we're able to spin up new machines quickly if we get flooded. Do you guys have relationships with some of these real-time search companies? Because real-time search a year and a half ago was like the hottest thing. You know, some of the names, Topsy was one, Collecta was one, OneRiot. There's some good work, good thesis, but that just never materialized. And so because people aren't really searching in real time. Who wants to stare at a screen and see things going by? Whereas you guys seem to have a better angle on the discovery side as you get data, you have more knowledge around semantic analysis between a request kind of search query, if you will, to discovery and navigation. Are you guys looking at that area at all? I think we have to change the way we think of the word search. So we have this idea that search means you go to a website, it's got a box on it. You type a query and you get back a list of results. And this is an old metaphor for search.
So when we think about real-time search, we're trying to think about helping you discover the information you will want to know as soon as possible. And that might not take the form of something where you just type in a query and get back results. If you're a logged-in user and we see the kinds of content that you like to click on and like to share, we might be able to alert you. But we really, we are working on it. We have some infrastructure behind it. And we're able to use it to show things like, I had a slide in my talk yesterday showing the images coming out of Egypt in real time. But we haven't yet figured out what the product manifestation of that will be. It's still early. I mean, the search phenomenon is really a Google, if you will. It's outdated, really. I mean, I'll say that Google is outdated. It provides some value if you want to get things here and there. But the notion is to save people time. People use search because there's a lot of stuff to figure out and they want something and they want to get it fast. And/or they're discovering and browsing, whatever. So in the social web, there are a lot of different ways to get that. There's still a sea of information. How are you guys saving time for users? Have you thought about that piece of it? Obviously with Bitly, the shorter link, you get something faster, but in the aggregate, if I want to know about Egypt, there's so much to look at, how do I know what's relevant? So I think the real opportunity we have is to take the massive amount of data that's coming through your streams already and to help you filter that. And I think that's one of the biggest open problems in the tech industry right now, is not how do we get more data into the stream, but how do we take what's there and help you find the most important thing when it's important. And if there isn't anything important, help you not waste your time just reading things. That's interesting to know.
I mean, you're giving essentially incentives for users to participate and allow you to collect data about them and in return, giving them services and capabilities that they can't get anywhere else. And that's sort of a real flip on the way in which we think about access to data and your personal information, isn't it? We're here with Hilary Mason from Bitly, data scientist at Bitly, here inside the Cube with Dave Vellante, John Furrier from SiliconANGLE.com. My final question is a little bit different, not so much about the tech, but about you personally. You're in the data business. The term data scientist is kind of being kicked around as kind of mainstream, which is cool. I love that. But there's a lot of people who are very interested in math and science and are looking at career changes, whether they're in their 40s or coming out of MIT, Stanford or whatever institution or high school, for that matter. What do you see as the profile of a data scientist? I mean, is it pure comp sci? Is it a little bit of cognitive? Is it physicist? Is it social science? I mean, it seems to be kind of a mash-up, kind of the criteria. You got to be super smart. But what do you see in that data scientist role and what would you share with folks out there who are thinking about, could I do that? If you're thinking about it and you're excited about it and you can understand the math and logic, then yes, you can do it. I see data science as a combination of math, computer science, the ability to code things that actually function, statistics, and then finally just hacking. And I think that last one is by far the most important. If you're the kind of person who can say, okay, I have some cool data, I really am curious about some questions about that data, I'm going to figure this out, then yes, you can do it. So my last question is also personal in nature. I want to know what these species are that you discovered. Tell us more about that.
So this was the first scientific adventure I ever had. In high school, I was really privileged to participate in a research expedition to Costa Rica. And we discovered high up in the canopy in the rainforest a kind of nematode and two bacteria that had never been identified before living in these plants shaped like this in the top of the rainforest. So given your statistical background, what are the odds of that? Actually, I believe the odds are very high. I think the rainforest is full of things we just don't understand. You guys are living in a startup. I mean, what is your take on the startup community? Obviously, at least out of the East Coast in Silicon Valley, it's got a different vibe on it. It's very robust in New York. What's your take on what's happening in the startup? I mean, Y Combinator is just handing out money to people. There's a little charity going on with startups, but there's just a lot of money flowing around, a lot of creativity. What white spaces do you see out there that might be opportunities for either a young entrepreneur or an entrepreneur to develop around data? So I think there are amazing opportunities right now. And as you mentioned, the startup community in New York has become powerful and very strong and very well connected only in the last few years. And a lot of those opportunities are in taking the systems that already exist. And we're doing a very good job now of solving the problems we had solved 20 years ago and 10 years ago quickly and efficiently. And as I said yesterday, you can now do it for 100 bucks at home in your underwear on EC2. You did say that, didn't you? I did. But we still need to figure out... That's New York. What the new capacities are that we have to solve problems that we haven't been able to address.
And I think there are huge opportunities around data management, data cleaning, helping people make better decisions from their personal data, sort of quantifying things about your life and understanding it in a really easy, frictionless way. And I hope we see a lot more of that, especially in New York in the near future. Okay, we're here with Hilary Mason with Bitly, chief data scientist. Data science is a big discussion part of this event here. And entrepreneurs out there, there's a ton of opportunities. Bitly is a great example of a company that was started kind of in the web 2.0, post-web 2.0 generation with the real-time web. And they're doing great and have a lot of data, and they're going to harness that data to create new products. Hilary, thanks for sharing with us. Thank you so much. Thanks for coming on. This was a lot of fun. It was great to have you. It's a good time to be a math geek and an even better time to be a hacker. So thanks a lot. Look forward to having you back.