The Cube presents On The Ground. Here's your host, Jeff Frick. Hi, Jeff Frick here with The Cube. We are On The Ground, a really special edition of On The Ground in New York City. We usually don't do On The Ground in New York City, but we're out here for a convention and took advantage of it. We saw Hilary Mason at the Grace Hopper Celebration of Women in Computing and said, we've got to come back and do more with Hilary Mason. So really excited to be here in New York with Hilary Mason of Fast Forward Labs. Yeah, thanks for coming by. Absolutely, thanks for having us. So let's jump right into it. Give us a quick update on Fast Forward Labs since we last talked to you, I guess it was October. Yeah, so things are going well. We're having a lot of fun. We're still looking into machine learning capabilities that are emerging and trying to make them useful to people. And our latest project is going to be on summarization algorithms. So taking one article and getting a couple of sentences that tell you what it is, or taking many articles, even thousands of articles, and getting a summary that says here are a few viewpoints and this is what each one says. Right, so let's back up a step, because you said a great thing at Grace Hopper about what you do here, and it's really helping companies figure out what's possible with machine learning. And you said you have this quarterly cadence where you have a new theme every quarter. So how do you determine what the theme is gonna be? How does that map back to what your clients are doing? And how does that whole process work? Yeah, so our mission is really to help people understand what the capabilities are with data, data science, and machine learning. And we do that by looking into these emerging capabilities, making sure we know what's possible, what's out there in the market, what the applications are, and then we package it and we build prototype applications, little products that show how it works. 
We build reports that actually explain in plain language: this is what deep learning is, this is how summarization works, this is what a neural network is. And then we couple that with acting as technical advisors as well. So we spend a lot of time sitting with our clients, helping them think through everything from infrastructure design to algorithm design to who they need on their team to really complete their product. And hopefully at the end of this, they build something that is actually meaningful for their business. The way we do that is we get a lot of information from our different clients about what they're interested in, what they're working on, and what kinds of data challenges they're seeing. We also read a lot of academic papers, go to conferences, and try to see what's interesting. We spend time talking to artists and people who are on the edge of the community. We take all of those ideas, drink a ton of coffee, and write them down. And then... You can get great Chinese food across the street, right? Oh yeah, the food here is amazing. But then we go through a pretty rigorous process for figuring out which things are actually worth pursuing right now. And we're looking for the things that are more possible today than they were a year ago. There are a bunch of different things that make that happen. There could be a change in the economics that constrain the use of a technique. So when we first wanted to look into deep learning, we couldn't afford the GPUs. But a year later, it was completely possible and we were able to do that project. So economic constraints are one; data availability is another. Has there been progress in the open source community and the academic community on a certain application? Or has there been some innovation, someone doing something in one area that we can pick up and bring into this area, that actually makes it more possible to build a real product? 
Just to be clear, we are talking about applied machine learning. Right, right. So real products that actually use data effectively. But you have an interesting model, which we talked about a little before we went on air: you often engage not so much with a product group that's trying to do one specific thing, but you're brought in at a higher level, like the CTO level, with a broader application space. What can we do gets explored at that higher level, and then that potentially rolls into specific products and specific product groups. Yeah, exactly. Because one of the challenges people are running into is they know they have data, they know there's a potential product or growth opportunity there, but that's all they know. They don't really know how deep learning works, or how classification works. And so what we do is bring our knowledge of the technical side of things, sit down with them, and try to find, in the intersection of those two sets of knowledge, their knowledge of their business and their product and what we know about machine learning and data science, some ideas and opportunities and a strategy to approach that opportunity. It's great. Every time I talk to you, there are so many threads we could pull on, and you just touched on another one. Clearly the Moore's Law effect on storage, on compute power, on networking keeps changing things, but talk specifically about open source, and how open source is really changing the game to enable things you couldn't do before, or at a different speed than before open source. Yeah, we are huge fans of open source at Fast Forward Labs and try to contribute when we can, certainly to the pieces of open source that we rely on. And it's because you now have a whole community that has come to understand the practice of a technique, and they're now all invested in building that piece of software. 
And the open source can come out of the academic community, or it might come out of the applied industry side of things, but it becomes a foundation on which you can build other things. And the canonical example here is Hadoop itself, because before Hadoop, you might read the MapReduce paper and say, okay, I could hire a team of engineers, and I'd probably need 20 people to build something like this in two years. Once Hadoop came out, you needed two people to install and configure it, and Hadoop in the beginning was a very temperamental bit of open source. Today you can go to AWS Elastic MapReduce and, from the command line, I can spin up a cluster by myself. I don't need a sysadmin or someone to manage that project for me. So it's a great example of something that has truly become a commodity, thanks to the open source community, that we can all now build on top of. Okay, so let's talk about some specific projects. I think you said the latest one you're working on is really text summarization. Yes. And I think you mentioned there are two different versions of that. There's text summarization of a single document, and then the flip side is when you have a lot of documents, say a ton of tweets, a ton of stories about the big storm that's coming, or the election, or whatever. So go a little deeper into those projects. What are the challenges, and what are some of the unanticipated discoveries that can come out of these types of things? Language is really messy. And this is a problem that people have been looking into for 30 years. It's not a new problem. What's new is that there's actually been some progress in using neural networks for language analysis that makes it a little more possible to do this well today than it was last year. And so when we talk about summarization, my mind immediately goes to: I have a New York Times article and I wanna summarize it in a shorter article, and that's useful. 
But what's really useful is when you can start to think about: I have 1,000 articles or 5,000 articles or a million articles, and I wanna summarize that in a way that I can understand in a small amount of time. Because what's happening there is that instead of going from maybe 10 minutes to read an article down to two minutes, now I'm going from something that was just not humanly possible to something that is. And so we've actually gained a little new superpower there. And is that, would you describe that as sentiment analysis? Is the application of that to get the summary view? Is it possible to dial it up or dial it down based on how much time I have, or maybe the degrees of, I don't know what the statistical word would be that I'm not coming up with. It's not sentiment analysis, but it might be a little bit related, and you certainly could say, I want a summary this long. I have 10 minutes on the subway, give me something to read. Or you could even think about personalizing summaries. So maybe we know everything you've read on the topic before, and we're just gonna show you the updates. Whereas maybe I've read nothing on that topic, so it'll show me the whole novel-length version. Okay, and then what are some of the applications that clients are now using that for, that you're working with? That's just a simple summary; I'm sure there are much more in-depth kinds of applications. This is a young project, but the things we're really excited about are giving people ways to understand their text data. So we have one company that provides insurance, and every person who has a policy comes with a long health history written in plain language. And so this gives them a tool for understanding that in a way that they haven't had before. One of the demos we've been playing with also takes thousands of product reviews for a given product on Amazon and can summarize them very quickly. 
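To make the extractive flavor of summarization concrete, here's a toy Python sketch that scores sentences by how many frequent words they contain and keeps the top few; the `num_sentences` knob is the "I want a summary this long" dial mentioned above. This is a simplified, hypothetical sketch for intuition only, not Fast Forward Labs' actual approach, which draws on neural network methods; all names here are illustrative.

```python
import re
from collections import Counter

def summarize(text, num_sentences=2):
    """Toy extractive summarizer: keep the sentences that contain
    the most frequent words, re-emitted in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Word frequencies over the whole document act as a crude
    # signal for what the document is "about".
    freq = Counter(re.findall(r'[a-z]+', text.lower()))
    def score(sentence):
        return sum(freq[w] for w in re.findall(r'[a-z]+', sentence.lower()))
    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Keep document order so the summary reads naturally.
    return ' '.join(s for s in sentences if s in top)
```

The same scoring idea extends to the multi-document case by concatenating many articles before counting, though real multi-document systems also have to group the distinct viewpoints she describes.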
And so we think there's a lot of opportunity there for better understanding the wisdom of the crowds, or social data as well. Another project you're working on is photographs, being able to analyze photographs. I wonder if you could talk a little bit about that, because we all take so many photos now; kids take a kajillion photos. They're all uploaded to Google, and I get little automated summaries: a look back from a year ago, here's your day in New York. So it's interesting now how machine learning can take all that, especially since there are so many of them, and give it back in a different type of form factor. No, absolutely. And this is one of those capabilities that is actually a breakthrough, in the sense that five years ago we could not write software that would take an image and tell you what's in that picture. You know, is it a baby or a cup of coffee or a street lamp? And today we can do that for a fairly trivial amount of processing time and money. Was it processing that was the problem? I mean, why couldn't you write the software? This is a technique where everything was a problem. Accessing a data set of the size and cleanliness necessary to train these kinds of models was a problem. Having access to the computational power, in this case GPUs, was a problem. Having good software foundations to build on was a problem too. The open source in the deep learning area has come a long way. You've probably heard about Google releasing TensorFlow, which was a big announcement. But even before that, there were a few projects like Theano and Keras, things that make it possible to pick this stuff up and play with it in a day, rather than having to do a lot of mathematical programming on your own. So when I talk about these factors that make something really interesting, this particular technique, which is deep learning for image object recognition, hits all of them. 
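For a flavor of what these networks do under the hood, here's a toy numpy sketch of convolution, the basic operation that deep networks for image recognition repeat many times with learned filters. This is an illustrative simplification; the function name and example filter are made up for the demo, and real systems built on Theano, Keras, or TensorFlow stack many layers of such filters.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter over an image and record how strongly each
    patch responds. A deep network repeats this with thousands of
    learned filters to build up to 'is it a baby or a cup of coffee'."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return np.maximum(out, 0)  # ReLU: keep only positive responses
```

A filter like `[[-1, 1]]` lights up wherever pixel values jump from dark to bright left-to-right, i.e. a vertical edge; it's the combination of many such learned filters, GPUs, and large labeled datasets that made object recognition practical.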
And we did a project on it, because everybody saw that Google's doing it, Facebook's doing it. And they started to ask questions: what is this? How does it work? And what could I do with it? And so now we're starting to see people apply it, and it's not just snapshots from your phone. If you have any sort of image data, or if you have tons of video archive data, you can now make that searchable and processable for fairly low cost. That's pretty cool, rather than me going through and tagging it all myself. Exactly. And then in terms of Fast Forward, and what you do and how you interface with your clients, touch base a little bit on that. So you do the projects; I'll let you talk about what the outputs of an engagement with Fast Forward look like for a company. So we work on a subscription basis, and this is a business model that I think gives us the best of both worlds. Our subscription has two pieces. The first piece is a report and prototype every quarter on one of these new machine learning capabilities. So the one we're working on right now is summarization. We had deep learning for image analysis. We did natural language generation, so that's article writing from bits of structured data. And each one of these comes with a report that says what it is, who's doing it, what's in the market today, conceptually how it works, ethical issues that might come up from building products with this thing, and our prediction for where we think it's going to go. The prototype uses example data to demonstrate that this thing actually works. People can look at the code. They can get a sense of how you might build your own product using this. And then we couple that with technical advising. So I like to say that we're their nerd best friends. We're the people you can call when you have a question around: is this crazy idea possible? How would we design infrastructure for this thing? Who do I need to hire to manage this project? 
How do I interview data scientists when I've never hired one before? And so we end up helping with a variety of different kinds of problems. So we couple the reports-and-prototypes subscription with the technical advising, and the two are very complementary. Awesome. So last question before we wrap up. I just want to get your take on the state of data science. There's a lot of conversation, especially at the big data shows that we go to, about the move to citizen data science. And I don't think it's so much that everyone's going to learn the complex algorithms and be a PhD-qualified person like a true data scientist today, but really pushing the power of data-based decision making downstream to the people who are actually on the front lines, and making either those algorithms, or the output of those algorithms, or derivatives of those algorithms available for people to make better decisions. What's your perception of how that's really taking hold in companies? Is this a great promise? Where are we on the path to this? What's your take? So I think that the basic capabilities of data analytics will be a skill that every professional will have some familiarity with, no matter what their role or background is, and you'll see it in every department, from marketing to tech. And that's because it's still a little bit difficult to ask even simple data analytics questions inside a lot of companies. And I mean questions like: how many users do we have? How many new users did we get today? These are really basic, take-the-data-and-count-a-thing type questions. Right now it's still technically messy, and that's why you have people in specialized roles doing it, but the tools are progressing so quickly that I think we'll eventually see that get pushed out to everybody. 
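Those "basic" questions reduce to one-line queries once the data sits in a single queryable store, which is exactly what's often missing inside companies. A minimal sketch using Python's built-in sqlite3; the `users` table, column names, and dates are all made up for illustration:

```python
import sqlite3

# Hypothetical users table. In a real company this data is often
# scattered across silos, and gathering it is the genuinely hard part.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, signup_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "2016-03-01"), (2, "2016-03-02"), (3, "2016-03-02")],
)

# "How many users do we have?"
total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
# "How many new users did we get today?"
new_today = conn.execute(
    "SELECT COUNT(*) FROM users WHERE signup_date = ?", ("2016-03-02",)
).fetchone()[0]
print(total, new_today)  # 3 2
```

The queries themselves are trivial; the point in the interview is that getting every department to the place where they can run them against live, unified data is what the improving tooling is slowly making routine.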
And that's great, because you want people to be able to ask and get answers to these questions inside their own tool set, rather than having to rely on some data analyst or data scientist. That said, I think the profession of data science is strong, and we will really see it shine where we have people building data-oriented products, where people are actually crafting new algorithms to build capabilities into products that just were not possible without the data and the algorithms. And I mean things like recommendation systems, filters, a lot of personalization. There's really a lot of cool stuff happening there. But part of it too isn't just the structure of the data, it's where the data is. We hear all the time that a lot of the complexity of data science isn't the actual data science, it's finding the data and organizing it and cleaning it, and that's like 70% of the effort. But off camera you were talking about how you still have companies with hundred-year-old data... We do. That they still want to use and leverage. So is the breakthrough going to be when the majority of the data is on newer systems and it's just easier to access it and run these types of algorithms, or are we going to have to migrate the old stuff over, or just use more new data? Yeah, this is why this is still not easy, because you might have legacy systems where data lives, and not just that, data tends to be siloed by parts of a company too. So once a company gets large enough, one group might control some data, and they don't want to talk to another group that controls other data, so you can't do a query across those things. So we do help folks navigate that quagmire of technical problems that are also tied to human relationship problems. Right, people and process, we're always talking about it: it's not just tech, it's people and process too. 
Exactly, but the tools are getting so much easier to use and deploy that I think this will be a lot easier in a few years than it is today. Awesome. Well, Hilary, as always, it's great to catch up with you, and thanks for hosting us here in New York City. Yeah, absolutely. So I'm sure we'll see you again. We'll probably see you at Grace Hopper, if not sooner. Absolutely. I could go on forever, but the guys won't let me go for an hour. So great, great seeing you again, Hilary. All right, thank you so much. Absolutely. Jeff Frick here at Fast Forward Labs with Hilary Mason. You're watching theCUBE. See you next time.