theCUBE presents On the Ground. Here's your host, Jeff Frick. Hi, Jeff Frick here with theCUBE. We're on the ground in Manhattan, I guess the edge of the Lower East Side and Chinatown, for a really special edition of On the Ground. We came out to New York, we had a show, and we said, hey, we're here, let's stop by and see Fast Forward Labs, because we see Hilary all the time. Let's see what's really going on here in the action. So we're in Fast Forward Labs, and we're really excited to be joined by our next guest, Micha Gorelick, who does a lot of the work around here, we hear. Yeah. Welcome. Great to have you. Thanks for coming. Absolutely. So what are you working on? What's getting you excited these days, right in the moment? Oh, so right now I'm deep into summarization algorithms and looking at how we can get computers to understand text in a more robust way. Up until now, a lot of algorithms that use text would just extract out, who was mentioned here? What places were mentioned? It was a very tagging-based method. But now, with a lot of neural network techniques, we have algorithms that can understand the semantic meaning of the text. It's kind of strange, though, because the output of those algorithms isn't something like, Obama was mentioned in this sentence, and it was in New York. The output isn't something a human can read directly, but somehow we can use it to do meaningful calculations. So is there a two-step process, one for the computer to kind of understand what's going on, and then a second to present that in a way that I can understand what the computer figured out? Yeah, that's exactly right. The way a lot of these algorithms work is, first we find a good representation for the data that the computer can understand. That's called an embedding. And then we need to come up with a model that takes that representation and makes it useful for us. 
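[Editor's note: the tagging-and-extraction style of text processing mentioned above can be illustrated with a minimal frequency-based extractive summarizer. This is a classic baseline, not Fast Forward Labs' actual method; all names in it are illustrative.]

```python
from collections import Counter
import re

def extractive_summary(text, n_sentences=1):
    """Score each sentence by the average frequency of its words; keep
    the top n. A crude baseline: it counts surface tokens rather than
    understanding meaning, which is exactly the limitation discussed
    in the interview."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    chosen = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in chosen)

doc = ("Neural networks learn representations of text. "
       "Representations let computers compare sentences. "
       "The weather was nice yesterday.")
print(extractive_summary(doc, n_sentences=1))
```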
The cool thing, though, is that that embedding, that computer-based representation, can solve a lot of problems. With the same computer representation of a sentence, we can solve the problem of, are these two sentences related? And we can also solve the problem of summarization, for example. So they become very versatile, and they also give you a more fundamental understanding of what text is and how computers can even understand text. Right, and that's before you factor in a little thing called language. Exactly. Subtlety, right? That's the problem with text. A lot of people say it's easy because I can parse it, it's encoded in a way my computer can read, but sentences are so subtle. I can put a comma somewhere and all of a sudden the sentence's meaning is completely different. So it's actually a really hard problem. Yeah, anyone who thinks it's easy never ran the old OCR software years ago, where you could type faster than it could read the page. Well, it's interesting how much of the brains are on the words versus the sentence structure versus things like punctuation. How does that all get mapped out, and where's the heavy lifting? Where's the breakthrough? So the breakthrough happened a couple of years ago when people started embedding words. There was a technology that came out called Word2Vec, from some Google researchers, and basically what they did is they trained a model using just Wikipedia data. They didn't have to annotate it. They didn't have to say, this is a name, this is an important word. All it did was look at what words surround a particular word, and it was able to understand the meaning of words. It was actually able to solve analogies, which was mind-blowing, because at the time that was a really hard problem. And now people are starting to extend that method to other places. 
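[Editor's note: the analogy-solving trick works through simple vector arithmetic on the embeddings. A minimal sketch with hypothetical hand-made vectors; real Word2Vec vectors are learned from text and typically have 100-300 dimensions.]

```python
import numpy as np

# Hypothetical toy embeddings, chosen by hand for illustration only.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Solve a : b :: c : ?  via  vec(b) - vec(a) + vec(c), then pick
    the nearest remaining word by cosine similarity."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))  # → queen (with these toy vectors)
```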
So a lot of people are trying them with images, and this particular thing we're looking at was using sentences. They use a thing called recurrent neural networks, which are the new hotness in the deep learning space because they're becoming more and more usable to the everyday person. And they were able to do that same sort of thing that happened at the word level, but now at the sentence level. Yeah, interesting. And that kind of takes us into the next topic we want to dive into a little bit, which is open source. People are familiar with open source software, it's been around for a long time. But some of the stuff you're talking about that's interesting and directly applicable is open-sourcing algorithms, open-sourcing code. And the other thing, we just talked about it the other day, is open data sets, as the government begins to open up all these data sets. And there are other kinds of data sets out there that you can now grab and start to do your own manipulations and calculations on, which really didn't exist before, both in terms of the access, the APIs, and of course the computing horsepower on the back end. Yeah, definitely. I think open source software and open data have fueled the growth in machine learning to an amazing extent. Nowadays it's common for academics, when they come up with a new model, to put their code online, to put their data online. Before, that was unheard of. You look at the 90s and at machine learning innovations back then, and for most hobbyists, the most they would learn about those innovations was the paper and maybe a press release. But now you also get to play with the code. You get to play with the data. You get to take their results and start playing with them, changing them, and trying to create innovations of your own. 
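[Editor's note: the mechanics of an RNN turning a variable-length sentence into a fixed-size embedding can be sketched in a few lines. This uses tiny random, untrained weights purely to show the shape of the computation; it is not the specific model discussed in the interview.]

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden = 4, 5  # toy sizes; real models use hundreds of dimensions

# Hypothetical word vectors and randomly initialized RNN weights.
emb = {w: rng.standard_normal(dim)
       for w in ["the", "cat", "sat", "dog", "ran"]}
W_x = rng.standard_normal((hidden, dim))
W_h = rng.standard_normal((hidden, hidden))

def sentence_embedding(words):
    """Run a vanilla RNN over the word vectors; the final hidden state
    serves as a fixed-size embedding of the whole sentence."""
    h = np.zeros(hidden)
    for w in words:
        h = np.tanh(W_x @ emb[w] + W_h @ h)
    return h

v1 = sentence_embedding(["the", "cat", "sat"])
v2 = sentence_embedding(["the", "dog", "ran"])
# Sentences of any length map to vectors of the same size, so they can
# be compared directly, e.g. with cosine similarity.
print(v1.shape, v2.shape)
```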
In a way, I think that also kick-started a lot of the innovation that happened in industry, because the pool of academics who know how to work with these algorithms is really small, but once we start democratizing it and hobbyists can start learning how to use these things, the pool of people who can potentially bring these innovations into industry all of a sudden just expands. And you see the applications blow up, because maybe I'm just some guy working on a weekend project, and one of my weekend projects is I want to train a computer to learn how to play Super Smash Bros. Melee. It's a game for the GameCube. There's no industry or business mechanism behind it, it's not profitable in any way, but I'm doing it on my weekend, it's fun. And that sort of work, the algorithmic work, the data work, could spark new innovations that might be useful downstream. It's a really interesting twist, too, on the position and the value placed on IP. Before, it was really, it's my IP, I'm protecting it, and you would talk to people doing startups and all their questions were about, how do I protect my IP and get people to sign NDAs? Whereas open source, as much as it is a method, is really an attitude: it's actually better if I let other people bang on this as well. Lord knows where it's gonna go, places I could never take it. But I still get value placed on my contribution, more than I would with closed IP that nobody ever saw. I mean, that was always the killer when you talked to people doing startups. You and your dog can know the secrets, but if it never gets out of your garage, nobody cares. So it's a really different attitude that open source wraps around these projects. I mean, a lot of people are starting to realize that it's not even necessarily the algorithm that is their secret sauce. They can show that to as many people as they want and say, look how smart we are, maybe you can also use this somewhere else. 
The thing that really makes a company the unique one that can solve that problem the way they have is their data. That's the thing that makes them unique, and that's the thing that gives them a way of delivering value to other people. A source of competitive advantage, right? Yeah, exactly. So the other thing you talked about off air, and it's kind of intentional, is really the art aspect of this. And again, I don't think a lot of people necessarily associate art with computer science and data science and machine learning, right? I think they do have the vision, we were talking about the Terminator driving over here, of Arnold Schwarzenegger and his glass eye. But art really plays an important part, and it really does enable a different direction, a different path, a different discovery than the classic commercial efforts. Yeah, for sure. And I think it's also just very important for there to be more art around this. Classically, art has been the way that you emotionally connect to something. So far, there have been ethical conversations about these new algorithms, about machine learning in general, but they're always very cut and dried. They're always focused on, well, what happens when computers become completely sentient? How will that affect people? But these algorithms are everywhere. Your phone has so many machine learning algorithms on it. You're interacting with them daily, whether or not you even know it. So we should be asking the question of, how are these things affecting society? How are they affecting us? And how should that change the avenues of research we're looking at? And I think art is a fantastic way to interact with that, because for the most part it's not something you can take statistics on and write down with pen and paper. The data's not there, it's just hard. But opening up the issue and getting people to interact with these things on an emotional level is, I think, the right way to at least open up the question. 
Well, but as you said, I think before we started, and even in the example of natural language, if these results can't be communicated back to me in a way that I can do something about, what's the point? And if art is a more emotional method than a statistical chart, or a bunch of data points and data sets and weird visualizations I can't understand, if there are better ways to communicate back to me so I can react, understand, and move forward, it seems pretty logical. Yeah, for sure. And especially since a lot of the time it isn't even necessarily about a reaction or a potential actionable thing to do. It's more just to open people's eyes so that they know where to look, so they know to ask the question of, how is this thing changing my interactions, or how I'm thinking about a particular problem? One thing I find fascinating, for example, is how people defer to a lot of algorithms, right? You can have an algorithm that gives you a suggestion, do this, and you'll often throw away your better judgment and do it, because the algorithm's smart. The algorithm probably knows a lot better than I do. And while in some cases that might be true, I think it's important to understand, how is that algorithm smarter than me, and when should I not trust it? When should I trust myself more? That's an interesting one, and the example that always comes up in the news, right, is Google Maps, where somebody takes a wrong turn, or they don't follow their nose and they go down some dead end, because Google Maps doesn't know there's a construction project or the building's no longer there. On my commute to work, Google Maps recommends two different routes for me. I always take a third one, because I also like being above ground. That's not something Google Maps is meant to optimize for. So I take trains that are maybe two minutes slower, but I get to be above ground. 
So then the question is, yeah, Google Maps is giving me the best answer, but how is it the best? It's not the best considering that I like being above ground. So you have to understand, what problem are these algorithms solving? And is it actually my problem? Is it close enough to my problem, or should I just solve my problem myself? And how many trips until Google Maps suggests the route that you take every day? Yeah, or until there's a button that says, I like being above ground. I don't know if that's an option. Well, it's interesting, because the other topic we talked a little bit about before we went on air is the citizen data scientist, and you're probably not a good example because you work at Fast Forward Labs, you're a smart guy. But the fact that you kind of have a hobby of trying to program a machine to play a video game is pretty interesting. It's funny, actually. And I think last time we had Hilary on, we were talking about the little Sony dog. That wasn't a hobby thing, it was Sony, but still, it's kind of a different spin, a different flavor of AI, doing fun things, if you will, as opposed to necessarily productive things. And the driver of that is to really expand the capacity and the knowledge within people that they can take back to their jobs. Yeah, and you look at a lot of the innovations that have happened in the past year, and a lot of them happened because of things that had no actual application. You look at DeepDream by Google, what's the practical application of that? But it helped us understand better what's going on. And I think a lot of that non-commercial exploration is incredibly important. It expands the field, it expands our knowledge, and it comes back to help the commercial people and also comes back to help academia. Right, right, and there is an R in R&D, right? It's not all just D. Exactly, exactly. And the R comes before the D, too. 
Some people forget that there is value in pure research. A lot of it used to be funded by the government, and a lot of it drove the early microprocessor industry, so even though the benefit may not be directly apparent, it's valuable stuff. I mean, that's definitely an interesting thing. Since the government is not funding academic labs as much as it used to, where this research happens has been completely shifting over the past 10 years. A lot of it's happening in corporate-funded labs at academic institutions, and a lot of it's happening with citizen science and open source groups, but it has definitely changed places, so you need to look in completely different places if you want to see where the real research is happening. Right, right. So looking down the road, you're in this every day, you've got clients you're working with, you're working on core research, you're playing at home. What are you excited about for, say, the next six to twelve months, things you're not necessarily working on now but are on the horizon that you're looking forward to? There are so many things. There are a lot of things in computer decision making that I think are fascinating, computers that learn how to make decisions based on previous events and previous encounters with similar situations. The common example is you have a robot, you just put it down on the ground and say, okay, walk around, figure something out. And it'll walk around, it'll find food to eat, which makes it feel good, it'll start feeling worse when it doesn't get food, and after a while it'll learn to walk around and find food, and it'll learn how to navigate potential obstacles. So it's really behavioral learning. Instead of the machine learning how to identify an image, it's learning how to act, how to interact with a system. 
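[Editor's note: the robot-finding-food loop described above is the textbook reinforcement-learning setup. A minimal sketch, tabular Q-learning on a hypothetical one-dimensional corridor with food at the far end; the reward numbers and sizes here are illustrative, not from any specific system.]

```python
import random

random.seed(0)
N = 5                # corridor cells 0..4; the food sits in cell 4
ACTIONS = [-1, +1]   # step left or step right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != N - 1:
        # Explore occasionally, otherwise act greedily on current Q.
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda a: Q[(s, a)]))
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else -0.01  # food feels good, wandering hurts
        # Standard Q-learning update toward reward plus discounted future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# After learning, the greedy policy walks straight toward the food.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(policy)  # → [1, 1, 1, 1]
```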
I think those algorithms are fascinating, and they're really starting to become a big focus in the machine learning world, because there have been several algorithmic innovations pushing things in the direction of making them possible. Yeah, well, now you're scaring the people that Catherine gave comfort to in our last segment, but that's okay, actually. And there is a room here, I want everybody to know. Here in the office, they practice what they preach. It's not just any room. Oh, I shouldn't have asked, we'd better be careful. All right, well, before we let you go, obviously there's still a lot of value in people who write books. There are a lot of people who publish blogs and everything else, but you've got a book here, so give us a plug for the book, High Performance Python. High Performance Python. I wrote it with my colleague Ian Ozsvald. It basically teaches you how to make Python code super fast. Python's a great language because you can develop quickly. The code is clean, almost self-documenting. You should still comment, though. And so the question is, now that I can develop and prototype quickly, how can I make this code run quickly so I can throw it right into production and not get the slowdowns that people commonly associate with Python? Right. And O'Reilly gave you a cool animal that actually matches the name of the book, so it's easy to find on the bookshelf, something like that. All right, Micha, well, thanks for taking a few minutes out of your busy day. Thanks for coming, it was great having you. Absolutely. So, Jeff Frick here, again on the ground at Fast Forward Labs in Manhattan. You're watching theCUBE. We'll catch you next time. Thanks for watching.
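[Editor's note: a representative example of the kind of speedup discussed around High Performance Python, replacing a pure-Python loop with a NumPy vectorized version. This is a generic illustration, not an excerpt from the book.]

```python
import time
import numpy as np

def norm_python(xs):
    """Pure-Python sum of squares: interpreter overhead on every step."""
    total = 0.0
    for x in xs:
        total += x * x
    return total ** 0.5

def norm_numpy(arr):
    """The same computation pushed down into optimized C loops."""
    return float(np.sqrt(np.dot(arr, arr)))

data = list(range(1_000_000))
arr = np.array(data, dtype=np.float64)

t0 = time.perf_counter(); slow = norm_python(data); t1 = time.perf_counter()
fast = norm_numpy(arr);                             t2 = time.perf_counter()
print(f"python: {t1 - t0:.4f}s  numpy: {t2 - t1:.4f}s")
# The two results agree; the NumPy version is typically far faster.
```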