Hey, Jeff Frick here with theCUBE. We're in downtown San Francisco at the Chief Data Scientist, USA event, and we're excited to have Jay Yonamine, the head of data science for Google Patents. Welcome, Jay.

Thank you.

So you're at Google. Everybody thinks Google: data, data, data, leading the charge. But as you said, there are still some cultural changes, still some cultural challenges, in the world that you play in. Tell us a little bit about that.

Yeah, so I'm in the legal department, and I oversee data and analytics for the patents team in particular. There's almost this dichotomy at Google: almost everything we do on our customer-facing products is at the forefront of machine learning, AI, data-driven efficiency. Everything's A/B tested; it's leading the charge. But legal as an industry is more traditional, entrenched in established processes. So there's a bit of a tension. I think a lot of folks in legal and in patents want to be on the forefront. They want to leverage all the technologies the engineers down the hall are building, but doing so is really challenging. A lot of the time there's this desire to just build and build, add new tools, and disrupt processes, but the challenge we face is demonstrating that the new tool, the new machine learning algorithm, the new application is actually helping us in an observable, measurable way relative to the entrenched status quo. So we spend a lot of time figuring out, okay, how are we going to evaluate the performance of our existing processes? Only then can we say whether or not it makes sense to build a machine learning application. And if we build that application, then we can start tracking how our performance is doing relative to the baseline. That mindset of a really data-driven approach, not just to building tools but also to measuring the ROI of the tools we build, is a challenge that we're working on.

So let's back up a step. Patents, from the outside looking in: a pretty old-time process, been around forever. File the patent, wait for the patent, go through the provisional process, and so on. And I'm sure, I don't know how much you can tell about the scale of the operation, but Google's got to have a lot of filings.

Yeah, so we have a big portfolio.

A big portfolio, tens of thousands, yeah. So where's the data science value-add opportunity within what has been a process that's been trucking along for literally hundreds of years?

Yeah, that's a great question. I would say two areas are especially easy to convey. One is the actual prosecution. That's basically: you file a patent, and then you have a lot of back and forth with the patent office about things they want you to do before the patent gets issued. And there's tremendous back and forth. All the back-and-forths have text, so they're suggesting textual changes, and we have to communicate with outside counsel on whether we want to do it, whether we think it's worth the cost and the time to do it. And when you're doing that at the scale of thousands or tens of thousands of filings, it's individual processes, right? Each one is its own little process. And every time there's this question as well: what's the likelihood that if we make this change for $2,000, we're going to get the patent granted?
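To make that per-filing question concrete, here is a minimal sketch of the expected-value comparison it implies. The function, the dollar figures, and the grant probabilities are hypothetical placeholders for illustration, not figures from Google or the patent office.

```python
# Rough expected-value comparison for a single office-action response.
# All numbers below are hypothetical placeholders.

def incremental_value_of_responding(
    response_cost: float,           # e.g. outside-counsel cost of the amendment
    p_grant_if_respond: float,      # estimated grant probability if we respond
    p_grant_if_abandon: float,      # estimated grant probability if we do nothing
    value_if_granted: float,        # estimated value of the issued patent
) -> float:
    """Expected value of responding minus expected value of abandoning."""
    ev_respond = p_grant_if_respond * value_if_granted - response_cost
    ev_abandon = p_grant_if_abandon * value_if_granted
    return ev_respond - ev_abandon


if __name__ == "__main__":
    # Hypothetical single filing: a $2,000 response, 60% vs. 5% grant odds,
    # and a $15,000 estimated asset value if granted.
    delta = incremental_value_of_responding(2_000, 0.60, 0.05, 15_000)
    print(f"Incremental expected value of responding: ${delta:,.0f}")
```

The hard modeling work is in estimating the grant probabilities and asset values; once those exist, a rule this simple can be applied across thousands of pending applications.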
And you multiply that by a few thousand, and you think, boy, wow, if we had a better system that could semi-automate this process, we could really increase our cost savings. And we could maybe abandon assets during the prosecution process if we think they might not be worth the cost. And so these are questions that, you know...

So, just the kind of basic financial modeling question: is it worthwhile to keep going? Because on average, or whatever average means, I don't know which number you want to pick, how many iterations are there typically, where there are these gates where you could either continue to invest or bail?

Exactly. Exactly. And these are challenges that aren't unique to Google. These are challenges that every large company filing patents faces, and there are tons of third-party providers. It's a pretty ripe area right now. So that's the first. The second, I think, would be around transactions. Any time a company is buying a portfolio, selling a portfolio, or licensing a portfolio, the analysis it's trying to do is: how relevant are the patents in that portfolio to our products? And most of the time it's hard to do that with the traditional approach of humans reading the individual assets. Sometimes there's just no time; it's literally impossible to go eyes-on for a thousand patents in a week. Sometimes it's not possible because no one person knows enough about the technical aspects of all of your products to be able to do the mental mapping. So there are opportunities for machine learning, natural language processing, similarity algorithms, and clustering algorithms to help with that process as well. Those are the two main easy-to-convey opportunities.

Is it all still text-based PDFs? I guess that's machine-readable, but I would assume the patent process is still kind of old-school that way. Or has it started to move towards something more database-centric, where you can run more sophisticated queries against easier-to-consume data?

Yeah, it's definitely moved aggressively towards the latter. There are still some data streams that are PDFs, so it's a bit clunky to get at the underlying text in a machine-readable way, but there's a tremendous push toward machine readability. And Google hosts a free patent analysis tool, patents.google.com, which has a lot of clean data, and we work closely with those folks. So in general we have pretty good access to clean text.

Okay, so on the cultural shift, what are the drivers? Is it just pressure, say, when you're doing an M&A deal and you really want to extract the value of the patent portfolio in the company you're buying, so you can make an assessment that feeds into the valuation? What are the drivers for people to want to do it better? Is it simply cost, because the scale is so huge?

Yeah, cost at scale, for sure. And I think there's this sense that if we had more algorithmic support, if we had some statistical models, we could make better decisions on how much to pay for a portfolio or how much to value the patent portion of an M&A deal. The challenge there is, again, relative to what? If we didn't provide any data or any machine learning, a team of lawyers and business folks could get together in a room and come up with a price. And a lot of those folks have the thought that if we only had more data, we could come up with a better price.
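Going back to the portfolio-relevance problem described a moment ago, the kind of similarity analysis Jay alludes to can be sketched with a very simple baseline: TF-IDF vectors compared by cosine similarity. This is a generic stand-in rather than Google's actual pipeline, and the product descriptions and patent abstracts below are invented for illustration.

```python
# Minimal baseline: rank patents in a portfolio by textual similarity to
# product descriptions. All documents below are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

products = {
    "photo-search": "search and retrieval of images using learned visual features",
    "speech-input": "converting spoken audio into text on a mobile device",
}
patents = {
    "US-0000001": "a method for indexing images by extracting visual feature vectors",
    "US-0000002": "apparatus for acoustic modeling and transcription of speech signals",
    "US-0000003": "a technique for brewing coffee at a controlled temperature",
}

# Fit one vocabulary over both collections so the vectors are comparable.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(list(products.values()) + list(patents.values()))
product_vectors = matrix[: len(products)]
patent_vectors = matrix[len(products):]

# Score every patent against every product and report the best match.
scores = cosine_similarity(patent_vectors, product_vectors)
product_names = list(products)
for (patent_id, _), row in zip(patents.items(), scores):
    best = row.argmax()
    print(f"{patent_id}: closest product = {product_names[best]} (score {row[best]:.2f})")
```

In practice a richer representation (embeddings, claim-level matching) would likely replace the TF-IDF step, but the shape of the workflow, scoring every asset against every product and then triaging by score, is the same one a human review team can't sustain at a thousand patents a week.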
But the "better" part of a better price is very difficult. Better compared to what? Is there a ground truth? Maybe not. So to the extent that "better" is subjective, it tends to be a losing proposition to prove the ROI of the machine learning application if no one can agree on how much better the final result was. That's kind of the cultural challenge in a nutshell.

Because you want to make tools that can be used not just on this one particular transaction or this one particular portfolio, but something you can leverage across the board. And as you say, even with Google's vast resources, you can't do everything, period. So what are some of the things you look at as you're making your prioritization, the things that help you drive toward where you want to go, regardless of the specific opportunity or the specific ask?

Yeah, so I try to focus on processes where leadership and the people on the ground already have a pretty clear sense of how they measure performance. If there are processes where we already know, okay, we do this a thousand times a month, and this is the scale we use to track how well we're doing, maybe it's how expensive it is, maybe it's some quality score, then it's much easier for me to come in and say, okay, here are the areas where I think we could use some machine learning, use some data, and decrease cost or decrease time while holding quality constant. But if those metrics aren't already established, then the initial step isn't "what machine learning applications can we build?" It's "let's agree on the evaluation criteria first." How are we going to determine success? Let's establish the baseline, and then we can start thinking about data science.

That's great, I just love it. It's very similar problem-solving methods applied to a completely different set of problems. Super stuff. Well, Jay, thanks for taking a few minutes to stop by.

Yep, thank you, appreciate it.

All right, this is Jay, I'm Jeff, you're watching theCUBE. Thanks for watching.