Hey, welcome back everybody. Jeff Frick here with theCUBE. We're at the Chief Data Scientist USA Summit in downtown San Francisco, and we're excited to have Jeff Bohn, the Chief Science Officer from State Street Global Exchange. Did I get that right, Jeff?

That's correct.

Welcome.

Thank you.

So for those not familiar with State Street Global Exchange, give us a little background.

Sure. State Street is what's called a custody bank. It's not well known to a lot of people outside of financial services, but it provides fund settlement services for all the different transactions globally. I work for a division called State Street Global Exchange, which was set up to take data analytics and solutions and make those available mostly to the institutional investor and asset management community. This would be pension funds, insurance companies, and then the asset managers that serve them. Within SSGX, or State Street Global Exchange, I run an entity called GX Labs, which focuses on data science and risk analytics. So we're very, very focused on this data question and how it can be leveraged to improve risk management for these big institutional portfolio managers.

So financial services has always been on the cutting edge of data science, right? They've always used that as a really key strategic advantage, or not even an advantage, but just table stakes to play in the game. So how is the environment changing with some of these new tools, big data, Spark, and some of these other new technologies?

Yeah, it's a really good question. Financial services is actually quite heterogeneous on the data science side. What I mean by that is if you go into, say, the hedge fund community or the big investment banks, they've made a lot of investment in data science and they're very forward thinking. In fact, I've worked in some of those institutions in my past employment.
But the institutional investor community that I'm talking about, the people that I work with, have actually been kind of late to this game. So part of it is educating them on some of these new techniques. The great thing is that they can take advantage of these new technologies in ways that weren't available even, say, three or four years ago. And many of them, whether a large national pension fund, a fund like CalPERS in California or Ontario Teachers in Canada, or some of the Northern European national pension funds, are making big investments in some of these new technologies so that they can leverage data science and materially improve the way they manage their risk.

Right. So one of the panels earlier today, and you're going to be talking later, is on data quality. They were talking about how the chief data scientist really needs to be a salesperson. When the business unit person comes in, they want to solve a problem. Oftentimes there's a data quality issue that has to be addressed long before you actually get to the end analysis. How do you see that changing, and how do you address that issue with your business stakeholders?

Yeah, I think that's a critical point. Right now in financial services, you still find 70 to 80% of the effort and resources focused on the data preparation step. That's what's often called ETL: extract, transform, load. And if you add cleaning, which is a huge issue, you have most of the time spent on that and not enough time spent on the actual analysis, which is more interesting and more valuable. I think that these days there is this realization. So at the same time that you're getting this deluge of data, people are making investments in saying, let's do a better job of that preparation phase.
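[Editor's note: the transform-and-clean step described above can be sketched in Python, which the interviewee later says his shop has standardized on. The positions feed, column names, and cleaning rules below are hypothetical illustrations, not State Street's actual pipeline.]

```python
import pandas as pd

# Hypothetical positions feed with typical quality problems:
# inconsistent ticker casing, missing values, duplicate rows.
raw = pd.DataFrame({
    "ticker":   ["AAPL", "aapl", "MSFT", "GOOG", None],
    "quantity": [100,    100,    250,    None,   50],
    "price":    [190.5,  190.5,  410.2,  155.1,  99.9],
})

def clean_positions(df: pd.DataFrame) -> pd.DataFrame:
    """Transform/clean step of a toy ETL: normalize, drop incomplete rows, dedupe."""
    out = df.copy()
    out["ticker"] = out["ticker"].str.upper()        # normalize casing
    out = out.dropna(subset=["ticker", "quantity"])  # drop incomplete rows
    out = out.drop_duplicates(subset=["ticker"])     # keep first record per ticker
    out["market_value"] = out["quantity"] * out["price"]
    return out.reset_index(drop=True)

clean = clean_positions(raw)  # two valid rows survive: AAPL and MSFT
```

In practice the "load" step would write `clean` to a warehouse; the point of the sketch is that most of the code is preparation rather than analysis, which is exactly the 70 to 80% imbalance being described.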
And so if you use some of these new tools that might be available to manage multiple data sources, looking at overlapping data issues and sorting out noise from signal, that I think is a huge opportunity, and it's definitely where everybody is headed.

Well, how does the data quality challenge evolve as more people are looking for different types of data to do different types of analysis? As you democratize this analysis, you want to get it out to more analysts and let them ask questions that they didn't ask before. Does that just make the data quality issue worse? Are you constantly chasing the tiger by the tail?

Actually, it kind of depends on who's using the final output. If you're talking about the analysts or the quants, these are the people that are, as we say, down and dirty in the data. They constantly complain about data quality, but that doesn't really affect their perception of the usefulness of the data, and they participate with the teams that sit in the back office trying to do all that work. One of the trends that I see is that the users of the data at that analyst layer are getting more involved with that first step, and are also providing ideas or new tools. Now, if you move to, say, the managers, or what I would call the quantitatively informed executives, but not quants, for these people data quality is a huge issue. I've had problems where you have great models, but data quality produced some kind of strange answers. Then you have a senior executive who doesn't quite understand everything that's going on behind the scenes. They look at a couple of anecdotal pieces of evidence that suggest there are data quality issues, and all of a sudden they want to trash the whole process and go back to more ad hoc, gut-based decision-making. So that is a challenge. I think there are kind of two levels.
You have to spend more time making sure that you're bulletproof on the data quality before you push it out to the business decision-makers. But at the analyst level, these people are just much more involved in improving the quality.

Well, because not only are you trying to democratize access to these tools and get a bigger group of people to be more data-centric in their decision-making, but on the back end, you're also pulling more sources of data, more varied sources of data, and just more quantity of data than the types you've been analyzing all along. So you've got that challenge. What are some of the things that you guys do to make the data quality issue smaller as an issue?

Well, I think the first thing is we now have people whose job it is to think about data quality. Within our firm, we call them data stewards, and that's actually a new role. In the past, you kind of did data quality if you were an analyst, or it might have been the database administrator or somebody in between. If you give people accountability and incentives, that changes their focus. So that's the first thing. The second thing is we invest more in what I'll loosely call data curation. This is where I'm pulling the data in, but being more careful about curating what's actually going to be used. And we put a kind of verification and quality control layer in place before that gets pushed out to the decision makers. I think that's the critical piece. Having more data is not necessarily good. I mean, it can be, and I can give you a lot of examples where having more data or new data has definitely made things better. But if you're not careful about the quality that goes to the final decision maker, then I think you actually push the firm backward, because that person will not give you budget to continue to push forward.
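[Editor's note: the verification and quality control layer described above, a gate between ingestion and the decision makers, can be sketched as follows. The fund records, check names, and thresholds are invented for illustration; they are not State Street's actual rules.]

```python
# Hypothetical records in a risk-report feed; PensionB has a bad NAV.
rows = [
    {"fund": "PensionA", "nav": 1_250_000.0, "leverage": 1.4},
    {"fund": "PensionB", "nav": -5.0,        "leverage": 1.1},
]

# Named checks a data steward might own; each returns True if a row is sane.
CHECKS = {
    "nav_positive":  lambda r: r["nav"] > 0,
    "leverage_sane": lambda r: 0 <= r["leverage"] <= 10,
}

def quality_gate(rows, checks):
    """Split rows into those safe to publish and a failure log for stewards."""
    passed, failures = [], []
    for row in rows:
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            failures.append((row["fund"], failed))  # held back for review
        else:
            passed.append(row)                      # released downstream
    return passed, failures

ok, bad = quality_gate(rows, CHECKS)  # only PensionA clears the gate
```

The design point is the one made in the interview: nothing reaches the executive layer unchecked, and each rejection is logged against a named rule, so a strange number traces back to a steward-owned check rather than undermining trust in the whole process.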
One of the things that you face in financial services, even though, as you say, there are areas where people are ahead, is that those tend to be more quantitative fund managers or risk quants, which was actually the career I started in. We love data and we're happy with it. But there are lots of people inside financial services who still feel like it's about relationships, and that data is not necessarily going to help them change their job that much. It's our job to help them change their mind, and in order to do that, I have to be very careful about quality.

So before I let you go: impressions of the conference over the last couple of days, surprises, affirmations? Do you have any significant takeaways as you leave this show?

Yeah, one of the things that has struck me about this conference, and similar conferences like it, is how different industries that are facing similar problems are converging on a set of ideas and tools to address those problems. That's a bit new. For example, I think you mentioned it early on, things like Spark. We are experimenting with a Spark platform, and now I come to a conference like this and talk to somebody in healthcare or social media who's also using a Spark platform. That wouldn't have been the case, say, 10 or 15 years ago. Another example is that, at least in my shop, we've standardized on Python, and I find other places in other sectors doing something similar. You start sharing code and sharing ideas across industries, where 10 or 15 years ago I'm not sure we had much commonality at all. So that's something I think is great about this conference. And it's a bit surprising to me when I hear a presentation from a completely different industry and find so many points that resonate with the challenges that I face.
Is it easier to share with somebody who's in oil and gas or retail or whatever than if you were just at a financial services conference, where again you've got more peers in your same industry?

I think it's important to do both. It's not so much that it's easier to share, because I go to financial services conferences and quantitative finance conferences as well. It's just that when I talk to people in other industries, it forces me to get out of my comfort zone, because all of a sudden they're going to say things or look at data in a way that I probably haven't thought about. I'll give you one specific example. We've recently been very focused on trying to take unstructured data, so this would be text data, which might be in the form of PDFs or HTML documents or text files, and marry that with some of the more standard structured or quantitative data. And I've learned a lot from other industries that are much further along that path than we are in financial services. I think without these types of interdisciplinary or inter-industry conferences, it's hard to get those insights.

It just supports what we talk about all the time, right? Diversity of opinion leads to better outcomes, because you just don't think of things the same way somebody else thinks about them all the time. All right, Jeff, well, thanks for stopping by. Jeff Bohn from State Street Global Exchange. I'm Jeff Frick, you're watching theCUBE. Thanks for watching.
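[Editor's note: the "marry unstructured text with structured data" idea from the interview can be sketched as a toy Python example. The fund metrics, document snippets, and crude negative-word tone score below are all hypothetical illustrations, not a real text-analytics method from the interview.]

```python
import re

# Hypothetical structured fund metrics, keyed by fund id.
structured = {
    "F1": {"nav": 1_000_000.0, "volatility": 0.12},
    "F2": {"nav": 2_500_000.0, "volatility": 0.22},
}

# Hypothetical unstructured text (e.g. extracted from PDF or HTML filings).
documents = {
    "F1": "Outlook stable; risk controls improved and losses declined.",
    "F2": "Material weakness found; losses widened and risk increased.",
}

# Toy lexicon: fraction of words flagged as negative serves as a tone feature.
NEGATIVE = {"weakness", "losses", "risk", "widened", "increased"}

def negative_tone(text: str) -> float:
    """Share of tokens in the text that appear in the negative-word lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(t in NEGATIVE for t in tokens) / max(len(tokens), 1)

# The "marriage": join the text-derived feature onto the structured record.
merged = {
    fund_id: {**metrics, "neg_tone": negative_tone(documents.get(fund_id, ""))}
    for fund_id, metrics in structured.items()
}
```

However crude the tone score, the shape of the result is the point: each fund record now carries quantitative fields and a text-derived feature side by side, ready for the same downstream risk analysis.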