Live from Cambridge, Massachusetts, it's theCUBE. At the MIT Chief Data Officer and Information Quality Symposium. With hosts, Dave Vellante and Paul Gillin. Welcome back to Cambridge, Massachusetts everybody. This is Dave Vellante and I'm with Paul Gillin. This is SiliconANGLE's theCUBE. theCUBE is our live mobile studio. We go out to the events. We extract the signal from the noise. We're here at the MIT Information Quality Symposium at the Tang Center in Cambridge, Massachusetts. Kind of an interesting building here. A lot of us last year had trouble finding the Tang Center. I don't know if you have ever been there, folks in our audience, but it's sort of this remote MIT building, but a lot of smart people in here talking about data, data governance, information quality, and the Chief Data Officer. David Saul is here. He's the Senior Vice President and Chief Scientist at State Street. Welcome to theCUBE, great to have you. Well, thank you for having me. So State Street, I mean, it's a brand name that's very well known around Boston, but a lot of people might not know exactly what State Street Bank does. Why don't you talk about that a little bit? Well, State Street is a financial services company and our clients are institutional investors. Some of the largest global financial institutions, pension funds, mutual funds, and their equivalents around the world. So our primary business is acting as a custodial bank. Currently we have over $28 trillion that we provide custody for, and then we also have our own investment management arm, State Street Global Advisors, which has in excess of $2 trillion under management. So talk about your role as the Chief Scientist, not the Chief Data Scientist, the Chief Scientist at State Street. That's an interesting title for somebody at a financial services institution. As far as I can tell so far, I'm the only one with that title. My job is to be a catalyst for innovation. 
State Street has achieved a lot of its success by the application of technology to financial services. We've had a long history of that, and so when this position was created about four years ago, and I'm the initial occupant of it, it was really to continue that and also to accelerate it, because we see technology continuing to move more and more quickly. So my job is to make sure that the technologies that we're using are the best ones in order to serve our clients. Okay, so as Chief Scientist, you're looking at all the latest trends, right? So you've got the big four: cloud, mobile, social, and big data. How do those mega trends factor into what you do on a day-to-day basis? How is your organization taking advantage of those big-picture trends? And then we'll dig into some of the meat of this conference. I'll save data for last since we're at a data forum, but we got involved with the cloud early on. We've had our own private cloud in production for over two and a half years, and the proof-of-concept work prior to that started several years earlier. So that's a good example of why we want to get involved with technologies early on, assess them, and make sure that we can apply them appropriately to the problems that our clients are trying to solve. In the social networking space, again, we got started a number of years ago and we have a very successful internal social networking platform that connects directly to a lot of the things that we're trying to do on the business operations side. So as we try to transform and streamline business operations and move them toward straight-through processing, a lot of that work takes place in distributed locations. And so the technology of being able to collaborate, including among people who may never have met each other before, is a good way of sharing best practices and getting questions asked and answered. 
Of course mobile we have, both internally for access to our systems and also for reaching out to our clients. A few years ago we delivered a platform we call Springboard, which is an iPad tablet application, as an alternative to our longstanding web-browser-based delivery of client reporting. But data is absolutely essential to us. I mean, we don't manufacture anything. Our business is data. We consider data our most important asset that we act as custodian of for our clients. And so a conference like this one is talking about a lot of issues that are very relevant to us. So just to clarify, cloud, of course, is all about agility and lowering costs, doing more with less, and it's a private cloud. Not doing anything in the public cloud? Today it's an entirely private cloud, but we're certainly looking at hybrid applications. Again, though, coming back to the data, the core data, the transactional data, the crown jewels as it were: it's very important, both for our clients and because we're in a heavily regulated industry and a global player in that industry, that we maintain control of that data. But there are certainly other things that don't deal with that key data that would lend themselves to a hybrid cloud environment. And as a technologist you must be observing, fascinated by, impressed by the Amazons and the Googles and what they're building. And I presume you're trying to understand how to apply some of their best practices within your own organization. Is that correct, and do you feel like you can replicate that, maybe not at the scale that Amazon does? How do you look at what they do in terms of potentially bringing that discipline, or maybe discipline isn't the right word, that ethos, into your organization? Absolutely. One of the other things that I carry in my portfolio as Chief Scientist is maintaining contact with standards organizations. 
So as a matter of fact, last month in Boston we had a full-day conference, sponsored by the Cloud Standards Customer Council, on hybrid computing. And we had some of those same major players in the industry talking about the hybrid cloud. I would say a lot of it really does come back to the data. Where does the data reside? Are we able to maintain control over it? Are we able to maintain security, privacy, all of that? And certainly the external cloud providers have made a lot of progress in that area; they're able to maintain levels of control that they weren't able to four or five years ago. And the trend is clearly momentum in that direction, and social is a collaboration and productivity play for you guys, and global is a client-driven activity. Yes, and really everything comes back to being client driven in the end, of course. We talked a lot yesterday about culture, and culture can certainly be the enemy of success when it comes to big data. You're obviously on the leading edge of a lot of trends. How do you institute a culture at the company that is receptive to these new technologies, as positive about them as you obviously are? Now that's a great question. We have some overriding principles that apply to our company, not just in IT but in the business. One of them is technology and the application of technology, but it's certainly not all about that. The second is quality people; we couldn't do this if we didn't have the level of quality people we have throughout our organization. And the third one is culture, and we actually feature all three of these when we talk about anything we're doing. It's never just about technology, and it's never just about people. All three of these together are what allow us to be successful. And when you're deploying a new technology, I mean we hear this a lot about social networks, internal social networks for example, the reason that they fail is that people don't adopt them. 
A small group will use them, but they don't gain traction because people are just resistant to using them. Now, you say you have a successful internal social network; what did you do to get people to use it? We started out with a small proof of concept among a set of people who were already motivated to use social networking. We also looked externally to see what our staff were doing on the outside, and we discovered they were already using external social networking tools. We wanted to bring that in-house, both for control reasons and because we felt, well, if they're getting value from it, then other people can get value from it as well. So we built on that first proof of concept, and then we did a larger one, and then we went through a vendor selection process to make sure that we had a product that fit with our existing environment, where people were already sharing but using a whole variety of tools. We didn't want to force a major change, because that comes back to the cultural point. If people were satisfied using one tool, we wanted to make sure that that tool was included in the portfolio of what we did. And then when we finally did do the production rollout of our platform, State Street Collaborate, we took that core group of several hundred people and designated them as Collaborate Champions. We used them to provide education and to be the leaders of the various communities, and we leveraged the people who had already been doing it to go from 200 people to nearly 30,000 people. We also studied a lot of the research that had been done, including research here at MIT at the Center for Digital Business, and our experience followed that of a lot of companies. The first few months you get the people with the pent-up demand who wanted to use it right away, and then it really took us nine months before we started to see significant adoption, but by the end of the first year we were already exceeding our objectives. 
Now, you're never going to get 100% of the organization to use it, but we have two thirds of our company who use it on a daily basis, which we think is really good. The other key piece is that you need to get all levels of the organization involved, and we were very fortunate in that some of our senior executives jumped on it right away and use it regularly. They go online, they open themselves up in real time to questions, pretty much anything is fair game, and they answer those questions. So with the individual contributors you have the easiest problem, but you also have to get senior management involved, and we were able to do that. And this essentially all occurs behind your firewall, correct? Yes, this is entirely internal, but we get a lot of questions from our clients who want to do the same thing within their own organizations, and we are planning to extend it to our clients at some point, once we and they are satisfied that it's secure and we're able to control it. So let's talk a little bit about data and financial services. When we talk to the retail banking guys, they talk about building data pipelines, they talk about how sampling is dead, how algorithms are free and it's what you do with the algorithms that matters, all kinds of interesting stuff, particularly around marketing, around risk, and around fraud. What's different in your world? I mean, some of those concepts may apply, but how are you using data, and what's your version of the data factory, if you will? There are some similarities there. For us the data, at least from my view, splits into two parts. One is we have a lot of data already: data that we store for our clients, data that we receive from external sources. If we can integrate that data in new and better ways, do it more quickly, and deliver it to our clients under the umbrella of data analytics, there are certainly business opportunities there. The other side of it is risk and compliance. Regulatory reporting has increased year after year. 
We see no reason why that's going to change. I think the real secret sauce is that the companies that succeed are going to be the ones able to get synergy between data analytics and risk management, rather than putting them into silos and treating them as two separate things. Lots of organizations extract data from their transactional systems, their operational systems. They put it into a data warehouse and then they try to figure out how to get some value out of that. Then they take some of that same data and put it into a risk repository, which is really the same thing, and that is purely on the expense side while the other is on the revenue side. In the end, it is the same data. If we can map those data sources and really get at what they mean, we can get double duty out of it. So there's no reason why we should completely separate data analytics from risk management. And I just want to say there's a technology out there, it's actually been around for about 10 years now, at least in concept: data semantics, mapping the meaning of data and its relationship to other data. Today what we do, what everybody does with data warehouses, is look at all of the sources and write ETL programs. So they extract, they transform the data, they load it into another place. They've already lost some context and some meaning for that data. Think about it, at least in concept: if data could travel around with its meaning, wherever it went, whether it goes to the analytics processor or to the risk manager, some things that are very, very hard to do today become almost trivial. Things like lineage. More and more the regulators are requiring us to report: well, where did that data come from in the first place? How many stops did it make along the way? How did you transform it before it finally got to the endpoint? 
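To make the idea concrete, here is a minimal Python sketch of data that travels with its meaning and its lineage, so those regulator questions (where did it come from, how many stops, how was it transformed) become simple lookups rather than cross-system archaeology. Every name here is illustrative, a sketch of the concept, not an actual State Street implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticRecord:
    """A value bundled with its meaning and its full lineage."""
    value: float
    meaning: str        # e.g. a term drawn from a shared ontology
    lineage: tuple = () # every (system, note) stop it passed through

    def transform(self, system: str, operation, note: str) -> "SemanticRecord":
        # Apply a change but append to the lineage instead of discarding
        # context, unlike a classic ETL step that loses where data came from.
        return SemanticRecord(
            value=operation(self.value),
            meaning=self.meaning,
            lineage=self.lineage + ((system, note),),
        )


# A position flows from the custody system to analytics to risk;
# both consumers see the same meaning and the same audit trail.
pos = SemanticRecord(1_000_000.0, "MarketValue", (("custody", "loaded"),))
pos = pos.transform("analytics", lambda v: round(v * 1.02, 2), "marked to market")
pos = pos.transform("risk", lambda v: v, "included in exposure report")

print(pos.lineage)  # every stop it made along the way is still attached
```

The point of the frozen dataclass is the tamper-resistance discussed below: a record can only be derived, never silently edited in place.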
To do that today across multiple systems is very hard and very expensive, and what makes it even worse is that if one of those sources changes, you have to change everything along the whole path. With semantics, once you've mapped the data, including where it came from, that stays with it for its entire lifetime. It's a metadata approach. It's kind of metadata on steroids, yes. Which is automated at the point of creation or use. Right. And it travels with the data. And that gets into the whole set of control issues. So when we have to go to audit, or again to the regulators, we're able to show that this hasn't been touched by anyone or changed in any way. I mean, your question about fraud is very, very important. You really need to make sure that there's no tampering that takes place anywhere along the data lifecycle. The other part, to go along with semantics: semantics would be great even if we did it entirely within our organization, but there's a multiplier effect if we have industry standards for semantics. Because now it means we can exchange data with our clients and, even more importantly, with the regulators. Think about the problem that regulators have when they ask for information and they get it in different formats from all of us. Their data integration problem is far greater than what we have internally. So I've been very involved with efforts to standardize these semantic mappings for data, and that gets back to my original point. That's going to help us. That's going to help the regulators. The standards organizations now have a clearer mandate. And then the fourth piece: the vendors now have a better idea of what requirements exist and what products they can build. Both hardware and software vendors, as well as professional services firms with best practices, can now come in and say, well, for best practices in data governance and in technology, here is what the best people in the industry are doing. 
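A toy illustration of that multiplier effect: if each firm maps its internal field names onto a shared vocabulary (the firm mappings and term names below are hypothetical, loosely in the spirit of industry ontologies), the regulator can integrate every submission with one code path instead of one parser per firm.

```python
# Shared semantic terms the regulator expects (hypothetical vocabulary).
SHARED_TERMS = {"counterparty_id", "notional", "currency"}

# Each firm publishes a mapping from its internal names to the shared terms.
FIRM_A_MAPPING = {"cpty": "counterparty_id", "amt": "notional", "ccy": "currency"}
FIRM_B_MAPPING = {"legal_entity": "counterparty_id", "exposure_usd": "notional", "curr": "currency"}


def to_standard(record: dict, mapping: dict) -> dict:
    """Translate a firm-internal record into the shared semantic terms."""
    out = {mapping[k]: v for k, v in record.items() if k in mapping}
    missing = SHARED_TERMS - out.keys()
    if missing:
        raise ValueError(f"record missing required terms: {missing}")
    return out


# Two differently-shaped submissions arrive in one standard shape.
a = to_standard({"cpty": "LEI123", "amt": 5e6, "ccy": "USD"}, FIRM_A_MAPPING)
b = to_standard({"legal_entity": "LEI123", "exposure_usd": 2e6, "curr": "EUR"}, FIRM_B_MAPPING)
assert a.keys() == b.keys()  # identical shape regardless of source firm
```

Once everyone maps to the same terms, the regulator's data integration problem collapses to validating mappings rather than reverse-engineering formats.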
And by the way, if you do it, you'll be able to exchange with other people as well. Solutions that can plug into a framework. Reusability, you know, all of these. EDI for financial services. It's a great point. I mean, manufacturing is far ahead of other industries in terms of standards. There are so many suppliers involved. Yeah. I have to ask you, I remember back in 2006, 2007, seeing a speaker talk about economic cycles, looking at the depth of depressions and bubbles over the years, and saying: we've got control now because we understand our economy so much better, we understand data so much better now. We will see fewer deep recessions, fewer strong economic cycles up or down. And of course, the next year, everything fell apart. Yeah. Now, as someone who's in the financial services industry, do you think that this was a wake-up call? Are financial institutions getting a better handle on their data, on their assets, on how their assets are being managed? Are we building resistance against a 2008 calamity happening again? We're making progress, but if your question to me is have we gone far enough to prevent a recurrence of the 2008 scenario, I don't believe we've gone far enough. I mean, I'm very encouraged by a number of things that are happening in the standards space and in the regulatory space. One of the issues in 2008 was the mortgage crisis. We're now in 2014 and we still lack a standard mortgage identifier, though it's certainly being discussed at the legislative level. That's a fundamental building block, whether it's for traditional data or semantics. Then there's the legal entity identifier, which is a result of Dodd-Frank but is being adopted globally, which is really terrific news. The people who did that did a good job of making it not just US specific. Back in 2008, when we had companies that were in trouble, the question was being asked: well, what's everyone's total exposure to this company that might possibly fail? 
And we had a very difficult time, as did everyone, pulling that information from all of our systems, because it wasn't stored in a standardized way. And we were dealing with entities that might have had thousands of subsidiaries and relationships. The legal entity identifier means we'll now be able to go in and say: what is our total exposure to this entity and all of its subsidiaries? That's in the process of being implemented, and we're in fact implementing it ourselves using semantics. But we need more of those standardized identifiers. And I heard a talk here at MIT about two months ago from the Office of Financial Research at Treasury, very much pushing that we need to develop these standardized identifiers. The regulators need them, and we all need them. So that's somewhat encouraging, but not completely. But risk management is your business. I mean, you have to protect your investors against risk. And the characteristic of these cataclysmic events is that it's always something new, right? It was the S&L crisis, and the mortgage crisis, and the dot-com bubble. These all came out of left field. How do you prepare, or is there any way that you can anticipate where the next problem is going to arise? The general answer is increasing transparency throughout the entire system. So the more that this becomes transparent, not just to players and participants in the industry but to the general public, the better. We have a picture in our presentation this afternoon, and at the center of it is trust and transparency. If you increase transparency, people are going to trust the system more, because they know the information that's in there is going to be looked at by everybody in the same way and consistently. I mean, regulators do have a very, very important role in this. We all have a responsibility to give them the best information possible. And that information needs to be made available to a broad public. 
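The exposure roll-up that the legal entity identifier enables can be sketched in a few lines. The LEIs and the clean parent-of mapping below are made up for illustration; real ownership data is far messier, which is exactly why the standardized identifier matters.

```python
# Hypothetical subsidiary -> parent links keyed by LEI (illustrative IDs only).
PARENT_OF = {
    "LEI-SUB-1": "LEI-PARENT",
    "LEI-SUB-2": "LEI-PARENT",
    "LEI-SUB-2A": "LEI-SUB-2",  # a subsidiary of a subsidiary
}


def ultimate_parent(lei: str) -> str:
    """Walk the ownership chain up to the top-level entity."""
    while lei in PARENT_OF:
        lei = PARENT_OF[lei]
    return lei


def total_exposure(positions: dict, entity: str) -> float:
    """Sum exposure to an entity and every one of its subsidiaries."""
    return sum(v for lei, v in positions.items()
               if ultimate_parent(lei) == entity)


# Positions scattered across systems, but keyed by one identifier scheme.
positions = {"LEI-SUB-1": 10.0, "LEI-SUB-2A": 5.0, "LEI-PARENT": 1.0, "LEI-OTHER": 99.0}
print(total_exposure(positions, "LEI-PARENT"))  # 16.0
```

Without the shared identifier, the 2008 version of this question meant reconciling thousands of ad hoc counterparty names before any sum could even start.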
The benefit side of that is I've seen a number of studies that say there is investment, global investment, sitting on the sidelines because of a lack of trust. And there's a multiplier effect. There was a study done, I think it was the University of Chicago and the London School of Economics jointly, that showed even a small increase in that trust factor has a multiplier effect on investment, and that helps everybody. Now, does State Street have a chief data officer? We do not. We have a head of data governance, David Blazewski. And you're talking to David later today. So he's essentially your de facto CDO, is that right? Or would you not say that? We'll ask him that. Why don't you? Okay. But you don't have a formal CDO. No, because we are really focusing on the data governance aspect of it, separate from a lot of the other things that CDOs do. And so David doesn't have technology responsibility, for example, so he can really focus on things like data ownership, the responsibilities there, organizational kinds of questions. I don't want to take away from anything he'll say, but we made a conscious decision, and that's not to say we might not have that role in the future, that what we really needed to do was spend more time on data governance, to correspond to the other things that we're doing on the technology side, the operations side, and so on. Well, many folks, and I think most folks at this event, would suggest that the chief data officer should not have a technology responsibility. They should be separate and parallel roles, certainly not hierarchical. So, interesting discussion. All right, we have to leave it there, David. Great discussion. Thanks very much for coming on theCUBE. It was a pleasure meeting you. Well, it's really been my privilege to be here, so thank you. All right, keep it right there, everybody. This is theCUBE, we're live from the MIT Information Quality Symposium in Cambridge, Massachusetts. We'll be right back.