Hi everybody, we're back. This is Dave Vellante with Jeff Kelly. We're here at the MIT Information Quality Symposium at the Tang Center on the campus of MIT. We're here for two days in Cambridge, Massachusetts. This is theCUBE. You're watching live. In theCUBE, we go out to the events. We extract the signal from the noise. We've covered big data now for the last several years, but there's an aspect of big data that we felt we had not been covering adequately. And that's the issue of data quality and information quality. It's not a new topic, but it's certainly new for a lot of the big data themes that we hear in the industry. We talk a lot about technologies and Hadoop and Flume and Pig and Sqoop and things like that, but we don't talk enough about the practical realities of implementing data architectures and specifically data quality. Derek Strauss is here. He's the chief data officer of TD Ameritrade, obviously in the financial services industry, and somebody who I'm sure cares a lot about data quality. Derek, welcome to theCUBE. Thank you. Talk a little bit about your role as CDO. Is it a relatively new role at TD Ameritrade? Has it been around for a while? It's pretty new at TD Ameritrade. It's just over a year old. It was created when I joined. So you brought the notion to TD Ameritrade? I was fortunate to have worked with the chief operating officer of TD Ameritrade in a couple of other lives previous to this, and I was a consultant, so I was a trusted advisor to him. And so to a large extent, I had an opportunity that I think a lot of people would love to have, where you've got essentially a boss that you're gonna be working for who already understands what the end goal is and understands you, and you understand him. I sought you out, so I need you to help me fix this problem. So what was the problem?
So the problem is really that a lot of organizations in the financial world, and certainly TD Ameritrade is no exception to this rule, have had a big emphasis on systems and applications and things like that, but not really a focus on data and information. And with the rise in requirements to have really strong analytics, not only for internal consumption but also to provide insights for our customers, for our clients, there was a need to do something about data. It was clearly a need that was enterprise-wide, and so that was the real driver to get someone who could focus across the enterprise, not be part of IT, but not necessarily be part of any particular part of the business. So I'm reporting up to the chief operating officer, who also has the back office and various other business functions as well as IT. So IT are my peers, as well as various business functions. We've heard that actually before today, that it's probably not the best idea to have the chief data officer report to IT, because it really shouldn't be just an IT function. That's almost, we'd be creating yet another stovepipe. Even though IT does cut across the entire organization, when it comes to data, you want people that are going to go out and talk to the constituents that are going to be consuming that data. Those are essentially your customers. So is that how you spent the better part of your first 100 days? Oh, absolutely. In fact, the folks that I drew out of the IT functions into my area didn't see me for the first 100 days or more actually, because a lot of it was spent out with the business folks trying to understand where the problems are, where the pain points are in the operational systems and the running of the enterprise, as well as looking at the analytics needs. Where are the analytics issues?
You know, what are the real needs in terms of moving into big data, for example, but even just looking at standard transactional data and being able to get authoritative sources and getting accurate data, getting data that's actionable and getting data that's accurate. You know, sorry, I said that. Actionable, accurate and- Accessible. Accessible. There you go, beforehand I was telling him about the three A's and I went blank on accessibility. So yeah, those three A's were the things that emerged from the discussion. But when you talk to these folks in your first 100 plus days, I'm sure it wasn't as simple as them saying, okay, here's the problem and bang, go fix it. Maybe they pointed to the rock and you had to go look under the rock and then find- Correct. Because they were telling you about the symptoms. Not the problem. Yeah, that's right. So you gotta dive deeper and try and figure out, of the 50 or 60 major issues that were raised, what are the common themes? And looking under those rocks to find out what's the fundamental cause of all those things? And I think that's where we essentially found those three key themes. So how did you prioritize which ones you went after? I mean, you don't have unlimited resources. You got the- Right. On the one hand, you got the ones that are easy to fix and common in a bundle, and then you've got the ones that are maybe gonna drive business value, and then maybe the ones that are really hard to fix that are gonna drive business value, and others that have a risk factor associated with them. You've got this fairly complicated matrix of decisions that you had to make. How did you simplify that and attack that problem? Well, you know, at the end of the day, everything kind of started vectoring in onto actually a handful of data attributes. We felt that if we could get those right, a lot of those issues- The old 80-20 rule. -would start getting a really positive effect from it.
And I think it's a typical sort of engineering approach: when you get a lot of complexity, very often, if you really analyze it, you can reach down and find just a couple of key things that, if you hit them, are really gonna drive everything a little bit further forward. And that's the approach. Just to take those, incrementally expand on that. So I wonder if we can go back to the communication you have with the business side of the house. We've talked to some other CDOs today, and it's interesting that really the first focus of a CDO is around the business, not necessarily the data. How do you continue that dialogue? Because in your first 100 days, you're in there really intensely trying to understand the problems and the issues facing the business, but that's an ongoing conversation you've gotta have. How do you approach that? You know, we heard this morning from Dat Tran at the VA talking about how you've gotta have a strategic communication strategy, meaning you've gotta talk to the priorities of the different stakeholders in the language that they understand. How do you approach that? Pretty much the same way. I listened to a presentation that Dat gave on that and it was amazing. It fit very nicely with the kind of strategies that we're employing as well. Essentially, you know, to try and find a common thread across all the different priorities. So, you know, you've got the various vertical parts of the business, and then you've also got the functions that support those, like the CFO, et cetera, et cetera. And it's looking for that common thing that all of those constituents are finding is just an impediment to being able to really make progress. And then keeping up that dialogue and saying, okay, so what are your strategic goals for the next, you know, six to 12 months? And what are the things that you see as really important?
And of course, in any business, every quarter, everything changes again. So you're gonna make sure that at least every quarter you get down to having really good quality dialogue with all the key business leads and ask, you know, last quarter when I spoke to you, this is what you said, is that the same today? So we keep on taking course corrections depending on how the business is changing. And how do you prioritize your, so well, let me back up. So we talked to some folks today. Stuart Madnick joined us and talked a little bit about how CDOs can sometimes serve by listening to the business, they have a problem, we try to solve it, but also, even more strategically, by saying, we're gonna proactively go out and try to find ways that data can help us improve our business. Is that the approach you take? Do you try to balance those two? And how do you do that? Yeah, absolutely, there has to be that balance. Now, one of the key ways that we're focused on that is essentially on the analytics side. And when it comes to big data, of course the main question that people ask is, is this really useful? You know, is all the effort that we're gonna put into this gonna reap us benefits at the end of the day? And you know, in some corporate cultures, it's very difficult to get budget to try something where it's not a clear, oh yeah, we're gonna make X ROI out of it. Fortunately, we have an innovative culture and we've been able to set up a lab environment where we can try out various things. We can bring various types of big data together. We've got associations with some external parties, like some of the universities, the real heavy research type folks. Utilizing some of their resources and some of our resources in our lab with our data, we can start making some progress in terms of showing and demonstrating some of the real nuggets that are sitting inside of this data.
And the approach is really to use that as a way to demonstrate the value to the rest of the organization, so that they start exposing more and more of their analytics teams to some of the tools and capabilities that we're standing up. And so yeah, I mean, kind of almost inflaming the imagination, saying here's some of the things that we could potentially do, and getting the business involved. Sounds like it's critical, not just the, we'll help you solve your problem, but let's innovate together, co-innovation with the business. Yep. So you mentioned there's the traditional data, the traditional structured data we all know of in the enterprise, and then you've got this big data, new sources like social media and other things. But in a lot of organizations, even just the traditional data that's in house isn't being used to its full effect. Right. What advice would you have for other CDOs or other data professionals in terms of when is it time to try to make the most of the data you already have in house, and when is it time to maybe start looking elsewhere for new data sources and actually starting to do some of that innovation? That's a great question. A lot of people ask that, because it's almost like, we haven't even figured out what to do with our structured data yet, so forget about that big data stuff, we just got to get this right. Well, if you take that approach, you'll never get into that, because you'll never get this right. There's no such thing as perfection. That job will never be 100% done. Absolutely. So our approach is basically hit them both simultaneously and start small in both places. Most organizations that I've had the opportunity to work in as a consultant, because I had three decades of consulting before I joined TD Ameritrade, have the same kind of scenario, where there are massive opportunities in the unstructured data world, because it's again the 80-20 rule.
There's so much out there that people just haven't tapped into at all. Especially using some of the graph or network analytics capabilities that a lot of the tools and appliances have got these days, you can really start uncovering some of that pretty rapidly. Within the course of a couple of months, you can have some really striking and compelling pictures starting to emerge that'll capture people's imagination. So Derek, you've been on the faculty of the Data Warehouse Institute, so you know a lot about the traditional data warehousing world. There's a lot of people emerging that say, oh, data warehousing, it's a dinosaur, that whole business is dead, and Hadoop is the new way. So I'd love to get your perspectives on that. Obviously, I'm overstating that. But at the same time, there's disruption that's going on in that traditional business. And in many ways, the whole business intelligence and data warehousing business failed to live up to some of its promises. 360 degree views of your business, that never happened. And a lot of executives, CEOs that I talk to, are very frustrated by it. I talk to some of my clients, and they say it's like a snake swallowing a basketball. We can't keep on top of these things. And then all of a sudden this Hadoop thing comes along and it just makes things even harder. So it's reaching an equilibrium. Obviously, people are trying to force them together, and it will happen. Some type of equilibrium will be reached. But how do you see that evolving? I see it actually as a natural evolving scenario, because the purpose of data warehousing was really to perfect the data into the warehouse and to keep it for long historical views, to look at trend analysis. Unfortunately, a lot of people thought, well, it was great for helping us do that, now let's make it do a whole bunch of other things that it was never supposed to do. So that's where you get the problems.
So now with all the big data stuff coming along, the worst thing that people could do is to try and cram all of that into the data warehouse. Stop the madness. Let's use the data warehouse for what it's intended for. But then let's build alongside of it another type of data store, and the phrase that I've coined around it is the data marshalling yard, because we're marshalling a whole lot of data into that area, and it's very different from the data warehouse. It co-exists with the warehouse. In fact, in many ways it can front end the warehouse, because by using the marshalling yard and looking at big data and doing the analytics on it, we're going to be able to do exploration and discovery. We're going to be able to find signals and patterns in the data, and once we've done a bit more discovery on it, we'll be able to determine whether, hey, this piece of data that we've found a signal in is actually something that we would like to perfect and take into the data warehouse. Or, we haven't found anything in it yet, so we're not going to put it in the warehouse. We're going to keep it in this marshalling yard. We'll persist it, we'll keep history on it, because at some point in time maybe there'll be other data that we'll have in this marshalling yard which, combined with some of the stuff that we didn't immediately find useful, now makes it useful, and now we can take that and move that to the warehouse. So I see the two as very complementary. The warehouse continues to be the place where you've got data that needs to be in ACID form. In other words, it needs to be atomic, it needs to be rock solid. And the marshalling yard is more raw data, but also data that has maybe had context added to it so that it becomes disambiguated, so it can be sensibly used. So the marshalling yard is this filtering system. You pull out the nuggets that you really want, those go into the data warehouse, and that's where the single version of the truth, to the extent that you can get one, lives.
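For readers who want to picture the flow just described, here is a minimal Python sketch. It is an illustration only and was not shown in the interview: the names `MarshallingYard`, `land`, and `promote`, and the idea of passing in `has_signal` and `cleanse` functions, are all assumptions made for this sketch. It shows raw data landing in the yard in its natural state, staying there with its history, and being cleansed and copied into the warehouse only once a signal has been found.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MarshallingYard:
    """Hypothetical raw store: keeps every record as it arrived, with history."""
    records: dict[str, list[dict]] = field(default_factory=dict)

    def land(self, dataset: str, record: dict) -> None:
        # Raw data is appended untouched; nothing is overwritten or discarded.
        self.records.setdefault(dataset, []).append(record)


def promote(
    yard: MarshallingYard,
    warehouse: dict[str, list[dict]],
    dataset: str,
    has_signal: Callable[[list[dict]], bool],
    cleanse: Callable[[dict], dict],
) -> bool:
    """Copy a dataset into the warehouse only if exploration found a signal.

    Returns True if the data was promoted. Either way the raw records
    stay in the yard, so they can be re-examined alongside later data.
    """
    rows = yard.records.get(dataset, [])
    if not rows or not has_signal(rows):
        return False  # no signal yet: keep it in the yard only
    # Cleansing cost is paid only for data that earns a place in the warehouse.
    warehouse[dataset] = [cleanse(row) for row in rows]
    return True
```

In this sketch the warehouse is just a dictionary standing in for a governed store. The design point it tries to capture is that promotion is a deliberate, conditional step, while landing data in the yard is cheap and unconditional.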
You brought up another notion here. A lot of people claim that much of that new type of data, the new VO data as some people have called it, doesn't need those ACID properties. So what do you do with that data at that point? Do you still put that into the data warehouse? Is that an advisable practice? Do you create some new structure? I would keep that in the marshalling yard, essentially. I would only put data into the warehouse that needs to be in the warehouse, where there's a specific need for it, because it costs money to do it. It costs money to put that data into the warehouse. You have to disinfect it, you have to normalize it, you have to cleanse it, you have to do all sorts of wonderful things to it. The stuff that's kept in the marshalling yard is kept in its natural state. And how do you see the balance of value tipping between the marshalling yard and the data warehouse? Today, or historically, all the value was in the data warehouse. There's plenty of value out there, you just couldn't get it. Now with the marshalling yard you're actually able to extract, as we like to say, signal from the noise. So there's more value in the marshalling yard, and in some respects the marshalling yard today, with all the Hadoop fever, is like a tail wagging the dog. Do you ever see the marshalling yard becoming the dog?
No, I don't think so. I think you just get two breeds of dog, and I think the marshalling yard becomes the Great Dane. Okay. And the other one's the miniature poodle. But they're still pack animals that coexist. Absolutely. So being in financial services, and we've also had some folks on from the healthcare sector, both heavily regulated, I wonder how that impacts what you're able to do from a data perspective, and does that put constraints on what you'd like to do? How do you as a CDO specifically deal with some of the new regulations that have come along, the Dodd-Frank Act and others that are surely going to come along in the future? How does that impact the types of analytics and data workloads you potentially could do, and how do you make sure that you're following all these regulations that are often changing, and some are ambiguous to begin with? Right. Yeah, I mean, the workload in terms of being able to keep up with all of that and to stay on the right side of the law and compliance is horrendous in the financial industry. And so we, like most financial organizations, are members of various groups to try and stay on top of it. We're working with the EDM Council as well, and getting most of our perspectives from networking with folks like that. It is certainly a fact that it does take a lot of resources. So there's a lot of innovative stuff that I'd love to do, but I have a limited amount of resources, so those kinds of things tend to become the first things in the queue that we've got to address. Hopefully over time things will mature a little bit and we'll get to a point where we'll have a bit more of an equilibrium. How about the role of the chief data officer? We heard earlier today from Stuart that the earliest official chief data officer that the MIT Sloan School could find was in 2003. And how do you see, well, first of all, I've got a question. What percent of organizations have a CDO? Maybe you don't know broadly, but I'd love your gut feel, or maybe you can answer
specifically to financial services. It's very small in terms of actually carrying that title. I think a lot of people are doing some of the role of the CDO, but they don't necessarily carry the title and they don't have the- Yeah, they haven't reached that level of maturity yet. Is it single digits? Less than 10%, I think it is. Yes, less than 10%. And so, well, financial services presumably would be one that would adopt that role sooner, wouldn't you say? Very much so. At the big data conferences and the CDO conferences around, it's mainly the financial industry that's there. So yeah, I believe that's the main area. Obviously data science, the data scientist role, it reminds me, I wonder if it's going to take a similar path. It probably won't. This is probably a horrible analogy, but I'll throw it out there anyway. Remember the webmaster? Everybody wanted to be a webmaster. Now everybody wants to be a data scientist. And that role sort of evolved. Most data scientists that I know are probably significantly more senior and qualified than most webmasters that I knew back, you know, 12 to 15 years ago. But nonetheless, how do you see that role evolving, and how do you see the adoption of that title, that role, within organizations? And should it be faster?
Absolutely, I'd love it to be faster, and I think it will be. I don't think the growth path is going to be a straight line. I do think it's going to be something that's going to accelerate, so it's kind of S-curvish. An old guy, that's my old math terms. Yeah, I believe that the data scientist, together with the chief data officer, has a tremendous opportunity to capture the imagination of the organization. The data scientists in my book are people that would either work side by side with the chief data officer or be on the chief data officer's staff, and would be really strong in the ability to not only mine into the data and use all the right sort of data modeling techniques, but to be able to visualize it. I think that's one of the key skills that we're going to be looking for: people who've got the artistic flair to be able to take this myriad of facts and present it in a way that's an indisputable message, so that you could pull someone off the street and say, what does that mean, and they'll tell you immediately what it means. That's going to be the key skill. So you own the data architecture, right? Is that fair? Or- I own, so, I'm really a part of the business. And so what I've taken is, I've got data governance, I've got enterprise analytics, so that's where the data scientists are, I've got the data architecture side, which I drew out of the IT shop, and I've also got data development, which I also drew out of the IT shop. So those are my four pillars. My group is called enterprise data and analytics, so it's both sides of the coin. And those are direct reporting relationships, and they've got a dotted line into their adjacent roles? Or no dotted line? So the guys that you pulled out of IT from the data architecture piece report directly into you, no dotted line. Interesting. And how did that go organizationally? Was it, well, great, take this problem, good luck? Imagine that conversation. Fortunately, our CIO, our CTO and our active lead really understand the need
for a concerted effort on the data side, and so they've been tremendously supportive. They understood that was a root problem. To what you described before, if you solve that root problem, you get things resolved and add business value much faster. I think that's probably somewhat unique, but you guys are in the financial services business, so you make decisions fast, because it's all about the buck. Get it done yesterday. Alright, good. Well, Derek, listen, thanks very much for coming on theCUBE. It was really a pleasure meeting you. Good luck. Thank you, appreciate it. We'll be right back after this to wrap it up. Dave Vellante with Jeff Kelly. This is theCUBE. We're live here at the MIT Information Quality Symposium. Keep it right there for our wrap up.