Live from the MIT campus in Cambridge, Massachusetts, it's theCUBE, covering the 12th annual MIT Chief Data Officer and Information Quality Symposium, brought to you by SiliconANGLE Media.
Welcome to theCUBE's coverage of MIT CDOIQ. I'm your host, Rebecca Knight, along with my co-host, Peter Burris. We're joined by Tom Sasala. He is the Director of Architecture Integration and Chief Data Officer of the US Army. Thanks so much for joining us, Tom.
Thanks for having me.
So we were talking before the cameras were rolling about your role within the US Army as CDO. Tell us a little bit about the history of that role. You are the first truly appointed CDO ever.
As far as I know.
As far as you know, exactly.
Yeah, so the position was actually created in 2009. At the time, the DOD was working on this net-centric enterprise data strategy, and it was the Army's attempt to bring all the Army information systems together and create a unified kind of environment. The position, however, had gone vacant for quite some time. So I was actually hired in 2017 to be the lead architect as well as the Chief Data Officer, which is really a signal to the rest of the community that architecture and data go hand in hand, and that having a systems architecture without understanding the data flow and the data architecture was not working for us, right? And so we've been working very hard on that. Unfortunately, just to be honest, we've kind of gotten dragged down into some of the system architecture work on the tactical side. And I'm trying to elevate my guys up because, you know, we're at headquarters, right? We want to elevate them up into the things that we need to focus on from a data perspective. But since that time, the Air Force has hired an actual CDO, and the Navy has a lead for data integration and data interoperability who is effectively their CDO.
And so we got together just two weeks or so ago as a group with OSD, the Office of the Secretary of Defense, right? And we tried to figure out what the big master plan for the DOD is now, since they've kind of abandoned the net-centric enterprise services model from the early 2000s. And the DOD has just written and released a draft artificial intelligence strategy that makes data the center of the universe. So it's an opportune time for us to get together and actually solve some real hard problems.
So one of the things that we believe pretty strongly is that the first challenge is to get people to start treating data like an asset. And I think you just explained, or perhaps implied, one of the primary reasons. So long as the hardware and the infrastructure, the systems, are regarded as the key asset, then the institution, the organization, is going to focus on that. That's going to be the primary citizen. The minute that we can treat data as an asset, then we can make data the primary citizen, and instead of talking about the systems architecture, we can start talking about data. Does that comport with what you see happening at the DOD?
Absolutely. So I've been saying this a lot. We've spent a lot of time and energy in the last 20, 30 years protecting the perimeter of the enterprise, right?
A lot of commercial enterprises have too.
They have. But the reality is, with the cloud and with the enterprise services that the commercial service providers are providing today, it's called the deperimeterization of the enterprise, right? There is no perimeter anymore. The reality is your data is everywhere, right? So we have to focus on securing the data. I said this just the other day in a meeting. The adversaries are not stealing our network. They're stealing the data on the network, right? So if the data isn't protected at the data level, not at the perimeter level or even at the server or system level or application level, right?
Then we're not going to actually survive moving into the future, right? So, you know, you've heard this before: data at rest, data in transit, data encryption. Well, these things all need to be tied together to create an ecosphere that is not only an identity-aware environment, right, but a data-aware environment, and then tie the identity to privileges or permissions, right? They give you access to what you need to do, when you need to do it. Now, we have a challenging environment in the Department of Defense. It's a little bit different than most agencies and commercial sectors, as we do have to operate in the tactical environment. And so we have the break-glass scenario, right? What do you do when it all breaks loose and you just need to do the job? That's a situation where security gets in the way, or can get in the way. And so you do need the data to be readily accessible and that kind of stuff. So, you know, having a CAC to authenticate is always hard for us.
But going back to the notion that the focus historically in security has been on securing the perimeter, and that now we have to secure the data: that also means, in many respects, that we're not really focused on restricting access to the data. We're actually putting in place a new regime and a new set of principles and programs for how we share data. So are we moving from a restrict-access orientation to an appropriately-share orientation?
So I would like to say yes. And I'm going to give you one example and one caveat. After 9/11, we created the information sharing strategy and we moved from the need to know to the responsibility to provide, right? After Snowden, that pendulum is starting to swing back the other way a little bit, saying, hey, we were sharing too much and now look what happened.
Well, the reality is that Snowden didn't happen because information sharing policies were put into place and we were encouraging people to share intelligence information to connect the dots, right? It happened because he walked in with a flash drive, plugged it into a server, and had unfettered access, from a permissions perspective, to the infrastructure, right? So it comes down to identity, access control, and controlling the data, right? And that nexus there is really, really hard.
The DOD was an exceptionally early adopter of two-factor authentication, way back in the 1990s, before it quite honestly worked very well, right? And so we have those scars. The federal government at large is moving to the larger ICAM infrastructure with the PIV cards, right? But what we haven't done is take the leap from identity management into access management, right? And tie who gets access to what back to the identity, digital identity and physical identity, right? And then make the entire ecosphere, data and applications, persona aware.
Because in the DOD we have a problem, or I shouldn't say a problem, a use case, right? You have reservists that come in on the weekends. They are military government employees on the weekends. They can be contractors throughout the week, right? And there's some information that contractors are not allowed to have access to, but as a reservist they would have access to it, right? Humans can compartmentalize, notionally speaking; computers are not so good at it, right? And so we need to be able to facilitate that multiple-persona approach. And that's useful for lots of different reasons. So when people leave jobs, right? The biggest challenge we have is deprovisioning them out of the systems that they were in, right?
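The multiple-persona use case described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Persona` and `Identity` classes, the `can_read` function, and the labels `public` and `government_only` are all hypothetical names invented here, not part of any Army, DOD, or ICAM system. The idea is that access is decided by the persona a person is acting in, not by the person alone, and that removing a persona revokes its access in one place.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Persona:
    """One role a person can act in, with the data labels that role may read."""
    name: str
    allowed_labels: frozenset


@dataclass
class Identity:
    """A single digital identity that can hold several personas (hypothetical model)."""
    user_id: str
    personas: dict = field(default_factory=dict)

    def add_persona(self, persona: Persona) -> None:
        self.personas[persona.name] = persona

    def remove_persona(self, name: str) -> None:
        # Deprovisioning: dropping the persona revokes everything it granted.
        self.personas.pop(name, None)


def can_read(identity: Identity, active_persona: str, data_label: str) -> bool:
    """Allow access only if the identity holds the persona and the persona permits the label."""
    persona = identity.personas.get(active_persona)
    return persona is not None and data_label in persona.allowed_labels


# Illustrative personas: the same person is a reservist on weekends
# and a contractor during the week.
reservist = Persona("reservist", frozenset({"public", "government_only"}))
contractor = Persona("contractor", frozenset({"public"}))

person = Identity("user-001")
person.add_persona(reservist)
person.add_persona(contractor)

weekend_access = can_read(person, "reservist", "government_only")
weekday_access = can_read(person, "contractor", "government_only")

# When the person leaves the reserve role, one removal is enough.
person.remove_persona("reservist")
access_after_leaving = can_read(person, "reservist", "government_only")
```

In this sketch the same `user_id` gets different answers depending on the active persona, which is the compartmentalization that the speaker says humans do notionally and computers do poorly.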
So wouldn't it be wonderful, and this is something we're working on in the Army, to have a system that says, hey, rather than me sending you the form for every system you want to have access to, you go to the system, you request access, we check the box to say you have access, and it pushes that access back down to you, right? Based on your role in the organization, not your role in the system, right?
At the time.
At the time.
It's very context specific.
Right, exactly. And so eventually, when you leave, we can just uncheck the box and it goes in and says, hey, you no longer have access, right? But then that'll lead us into something that we were talking about probably 10 years ago, which is risk-adaptive access control, right? As the environment changes, let's say we have a big cyber attack, we might want to change who has access to what, right? For example, we had a denial of service attack a few years ago in the Department of Defense, and the Marine Corps' response was to remove access to social media from all of the Marine Corps people, because they had the ability to do so, right? And that was the vector that it was coming from, right? So that could help contain it, if you will. But that's kind of a heavy-handed approach, and we can be a little more nuanced, in my opinion.
So I've got one more question to try to pull some of these things together. It's a great conversation. We like to talk about how data is the new oil, and I have a slight problem with that, because oil still follows the laws of scarcity, whereas data does not. Data's this weird asset. So in many respects, the challenge of a CDO is to understand not only how to provide access with appropriate controls, but also how to privatize data, so you are able to generate a return out of it and do what you want with it. What is the relationship between securing data, privatizing data, and data as an asset?
Because it seems to me that the process of securing data is pretty close to synonymous with the process of privatizing data.
To a certain extent, yeah. And now that you mention it, I totally failed to make a point in my presentation. So data is not the new oil, right? Because oil is not terribly useful until you refine it, right? You've got to turn oil into gasoline to make your car go.
And then you can only use it for one thing.
Check. Right, exactly. So raw data needs to be refined to make it useful, and that's your point about privatizing. And so our goal, or one of my goals, is to commoditize the access to the information. Give it to whoever wants it, because I can't refine all the data for everyone for their purpose, right? The beauty of the internet is that it has a crowdsourcing effect. So let's do that. Let's open up the data and allow you to manipulate data for your purpose and you for your purpose. But we have to secure it and say, well, you have access to the data but you can't change it or manipulate it. You can add value to it. You can turn it into something else, but you can't change that data.
Create derivatives.
Exactly. And so, you know, derivative reporting is what it's called in the intelligence community, right? And that's good, but we need to know where it came from at that point, because you don't want to end up in a situation where the data was derived and derived and derived and derived, and then someone makes the decision on a piece of information four or five or six ways removed, without knowing whether that data was or was not high confidence, has good veracity, was from a trusted source or an authorized source, and all that stuff.
It comes back to the quality issue.
It does, yeah. And, you know, it's a legal term, chain of custody, right? How do we electronically maintain chain of custody? It turns out it's an exceptionally difficult problem.
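To make the idea of electronic chain of custody for derived data concrete, here is a minimal sketch, assuming a simple hash-linked record model. It is not a description of any DOD or intelligence community system, and it does not solve the hard problem the speaker describes; the `Record` class and the `derive` and `lineage` functions are hypothetical names used only for illustration. Each derived record stores the SHA-256 digest of the record it was derived from, so a consumer can walk back to the original source, and any change to an upstream record breaks the link.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """An immutable piece of data plus a pointer to what it was derived from."""
    content: str
    source: str
    parent_digest: Optional[str] = None  # None marks an original, underived record

    def digest(self) -> str:
        # Hash the content, the source, and the parent link together.
        payload = json.dumps(
            {"content": self.content, "source": self.source, "parent": self.parent_digest},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def derive(parent: Record, content: str, source: str) -> Record:
    """Create a derivative that adds value without changing the parent."""
    return Record(content, source, parent.digest())


def lineage(record: Record, store: dict) -> list:
    """Return the sources from this record back to the original.

    Raises KeyError if a parent is missing or was altered, since an altered
    parent no longer matches the digest stored in its derivative.
    """
    sources = [record.source]
    while record.parent_digest is not None:
        record = store[record.parent_digest]
        sources.append(record.source)
    return sources


# Illustrative data: one original report and two levels of derivation.
original = Record("raw report", "authorized-source")
summary = derive(original, "summary of raw report", "analyst-a")
briefing = derive(summary, "briefing based on summary", "analyst-b")

store = {r.digest(): r for r in (original, summary, briefing)}
chain = lineage(briefing, store)
```

A decision-maker looking at `briefing` can see from `chain` that it is two derivations removed from `authorized-source`. What this sketch leaves out (trust in the store itself, signatures, and copies made outside the system) is much of why the real problem is difficult.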
And just think about Apple and the music industry, right? They eliminated DRM because it effectively didn't work, right? Not that it didn't function properly, just that it was so easy to defeat, right? So that's an artifact or a byproduct of a digital economy, right? When you had paper, you could control who had access to the paper, and before photocopiers it was nearly impossible to copy something, right? It was transcription, right? Now, copying something, I mean, that's a no-brainer. You know, my son, he's 14, going on 15, going on 25 actually. It is a foreign concept to him to not walk around with a flash drive in his pocket, which is just hugely mystifying to me. I actually gave him a floppy drive the other day and said, I want to remind you where you came from. And he's like, what is this thing, right? And I said, let me explain to you how it used to work, right? When you turned the computer on, you had to put the operating system in, right? And wait, and wait, and wait, and wait. And the notion that cassette tapes were how you got your data on and off of computers. And again, he was like, what? I don't understand that, right? But anyway, so yeah, broadband and ubiquitous access to computing capacity are just the norm these days. And we can't keep using the perimeter to pretend that's not the norm without worrying about the data and the applications. That's where we all need to head collectively, as a community as well as the Department of Defense.
One of the points you made on the main stage this morning was that data governance is hard. You didn't sound defeatist, but you really wanted to be realistic about it. Can you talk a little bit about this problem as it relates to finding the right people to do these jobs?
Yeah, so. We in the Department of Defense love to use governance as a way to control decision making, right? And unfortunately, in certain circles, that's code for not making a decision, right?
And giving yourself this false sense of security that the data is secure because you didn't make a decision about it, right? Well, the reality is, you know, we always used to joke when I was working for the Navy, if you Marine-proof something, they're just going to make a better Marine, right? Same thing, right? People are very innovative, right? And they will use data and systems and solutions in ways you can never conceive of up front. So what we're looking for and what we need is these talented young individuals that can help us think through some of the more odd use cases that we're not predicting. Like, you know, I heard this after 9/11: the reason the intelligence community didn't predict 9/11 is because they weren't thinking like Hollywood, right? They weren't being audacious in their thinking. And it's true, right?
To get a good handle on data governance, you know, what did we do as an Army, or we as a Department of Defense? We stood up a bunch of governance boards, and we meet and we talk about it, right? That's not really data governance, right? Because data governance is controlling access to the data on a daily basis, giving people access to the information when they need it, and tracking where it goes throughout the ecosphere. That's data governance. So there's too much of it for any human being to be involved in the process in any realistic way other than setting the policies, right? So the use of automation, the use of these tools, you mentioned ETL and the cleansing tools, the controlled vocabularies, these are all minimum criteria to get to where we need to be. Then using the spirited young individuals who grew up as digital natives, right? You know, my former boss used to call himself a digital immigrant, right? It didn't exist, right? And I shouldn't say that, because I'm not too much younger than him, but anyway. But using these tools effectively in ways that were not readily apparent.
And then simple things like knocking down organizational barriers, right? We have organized data around organizations, not around uses or missions, right? In the Army, people don't like it when I say this, but it's really simple. There's tactical and non-tactical, right? That's really what it is. And there are business operations on the tactical side and there are business operations on the non-tactical side. There's war fighting and there's war fighting, and there's intelligence and there's intelligence, right? At the end of the day, these people are being shot at and these people aren't, right? So there's an environmental thing that we need to come to grips with.
Well, we in the Army have created a massive wall between tactical and non-tactical because of these environmental concerns, right? Now legitimately, there are power, space, cooling, battery life, hazard, rain, sand, you name it, in the tactical environment that you don't face in the enterprise, the institutional Army, or what we call the generating force, right? But that doesn't mean that the information can't flow. And so we have a completely different set of vendors providing these solutions and these solutions. We have really done a fairly decent job, in the Army anyway, of adopting commercial technology, the big ERPs and the Oracles and the SAPs of the world and whatnot, in the enterprise, but we still have highly custom code being developed on the tactical side to do some of these same business functions, like creating account management, for example, right? The two don't meet in the middle, right? And so that's something we need to knock down: when we can use commercial software on the tactical side, we should. And then that'll lead, in my mind, to better data governance, because now you have data coming down and going this way instead of data being created in two different locations and not shared. And so hopefully we'll get there, to your point.
It is challenging, but I don't want to overstate the gravity of the problem and sound like the negative Nancy in the room, right?
Well, we still have the most effective fighting force in the world. And the whole notion of information asymmetry in many respects came out of economics, but it also came out of strategy, specifically military strategy. And it appears as though we have a pretty decent information advantage when you come right down to it. So a lot of stuff is working.
Yeah, so it's interesting you mention that, because I've been saying this a little bit at work too. The asymmetry of data and the asymmetry of information is really what is now creating that kinetic imbalance.
Beautiful, I totally agree with you.
Right, absolutely.
In fact, that comes back to the notion of governance, because at the end of the day you have an asset and you're mediating transactions against that asset, and the way you define what those transactions are determines what work is being performed. And so you have a lot of different use cases and a lot of different actors, but they're all performing work, and it's how you turn data into a kinetic thing that creates work for the various missions and actors.
Absolutely, because we call it non-kinetic targeting when you want to talk about cyber operations, because you're not necessarily blowing things up explicitly. But so little energy can be put into such a massive effect, which is the exact opposite of 240 years of war fighting for the Army. And phishing attacks are just a great example. You send a couple thousand emails, and it only takes one person to click on the link. And what does it cost to send an email? Like a quarter of a tenth of a penny, right? But you can steal gigabytes, if not terabytes, of information in a matter of minutes with just one small erroneous action. And the DOD is about four million people; the Army is about 1.2 million people. The odds are tough. The odds are stacked against us.
We can't get it right every single time. And so that goes back to: let them steal the data, if it's encrypted and they can't unlock it and use it. Now, you know, I'm a recovering cryptologist, right? Quantum cryptography and quantum computing could change the dynamic a little bit. But you know, if you can use quantum computers to break the cryptography, you can use quantum computers to create new cryptography, in my mind. I haven't been in that space for about 20 years or so, so I don't know where it's going.
Well, well said. We could talk to you all day, Tom. Thank you so much for coming on the show.
No problem, thanks for having me.
I'm Rebecca Knight for Peter Burris. We will have more from theCUBE's coverage of MIT CDOIQ just after this.