From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. Welcome back to MIT, everybody. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante. I'm here with my co-host, Paul Gillin. This is day two coverage of the MIT CDOIQ Conference. A lot of acronyms: it stands for MIT, of course, the great institution, and Chief Data Officer and Information Quality event; this is the 13th annual event. Lars Toomre is here. He's the managing partner of Brass Rat Capital. Cool name. Lars, welcome to theCUBE. Great to have you on. Thank you very much, glad to be here. I've got to start with the name. Brass Rat Capital, what is that? Brass Rat is a reference to the MIT class ring. Okay, thank you. It's supposed to be a beaver. Well, it is, but the students call it a brass rat. And I'm third-generation MIT, so it's absolutely appropriate that it's a brass rat. And capital is not a reference to money; it's actually a reference to intellectual capital. If you have five or six brass rats in the same company, you know, the engineers arrive and they can do some things. And then, boy, if you put some data capital in there, you can really explode. Sometimes we cause a few problems. So we're going to talk about some new regulations that are coming down, new legislation that's coming down, that you exposed me to yesterday, which is going to have downstream implications. I mean, if you get ahead of this stuff and understand it, you can prepare, make sure you're in compliance, and then potentially take advantage of it for your business. So explain to us this notion of open government. Okay. In the last five or six years, there's been an effort going on to increase transparency across all levels of government: state, local, and federal.
The first of the federal laws was called the, excuse me, DATA Act of 2014. That act was enacted unanimously by Congress and signed by Obama. It took the various agencies and departments of the United States government and tried to roll up all the expenses into one view: this is where we spent our money, and who got the money. That's what they were trying to do. It's a big-picture type of thing, right? Yeah, a big-picture type of thing, but unfortunately it didn't work, okay? Because they forgot to include this odd word called ontologies, so that the same terms meant the same thing across departments. They have a data problem, a data quality problem. They have a real big data problem, and they still have it. So there are two GAO reports out criticizing how it was done, and the government is going to try to correct it. Then, earlier this year, there was another act, signed by Trump this time. It had maybe 25 negative votes, but otherwise essentially passed Congress completely. It was called the OPEN, all capitals, O-P-E-N, Government Data Act, okay? That's not been implemented yet, but there's a lot of talk around this conference, and various chief data officers are talking about this requirement: every agency outside of intelligence, defense, and, you know, the vital-protection-of-the-people type stuff — so Interior, Treasury, Transportation, those types of systems — if you produce a report these days which is human readable, you must, in two or three years, I forget the exact implementation date, have it also be machine readable. Now, some people think machine readable means PDF formats, but no, the government said it must be truly machine readable. You must be able to get into the reports, extract the information, and attach it to the Tree of Knowledge, okay?
So all of a sudden we're having context. There are currently machine-readable, quote unquote, SEC reports; you can get into those SEC reports and pull out the net income figure, and it says it's net income, but you don't know what it attaches to on the Tree of Knowledge. So we are helping the government, in some sense, enable machine-readable reporting that we can do machine to machine, without people being involved. What do you mean by the Tree of Knowledge? You're talking about the context? It's a semantic tree of knowledge. So we all come from one concept: a human is an example of a living beast, and a living beast is an example of a living thing. It all goes back like that, and as you get farther and farther out in the tree there's more semantic distance, but you can always attach it back to concepts. So you can attach context to the various data objects. Is this essentially metadata? That's what some people call it, but if I were to go over to CSAIL here at MIT, they would call it the Tree of Knowledge, or semantic data, okay? It's referred to as semantic data: you are passing not only the data itself, but the context that goes along with the data. Okay, so how does this relate to the Financial Transparency Act? Well, the Financial Transparency Act was introduced by Representative Issa, a Republican out of California. He used to run the Oversight and Government Reform Committee in the House. He retired from Congress this past November, but in 2017 he introduced what's referred to as HR 1530. And 1530 is going to dramatically change the way financial regulators work in the United States. It was about to be reintroduced two weeks ago when the Libra digital currency stuff came up, so it's been delayed a little bit because they're trying to add some digital currency legislation to that law. Trying to front-run that, okay.
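The "attach it to the Tree of Knowledge" idea can be sketched in a few lines of Python: each reported value carries machine-readable triples that link it back up a chain of broader concepts, so a program can tell what a "net income" figure actually is. All concept names and prefixes below are invented for illustration, not drawn from any official taxonomy.

```python
# Semantic data sketch: values travel with (subject, predicate, object)
# triples, so context rides along with the number itself.
triples = [
    ("acme:NetIncome2019Q2", "rdf:type", "us-gaap:NetIncomeLoss"),
    ("us-gaap:NetIncomeLoss", "skos:broader", "fibo:IncomeStatementItem"),
    ("fibo:IncomeStatementItem", "skos:broader", "fibo:FinancialReportItem"),
    ("acme:NetIncome2019Q2", "rdf:value", "1250000"),
]

def ancestors(concept, triples):
    """Walk 'broader' links up the tree of knowledge from a concept."""
    chain = []
    current = concept
    while True:
        nxt = next((o for s, p, o in triples
                    if s == current and p == "skos:broader"), None)
        if nxt is None:
            return chain
        chain.append(nxt)
        current = nxt

print(ancestors("us-gaap:NetIncomeLoss", triples))
# ['fibo:IncomeStatementItem', 'fibo:FinancialReportItem']
```

A machine receiving the filing can now resolve what the number attaches to, rather than just seeing the label "net income."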
Well, I don't know exactly what they'll do; remember, it's all coming out of Maxine Waters' committee, so the staff is working on a bunch of different things at once. But OMG was asked to consult with them on looking at the 1530 act and saying: how would we improve it, quote unquote, from a technical standpoint? We're not doing policy; we're just talking about the technical aspects of the act. How would we want to see it improved? So one of the things we have advised is that, for the first time in the history of the United States Code, they're going to include an interesting term called ontology. You know what an ontology is? Well, everyone gets scared by that word, and when I run into people, they say, are you a doctor? I say, no, no, no, I'm just a data guy. An ontology is like a taxonomy, but it has order, it has importance. An ontology allows you to give context, linking something to something else, and so you're able to convey more information with an ontology than with a taxonomy. Okay, so it's like a taxonomy on steroids. Yes, exactly. A more flexible taxonomy. Yes, but it's critically important for artificial intelligence and machine learning, because if I can give the models an ontology of how things go up and down the semantics, I can turn around and do AI and machine learning problems on the order of 10,000, even 100,000 times faster. And it has context. It has context, and just having a little bit of context speeds up these problems dramatically. And is that what enables the machine to machine? No, no, the machine to machine is coming in with something called SBRM, the Standard Business Report Model. It's an OMG specification, a way of allowing the computers, or machines, as we call them these days, to get into a standard business report. Okay, so let's say you're a drug company. You have to certify that drugs you manufactured in India meet United States safety standards, okay?
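The taxonomy-versus-ontology distinction being drawn here can be made concrete in a small sketch: the taxonomy only records is-a links, while the ontology layer adds typed relations that machines can reason over and inherit down the hierarchy. All class and relation names are invented for illustration.

```python
# Taxonomy: pure "is a kind of" links.
taxonomy = {
    "CorporateBond": "Bond",
    "Bond": "DebtInstrument",
    "DebtInstrument": "FinancialInstrument",
}

# Ontology layer: relations beyond is-a (class, relation, target type).
ontology_relations = {
    ("CorporateBond", "hasIssuer", "Corporation"),
    ("CorporateBond", "hasCoupon", "InterestRate"),
    ("Bond", "hasMaturity", "Date"),
}

def is_a(child, parent):
    """True if 'child' sits under 'parent' in the taxonomy."""
    while child in taxonomy:
        child = taxonomy[child]
        if child == parent:
            return True
    return False

def allowed_relations(cls):
    """Relations a class carries, inherited up its is-a chain."""
    return {r for (c, r, t) in ontology_relations
            if c == cls or is_a(cls, c)}

print(sorted(allowed_relations("CorporateBond")))
# ['hasCoupon', 'hasIssuer', 'hasMaturity']
```

The extra layer is what lets a model prune or validate its search space — the speedup claim in the discussion rests on machines exploiting exactly this kind of structure.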
You have various reporting requirements along the way. You've got to give X, Y, Z to the FDA, et cetera. That will always be in a standard format. The SEC has a different format. FERC has a different format, okay? So what SBRM does is allow them to describe, in an ontology, what's in the report, and it also allows one to attach an ontology to the cells in the report. So if you had an SEC 10-Q or 10-K report, you could attach a U.S. GAAP taxonomy, or ontology, to it. And it can say: okay, net income, I know that's part of the income statement; you should never see that in a balance sheet item. That's an example, okay? Or, for the first time, by having that context, you can solve a problem which has existed: machine-readable reports being filed that are simply wrong. Believe it or not, there have been about 50 cases in the last 10 years where SEC reports have been filed where total assets don't equal total liabilities plus shareholders' equity. They just didn't add up; the double-entry accounting doesn't work. Okay. So you can have the machines go at scale and say, hey, we've got a problem here, and we've got a problem here, and you don't have to get the humans involved. Holland and Australia are two of the leaders, ahead of the United States, in this area, and they've seen dramatic pickups. I mean, Holland is reporting something on the order of a 90% pickup; Australia is reporting a 60% pickup. When you say pickup, you're talking about pickup of errors? No, efficiency. Productivity. Productivity, okay. Because you're taking people out of the whole cycle. It's dramatic. Yeah, okay. Now, what's OMG's role in all this? Explain the OMG. Object Management Group. I'm not speaking on behalf of them; it's a membership-run organization. You're a member. I am a member. You're a co-lead of it. But I don't represent OMG; the membership has to collectively vote that this is what we think, okay? So I can't speak for them.
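The machine-scale consistency check described here — flagging filings whose balance sheets don't balance, with no human in the loop — could look something like this minimal Python sketch. The filing records are invented for illustration.

```python
# Toy filings: in a real pipeline these fields would be extracted from
# machine-readable reports via their attached ontology, not hard-coded.
filings = [
    {"id": "FIL-001", "assets": 500.0, "liabilities": 300.0, "equity": 200.0},
    {"id": "FIL-002", "assets": 750.0, "liabilities": 500.0, "equity": 240.0},
]

def balance_errors(filings, tol=0.01):
    """Return ids of filings where assets != liabilities + equity."""
    return [f["id"] for f in filings
            if abs(f["assets"] - (f["liabilities"] + f["equity"])) > tol]

print(balance_errors(filings))  # ['FIL-002']
```

Run over every filing received, a check like this surfaces the roughly 50 broken filings the guest mentions automatically, at scale.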
I have a pretty significant role with them. I run, on behalf of OMG, something called the Federated Enterprise Risk Management Group. That's the group focusing on risk management for large entities like the federal government's Veterans Affairs or the Department of Defense. Upstairs, I think, talking right now, is the Chief Data Officer for Transportation. That's a large organization, and they're instructed by OMB, at the Chief Financial Officer level, that the number one thing to do for the government is to get an effective enterprise risk management model going in the government agencies. And so they come to OMG, just like NIST does, or just like DARPA does from the defense or intelligence side, saying: we need standards in this area so that not only can we talk to you effectively, but we can talk with our industry partners effectively, on space programs, on retail, on medical programs, on finance programs. And at OMG there are two significant financial standards that exist. One is called FIGI, the Financial Instrument Global Identifier, which is a way of identifying a swap, a way of identifying a security. It does not have to be a CUSIP, but worldwide you can identify that IBM stock trades in Tokyo, so it gets a different identifier there; it has different deliverables than the one trading in New York. Those are called FIGI identifiers. Then there are attributes associated with that security, that beast being identified, which generally come out of FIBO, the Financial Industry Business Ontology. So it says, for a corporate bond: it has a coupon, a maturity, semi-annual payments, a bullet repayment, as an example. That gives you all the information you would need, assuming you have a calculation routine, to run the numbers. Then you need to turn around and set up what I'll call your environment.
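The FIGI-plus-FIBO split described here — an identifier for the tradable instrument per venue, plus ontology-derived attributes saying what it is — can be sketched as follows. The identifier strings and field names below are invented for illustration; they are not real FIGI values or official FIBO terms.

```python
# Same issuer's stock, different trading venues -> different identifiers.
figi_listings = {
    "BBG000XXXXXX1": {"issuer": "IBM", "venue": "New York"},
    "BBG000XXXXXX2": {"issuer": "IBM", "venue": "Tokyo"},
}

# FIBO-style attributes for a corporate bond: coupon, maturity,
# semi-annual payments, bullet repayment -- enough to drive a
# downstream calculation routine.
corporate_bond = {
    "figi": "BBG000XXXXXX3",
    "instrument_type": "CorporateBond",
    "coupon_rate": 0.045,        # 4.5% annual coupon
    "payment_frequency": 2,      # semi-annual
    "maturity": "2029-06-15",
    "repayment": "bullet",       # principal repaid at maturity
}

def coupon_payment(bond, face=100.0):
    """Periodic coupon cash flow implied by the attributes."""
    return face * bond["coupon_rate"] / bond["payment_frequency"]

print(coupon_payment(corporate_bond))  # 2.25
```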
That's where the forward yield curves are. And with mortgage securities, or any puttable or callable bond, you're going to probabilistically run the numbers many times, and you come up with an effective duration. Then you do your various analytics: aggregating the portfolio and looking at shortfalls versus your funding, or however you're doing risk management. And finally you do reporting, which is where the Standard Business Report Model comes in. So those are the five parts of doing full enterprise risk model analytics: identifiers, attributes, environment, analytics, and reporting. So what does this mean? First of all, who does this impact, and what does it mean for organizations? Well, it's going to change the world for basically everyone, because it's the equivalent of a software upgrade from version 1.0 to version 2.0, and you know how everyone hates software upgrades. It hurts, because everyone is now going to have to start using the same standard ontology, and of course no one completely agrees with that standard ontology. The regulators haven't agreed to it yet, and the ultimate controlling authority in this thing is going to be FSOC, the Financial Stability Oversight Council, which is the Dodd-Frank-mandated response to never having another TARP. The Secretary of the Treasury heads it, all eight regulators report into it, and the OFR, the Office of Financial Research, serves as the advisor to FSOC for all the analytics. What these laws are doing is giving OFR more and more power to look at how we're going to define data across the industry, so we can come up with consistent analytics, and we can therefore, hopefully, one day take something like Goldman Sachs's prepayment model on mortgages and apply it to the Citibank portfolio. So we can look at consistency of analytics as well. Will this only apply to regulated businesses? It's going to apply to regulated financial businesses. Okay, so it's going to capture all your mutual funds.
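One step in that pipeline — repricing under bumped rates to estimate effective duration — can be sketched with a deliberately simple flat-yield, bullet-bond pricer. The real workflow described runs probabilistic scenarios over forward curves for puttable, callable, and mortgage securities; the flat-yield model and cash flows here are invented for illustration only.

```python
def price(cash_flows, y):
    """Present value of (time_in_years, amount) cash flows at flat yield y."""
    return sum(cf / (1 + y) ** t for t, cf in cash_flows)

def effective_duration(cash_flows, y, dy=0.0001):
    """Bump the yield up and down, reprice, and take the central difference."""
    p0 = price(cash_flows, y)
    p_up = price(cash_flows, y + dy)
    p_down = price(cash_flows, y - dy)
    return (p_down - p_up) / (2 * p0 * dy)

# 5-year, annual-pay, 4.5% coupon bullet bond, face value 100.
flows = [(t, 4.5) for t in range(1, 5)] + [(5, 104.5)]
print(round(effective_duration(flows, 0.045), 2))  # ~4.39
```

With durations per instrument in hand, the aggregation and shortfall analytics the guest mentions operate on the portfolio level, and the results feed the standardized reporting step.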
It's going to capture all your investment advisors. It's going to capture most of your insurance companies, through the medical side. It's going to capture all your commercial banks and most of your community banks. Okay, not all of them, because some of them are so small they're not regulated on a federal basis. The one regulator which is being skipped at this point is the National Association of Insurance Commissioners, but they're apparently coming along as well, independent of the federal legislation. Remember, insurers are regulated at the state level, not the federal level, but they've realized where the ball is going. Will this make life better, or simply more complex? It's going to make life horrible at first, but we're going to get incredible efficiency gains, probably after the first time you get it done, okay? Is the problem of getting it done going to be everyone agreeing to use the same definitions for the same data? And who gets the efficiency gains: the regulators, the companies, or both? All. Everyone. Can you imagine: a Goldman Sachs earnings report comes out, and you're an analyst asking, how do I know if Goldman did well or badly? You have your own equity model? You just feed the report to the semantic worksheet, and it'll turn around and say: oh, these numbers were all good, this is what we expected, and so on. You can do that. There are examples of companies here in the United States that used to do competitive analysis by hand, okay? They would take somewhere on the order of 600 to 700 man-hours to do the competitive analysis. By having the data available electronically, they cut those 600 hours down to five minutes, okay? That's an example of the type of productivity you're going to see, both on the investment side when you're doing analysis, and also on the regulatory side.
Can you now imagine: you get a regulatory report and say, oh, they're way out of whack; I can tell there's fraud going on here, because their numbers are too far off in X, Y, Z. You know, they had to fudge the numbers. And so the securities analyst can spend more of his or her time looking forward, doing forecasts. Exactly. Instead of looking back and reconciling all of this. Right. And you know, we hear at this conference, for instance, that something like 80 to 85% of analysts' time is spent getting the data ready. Yeah, you hear the same thing with data scientists. Right, and so the sense is that if we can help define the data, we're going to speed things up dramatically. But then, what's really interesting to me, being an MIT engineer, is that we have great possibilities in AI. I mean, really great possibilities. Right now, most of the AI models are pattern matching. Like, you know, this idea of using facial recognition technology — that's really just doing patterns. You can do wonderful predictive analytics with AI, but we need to give a lot of the AI models the context so they can run more quickly. Okay, so we're going to see a world, and this is going to sound funny, where we talk about semantic analytics. Semantic analytics means I'm giving all the inputs for the analysis with context attached to each one of the variables, and what comes out will be a variable, or results, which will also have semantics with it. So in the not-too-distant future — in some of the national labs we're already doing it — you're running pipelines: one model goes to the next model, goes to the next model, and that goes to the next model. You're going to get software pipelines; believe it or not, you can get them running out of an Excel spreadsheet, or a modern, enhanced Excel spreadsheet, and that's where the future is going to be.
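The "semantic analytics" pipeline described — every input and output carrying its context, so each model can check it is being fed the right thing — can be sketched like this. The concept tags and toy models are invented for illustration.

```python
def tagged(value, concept):
    """Wrap a raw value with its semantic concept tag."""
    return {"value": value, "concept": concept}

def expects(concept):
    """Decorator: a model refuses inputs whose semantic tag doesn't match."""
    def wrap(model):
        def run(x):
            assert x["concept"] == concept, \
                f"expected {concept}, got {x['concept']}"
            return model(x)
        return run
    return wrap

@expects("fin:NetIncome")
def margin_model(x):
    # Toy model; note the output carries semantics too.
    return tagged(x["value"] / 1000.0, "fin:NetIncomePerShare")

@expects("fin:NetIncomePerShare")
def valuation_model(x):
    return tagged(x["value"] * 15, "fin:ImpliedPrice")

out = valuation_model(margin_model(tagged(2500.0, "fin:NetIncome")))
print(out)  # {'value': 37.5, 'concept': 'fin:ImpliedPrice'}
```

Chaining models this way is what makes the "one model goes to the next model" pipeline safe to run without a human checking each hand-off.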
So really, if you're going to be good in this business, you're going to have to use your brain, you're going to have to understand what the data means, and you're going to have to figure out what modeling really means. For example, normally for a lot of this stuff we assume bell curves. Well, that doesn't have to be the only distribution; you can do fat tails. And if you use fat-tail distributions instead of a bell curve, oh, you get much different results. Now, which one's better? I don't know, but I'm just using it as an example. It's another cut at the data, another view. Now, talk more about the tech behind this. You mentioned AI; I mean, we're talking about math, machine learning, deep learning. Yeah. Add some color to that. Well, the tech behind it is, believe it or not, some relatively old tech. There's a technology called RDF, which has been around for a long time. It's a kind of, not machine learning, I'm sorry, machine-code-like, fairly simplistic set of definitions: lots of angle brackets and all this stuff. Then there's a higher-level language that was abstracted from it, I think made a standard around 2004, 2005, called OWL. It does a lot of the same stuff that RDF does, at a higher level, okay? You could also create, believe it or not, your own special way of communicating an ontology just using XML, okay? XBRL is an enhanced version of XML, okay? And so some of these older technologies, quote unquote old, 20 years old, are essentially going to be driving a lot of this stuff. So, you know CORBA, right? CORBA is what made OMG: the communication across systems. Do you realize that basically every single device in the world has a CORBA standard in it? Okay, yeah, an OMG standard is in all your smartphones and all your computers, and that's how they communicate.
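The bell-curve-versus-fat-tails point is easy to make concrete: the same "3-sigma" move is far more likely under a fat-tailed distribution, such as a Student's t, than under a normal, so risk numbers change materially. This sketch compares the two by simulation, using only the standard library; the sample sizes and seed are arbitrary.

```python
import math
import random

random.seed(42)
N = 100_000

# Thin-tailed draws: standard normal (the "bell curve").
normal_draws = [random.gauss(0, 1) for _ in range(N)]

def t3():
    """One draw from a Student's t with 3 degrees of freedom,
    built from independent standard normals: Z / sqrt(chi2_3 / 3)."""
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(3))
    return z / math.sqrt(chi2 / 3)

t_draws = [t3() for _ in range(N)]

def tail_prob(draws, k=3):
    """Empirical probability of a move beyond k sigma-like units."""
    return sum(abs(x) > k for x in draws) / len(draws)

print(tail_prob(normal_draws))  # near the theoretical ~0.0027
print(tail_prob(t_draws))       # many times larger under fat tails
```

Which distribution is "better" depends on the data, as the discussion notes, but the choice clearly isn't cosmetic.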
It turns out that a lot of this old stuff, quote unquote, is so rigidly well defined, so well done, that you can build modern stuff that takes us to Mars based on these old standards. All right, we've got to go, but I have to give you the award for the most acronyms: we've got HR 1530, FIGI, OMG, SBRM, FSOC, TARP, OFR, RDF, which we knew, OWL, XML, XBRL, and CORBA, which of course we knew. Well done. Lars, thanks so much for coming on theCUBE. Thank you very much. It was great to have you. All right, keep it right there, everybody. We'll be back with our next guest from MIT CDOIQ right after this short break. Thank you.