From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media.

Welcome back to MIT, everybody. You're watching theCUBE, the leader in live tech coverage. We're here at day two of the MIT Chief Data Officer and Information Quality Conference. Dave Vellante here with Paul Gillin, and Andy Palmer is our guest. He's the co-founder and CEO of Tamr. Good to see you again.

It's great to see you, Dave. Thanks for coming on.

I didn't ask Mike this, though I can kind of infer it from some of his answers, but why did you guys start Tamr?

Well, it really started with an academic project that Mike was doing over at MIT. I was over at Novartis at the time as the Chief Data Officer there, and what we found was that a lot of companies were really suffering from data mastering as the primary bottleneck in their business. They had used great new tech like the Vertica system we had built and automated a lot of their warehousing, but the real bottleneck was getting lots of data integrated and mastered really, really quickly.

Yeah, he took us through the problems with the EDW in terms of scaling, and the scaling problems of master data management. Was that really the problem you were trying to solve, how to scale?

It really was. We started the company seven or eight years ago now, and maybe almost ten years ago on the academic project. At that time people weren't really thinking or worried about that; they were still digesting big data, as it was called. But what Mike and I felt was that people were going to get past big data and the volume of data, and then start worrying about the variety of the data and how to make it cleaner and more organized. I think we called that one pretty much right.
Maybe we were a little bit early, but I think variety is now the big problem.

The other thing is that big data is often associated with Hadoop, which was batch, and then you saw this shift to real time, and Spark was going to fix all that. What are you seeing in terms of trends in how data is being used to drive near-real-time business decisions?

Yeah, well, you know, Mike and I came out really specifically back in 2007 and declared that we thought Hadoop and HDFS were going to be far less impactful than other people thought.

In '07?

Yeah, and Mike actually was really aggressive in saying it was going to be a disaster. I think we've finally seen that play out a bit now; the bloom is off the rose, so to speak. There are these fundamental things that big companies struggle with in terms of their data: cleaning it up, organizing it, and making it high quality. Anybody who's worked at one of these big companies can tell you that the data they get from most of their internal systems sucks, plain and simple. Cleaning up that data, turning it into an asset rather than a liability, is really what Tamr is all about, and it's our mission. You think about the amount of money some of these companies have spent on systems like SAP, and you're like, yeah, but all the data inside these systems is so bad, so ugly, so unusable. We've got to fix that problem.

So your special sauce is machine learning. Where are you applying machine learning most effectively?

We apply machine learning to probably the least sexy problem on the planet. There are a lot of companies out there that use machine learning and AI to do predictive algorithms and all kinds of cool stuff. All we do with machine learning is use it to clean up and organize data, to get it ready for people to use AI.
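To make the idea concrete, here is a minimal, purely illustrative sketch of ML-style record matching for data mastering, with uncertain cases routed to a human expert for review, which is the human-and-machine pattern Palmer describes. The field names, records, and thresholds are all hypothetical; this is not Tamr's actual pipeline.

```python
# Hypothetical sketch: score candidate record pairs for "same entity?",
# auto-accept confident matches, auto-reject confident non-matches, and
# queue the uncertain middle band for a subject-matter expert.
from difflib import SequenceMatcher

def pair_score(a, b):
    """Similarity score in [0, 1] for two customer records:
    average of fuzzy name similarity and exact city match."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    city_sim = 1.0 if a["city"].strip().lower() == b["city"].strip().lower() else 0.0
    return (name_sim + city_sim) / 2

def route(pairs, low=0.3, high=0.75):
    """Triage pairs into accepted, rejected, and needs-expert-review.
    In a real system, the expert's labels on the review queue become
    training data that improves the learned matcher over time."""
    accept, reject, review = [], [], []
    for a, b in pairs:
        s = pair_score(a, b)
        if s >= high:
            accept.append((a, b))
        elif s <= low:
            reject.append((a, b))
        else:
            review.append((a, b))
    return accept, reject, review

records = [
    ({"name": "Acme Corp", "city": "Boston"},
     {"name": "ACME Corporation", "city": "boston"}),   # near-duplicate
    ({"name": "Acme Corp", "city": "Boston"},
     {"name": "Zenith Ltd", "city": "Austin"}),         # clearly distinct
    ({"name": "Acme Corp", "city": "Cambridge"},
     {"name": "Acme Group", "city": "Boston"}),         # ambiguous case
]
accepted, rejected, queued = route(records)
```

In this toy run, the near-duplicate pair is accepted, the clearly distinct pair is rejected, and the ambiguous pair lands in the expert queue; at enterprise scale the point is that only a small uncertain band ever reaches a human.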
I started in the AI industry back in the late 1980s, and I learned from this guy Marvin Minsky. Marvin taught me two things. The first was garbage in, garbage out: no algorithm is worth anything unless you've got great data. The second was that it's always about the human and the machine working together. I've been working on those same two principles for most of my career, and Tamr brings both of them together. Our goal is to prepare data so it can be used analytically inside these companies, so it's actually high quality and useful. The way we do that involves bringing together the machine, mostly advanced machine learning algorithms, with humans: subject-matter experts inside these companies who actually know all the ins and outs and intricacies of their data.

So, as they say, garbage in, garbage out. If you don't have good training data, of course you're not going to have a good ML model. How much upfront work is required? GE, I know, is one of your customers. How much time does it take to put together an ML model that can deal with 20 million records like that?

Well, the amazing thing that's happened for us, especially in the last five years, is that we've now built enough models from scratch inside these large Global 2000 companies that very rarely do we go into a place where we don't already have a pre-built model they can use as a starting point. I think the same thing is happening in modeling in general. If you look at great companies like DataRobot, and even open-source libraries like MLlib, the accessibility of these modeling tools, and of the models themselves, is such that they're commoditized. For most of the projects we work on, we've already got a model as a starting point. We don't have to start from scratch.

You mentioned you got started in AI back in the '80s.
Is the notion of AI the same as it was in the '80s, and we've just now got the tooling, the horsepower, and the data to take advantage of it, or has the concept changed?

The math is all the same, absolutely, full stop. There's really no new math. Two things have changed. First, there's a lot more data available now. Neural nets, one of Marvin's things, are a great example: when you look at Google Translate and how aggressively it uses neural nets, it was the quantity of available data that actually made neural nets work. The second thing that's changed is the cheap availability of compute; the largest supercomputer in the world is now available to rent by the minute. And a third thing is what you alluded to earlier: the accessibility of all the math. It's getting to the point where the average data scientist, not just the advanced one, can practice AI techniques that 20 years ago required five PhDs.

It's not surprising that Google, with its neural net technology and all the search data it has, has been so successful. Does it surprise you that Amazon, with Alexa, was able to compete so effectively?

Well, I would never underestimate Amazon and their ability to build great tech. They've done some amazing work. One of Mike's and my favorite examples from the last three years: they took their Redshift system, which competed with Vertica, and re-implemented it as a compiled system, and it runs incredibly fast. That feat of engineering was truly exceptional.

It's interesting to hear you say that, because wasn't Redshift originally ParAccel?

Yeah, that's right.
Larry Ellison craps all over Redshift, saying it's just open-source software that they took and repackaged, but you're saying they did some major engineering on it.

Oh my gosh, yeah. Mike and I always compared ParAccel to Vertica, and we always knew we were better in a whole bunch of ways, but this latest rewrite they've done as a compiled version is really good.

So, as a guy who's been doing AI for 30 years and is really seeing it come into its own: a lot of AI projects right now seem to be low-hanging fruit, small-scale stuff. Where do you see AI in five years? What kinds of projects will companies be undertaking, and what kinds of new applications will come out of this phenomenon?

I think we're at the very beginning of this cycle, and there's a lot more potential than has been realized. We are in the pick-the-low-hanging-fruit phase, but some of the potential applications of AI are so much more impactful, especially as we modernize core infrastructure in the enterprise. The enterprise is living with this huge legacy burden, and at Tamr we always encourage our customers to think of all their existing legacy systems as just data-generating machines: the faster they can get that data into a state where they can start doing state-of-the-art AI work on top of it, the better. You've really got to put the legacy burden aside and draw a line in the sand, so that as companies build their muscles on the AI side, they can take advantage of all the data they're generating every single day.

So think about these data repositories: the enterprise data warehouse, which you guys built better with MPP technology; then the master data management stuff, the top-down enterprise data models; then Hadoop and big data. None of them really lived up to their promise.
Though that's somewhat unfair to the MPP guys, because you said, hey, we're just going to run faster, and you did; you didn't claim you were going to change the world the way the EDW guys did. Do you feel like this next wave is actually going to live up to the promise?

I think the next phase is very logical. I know you're talking to Chris Lynch here in a minute, and what they're doing at AtScale is related: AtScale and Tamr and these companies are all in the same general area, which is how you take all this data and actually prepare it and turn it into something that's consumable really quickly and easily for all of these new data consumers in the enterprise. That's the next logical phase in this process. Now, will this phase be the one that finally meets the high expectations that were set 20 or 30 years ago with enterprise data warehousing? I don't know, but we're certainly getting closer.

Well, I kind of hope not, because then we'll have less to do. Any other cool stuff you see out there as a technologist?

I'm fanatical right now about healthcare. I think the opportunity for healthcare to be transformed by technology almost makes everything else look like chump change.

What aspect of healthcare?

The most obvious thing is that, with the consumer now in the driver's seat in healthcare, technology companies can come in and provide consumer-driven solutions that meet the needs of patients regardless of how dysfunctional the healthcare system is, and that's killer stuff. We had a great company here in Boston called PillPack that was a great example of that: they just built something better for consumers, and it was so popular and so broadly adopted that eventually Amazon bought it for a billion dollars. Those kinds of things in healthcare, where PillPack is just the beginning, are everywhere; there are lots and lots of those kinds of opportunities.
Well, healthcare is ripe for disruption, and it hasn't been hit with digital disruption yet. Neither has financial services, really, and certainly not defense. They're high-risk industries, so it takes longer. Well, Andy, thanks so much for making the time. I know you've got to run. Awesome seeing you, take care.

All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching theCUBE from MIT CDOIQ. Right back.