Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager here. We'd like to thank you for joining this data diversity webinar, "How to Avoid the 10 Big Data Analytics Blunders: Best Practices for Success in 2021," sponsored today by Tamr. Just a couple of points to get us started. Because a large number of people attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A box in the bottom right-hand corner of your screen. If you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag #DataDiversity. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom right-hand corner of your screen for that feature. As always, we will send a follow-up email within two business days containing the slides, the recording of the session, and additional information shared throughout the webinar.

Now let me introduce our speakers for today, Dr. Michael Stonebraker and Anthony Deighton. Michael is an adjunct professor at the MIT Computer Science and Artificial Intelligence Laboratory and a database pioneer who specializes in database management systems and data integration. He has been a pioneer of database research and technology for more than 40 years and is the author of scores of papers in this area. In addition, he has started several companies in the big data space, including Tamr. He also co-founded the Intel Science and Technology Center for Big Data at MIT. Anthony is the Chief Product Officer at Tamr, overseeing product and solution strategy for Tamr's growing data mastering business. Anthony was most recently Chief Marketing Officer at Celonis, and before that Senior Vice President of Products at Qlik, and has over 20 years of experience building and scaling enterprise software companies. He holds a bachelor's degree from Northwestern University and an MBA with high distinction from Harvard Business School. Now I'll hand it over to our distinguished speakers to get today's webinar started.

Hello and welcome. This is Michael Stonebraker speaking. Anthony and I are going to ping-pong a little bit, so he'll be breaking in to tell me whenever I make a mistake. I also want to mention that Anthony and I are probably the only two people on this call who have never watched Breaking Bad, so you'll have to excuse our ignorance in this area. Also, a bunch of these slides are not mine; they come to you from Tamr marketing, including the one of the Winnebago here on the screen. Anyway, I'm going to go through what I consider the 10 biggest blunders that I see enterprises committing in the big data space, and I'll just do them one by one.

The first one is not planning to move most everything to the cloud. Next slide. Now, it may take a while; this is not something you're going to accomplish this year, and it may take a decade, but it's the right thing to do. Why do I say that? Let me give you two quick vignettes. The first one is from Dave DeWitt, who until recently was the head of the Microsoft Jim Gray Systems Lab in Madison, Wisconsin. Dave said: here's the technology that Azure was using for its data centers. They are shipping containers in a parking lot. Chilled water in, internet in, power in, otherwise sealed. Roof and walls are optional; they're only there if you need the security. Now compare that with whatever you guys are doing with raised flooring in Boston or New York.
You've just got to believe that the cloud guys are going to do it cheaper. Another vignette comes from James Hamilton, who works for AWS. I have no reason not to believe him, and he claims that Amazon can run a load for 25% of your cost. Now, prices may not accurately track costs in the short run, but in the long run they will. And if they're a factor of four cheaper, then I don't see how you're going to compete in the long run. Moreover, if you move to the cloud, the big deal these days is elasticity. You use 20 nodes for the days when you're doing end-of-month processing, and three nodes on the first day of the next month, and so forth. So you can scale your resources with your load. You can't do that with on-prem data centers. So of course what everybody asks is: how do I manage to move stuff from on-prem to the cloud? Well, data will move easily; just move the data. So decision support will be the first to move, and if you're not well along with moving all of your decision support to the cloud, then I think you're making a big mistake. Next slide.

Before we jump into the next slide, I'll just chime in. We talk a lot about this idea of moving data to the cloud, partly because the cloud vendors who are aiming to capture that data care deeply about getting your data in the cloud, because it creates lock-in. But I think there's a dual point here, and Mike, you make this point about elasticity; this is the point I think we need to underscore: not only is data moving to the cloud, but compute is moving to the cloud as well. So you're right that you can scale up and scale down, but you can also do things on the cloud that would simply be impossible in your own data center. Running a thousand-node Spark cluster? Completely reasonable; you could be doing it five minutes from now on any of the three major clouds. Running a thousand-node Spark cluster on-premise may be impossible, or certainly beyond the scope of most IT departments. So the change in capability that moving data to the cloud enables is actually dwarfed, I think, by the change in capability of having compute on the cloud, in particular in the area of what you can do with machine learning, which I know is a theme that will come up in future slides. But since I've seen the slides, I know what your next slide is, and I hear a lot of angst from customers when it comes to moving stuff to the cloud, a lot of objections. So maybe you can knock off some of those objections, Mike.

Okay. Well, you did say one thing which I want to underscore: if you move your applications to the cloud, you can do that either in a cloud-independent way or in an AWS-dependent way. So you have to decide pretty quickly whether you're going to avoid lock-in or not. You can make that decision either way; just make it knowingly. Now, I hear lots of people say "I can't move to the cloud because..." and I'll just mention a couple of these and then we'll go on. It turns out that I hang out most of the time at MIT in the Computer Science and Artificial Intelligence Laboratory. We run a data center on raised flooring in Cambridge, Massachusetts. Our data center guys claim that they are cheaper than the cloud, which is to say they claim that the cloud is no less expensive than running on their data center in Cambridge.
And the answer is, that's technically correct, because they are not paying rent per square foot and they're not paying for power or air conditioning. They're cheating, in the sense that they're taking advantage of externalities that should not be there. So if you compare apples to apples, chances are you're going to be more expensive than the cloud. People often mention security. Well, cloud security is likely better than yours; we hear plenty of horror stories about misconfigurations, rogue employees, and the like from on-prem shops. So chances are their security is better than yours. And maybe your CEO or your regulators don't like the idea; that may be a bee in your bonnet, but I'm going to talk about that again in item 11, which is a bonus blunder to come. Anyway, I think the cloud is in your future, and the sooner you get going on it, the better. Next slide.

Of course, as Anthony pointed out, it matters where your application runs. If you're running decision support, just move whatever your decision support is to the cloud. Other stuff, like legacy OLTP, may well be mired in the sins of the past. It takes a lot of work to move your OLTP, so do it gingerly; it may take you a decade or more. But sooner or later you are not going to run a data center on-prem. I just don't think that's going to happen. Next slide.

Yeah, before we leave this one, I would just add a theme I've seen with customers: as they move data to the cloud, they also use it as an opportunity to think about consolidating data, especially from the perspective of decision support and analytic applications. Again, a theme, Mike, I know you'll return to later. The analogy I always draw here is: you don't move a dirty house. If you're going to move houses, use the opportunity to go through your stuff, throw things out, and consolidate before you pay for movers to come move it. Similarly, as you move data to the cloud, it's an opportunity to take a look at those sources, discard the ones that are no longer relevant, consolidate around the key entities that matter, and so on.

Okay. Now, there's lots and lots of talk about machine learning and, more generally, AI. Expect ML to be disruptive in just about all businesses. Next slide. So ML, whether it's deep learning (think neural networks) or conventional machine learning, which has been around for about three decades: there's an enormous amount of research on both, it's getting much, much better, and it is guaranteed to displace workers in easy-to-explain jobs. Think autonomous vehicles, think automated checkout at the grocery store, think drone delivery, think getting your taxes done, think actuarial calculations. All of that is going to be replaced by computer programs. Your choice in looking at machine learning is that you are either going to be a disruptor, meaning somebody pushing ML, or you will be a disruptee, meaning one of your competitors is going to disrupt you. Your choice. Put another way, you can either be a taxicab owner or you can think like Uber and Lyft, one or the other, and in my opinion it's going to be much more fun off into the future to be the disruptor than the disruptee. Next slide. So what do you do? The answer is, ML is fairly arcane. You are not going to hire Aunt Maude from Cedar Rapids, Iowa to be your ML expert.
So you're going to have to pay up to get some ML expertise; it's in short supply and very expensive, so don't skimp on talent here. We'll come back to that in a bit. Get going on the race by hiring expertise, and pay whatever it takes to get world-class talent. Next slide.

Yeah, so I would add here that the first blunder and the second are tightly related. Prior to having data in the cloud and elastic compute in the cloud, the idea of machine learning and AI as a disruptive platform shift may have been true, but it would have felt out of reach and really only the purview of companies capable of standing up large infrastructure: think Google, Uber, etc. Taken together, what blunder one and blunder two are really saying is that everybody, any company in the world, now has access to the kind of platform that even as recently as a few years ago would have required quite a bit of expertise to stand up. So doing things the old way, the way we've always done them, the way that made you successful in the past, is a surefire path to extinction. The playing field has changed, and you either change with it and take an ML- and AI-based approach, or you're roadkill. Again, a theme we'll come back to in a future blunder. Next slide.

Okay, here is my favorite blunder. A lot of you say: I've got to get going on data science, and ML and data science more generally, so I'm going to empower a data science group and they're going to change the world. Well, your real data science problem is not ML expertise, as I'll explain right now on the next slide. I talk to a lot of data scientists, and no one claims they spend less than 80 percent of their time finding the data they want to analyze, doing data integration to put it together, and cleaning up the mess that that data may well be. Most people say 90-plus percent. For example, the chief data scientist of iRobot, the folks that bring you the vacuum cleaner that runs around the floor, says: I spend 90 percent of my time doing data discovery, data integration, and data cleaning, leaving me 10 percent of my time to do the job for which I was hired. However, of that 10 percent, I spend 90 percent fixing my data cleaning errors. Meaning she spends 99 percent of her time on data discovery, data integration, and data cleaning. She does not do data science or machine learning for a living; she does data discovery, data integration, and data cleaning. The chief data scientist of Merck, which has about a thousand data scientists, said exactly the same thing: 95-plus percent of his data scientists' time goes to data integration.

So, as Anthony said really clearly a couple of minutes ago, without clean data, or clean-enough data, your machine learning is worthless. Garbage in, garbage out. ML is not going to pay off unless you solve the data integration problem in front of it. So what should you do? Obviously, stop viewing data integration as a piecemeal thing to be solved by each individual data scientist in his or her project. Getting your data scientists good data is an enterprise-wide problem. And start by making sure that your chief data officer has read access to everything your enterprise has. If he doesn't have read access to all enterprise data, then you're working for the wrong company. (A rough sketch of where that 90 percent of time goes appears below.)
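To make the time-allocation point concrete, here is a minimal sketch in Python with pandas and scikit-learn of what an analysis script tends to look like when the inputs are two independently built HR extracts. The file names, column names, and conversion rates are all hypothetical; the point is the proportion of preparation to actual modeling.

```python
# A minimal sketch (hypothetical files, columns, and rates) of why data
# scientists report spending ~90% of their time on discovery, integration,
# and cleaning: nearly every line here is preparation, not ML.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Discovery/integration: two independently constructed HR extracts.
ny = pd.read_csv("hr_new_york.csv")  # employee, tenure_years, salary (USD, gross, annual)
paris = pd.read_csv("hr_paris.csv")  # employe, anciennete, wages (EUR, net, monthly)

# Cleaning: align names, units, and semantics before any analysis.
paris = paris.rename(columns={"employe": "employee",
                              "anciennete": "tenure_years",
                              "wages": "salary"})
paris["salary"] = paris["salary"] * 12 * 1.18    # monthly net EUR -> rough annual USD
ny["salary"] = ny["salary"].replace(-99, pd.NA)  # -99 is a sentinel for "unknown"

staff = pd.concat([ny, paris], ignore_index=True)
staff = staff.dropna(subset=["salary"])
staff = staff.drop_duplicates(subset=["employee"])  # naive de-dup; real entity resolution is far harder
staff["salary"] = staff["salary"].astype(float)

# The actual "data science" is one line.
model = LinearRegression().fit(staff[["tenure_years"]], staff["salary"])
```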
So, next slide.

Mike, I suspect that's a very scary statement for many people on the call. Can you share a little more about what you mean by read access to the enterprise's data, and why that's so important?

Sure. The chief data scientist at Merck, or rather at GSK, when he was hired, made a deal with the CEO: I'm not going to take this job unless I have read access to everything. And the CEO of course asked: why do you need read access to everything? Because everything is what my data scientists are going to want access to, and if something they need doesn't exist, I want to know that too. Somebody has to be able to figure this out, and if your data scientists spend all their time trying to get access to corporate information, you're wasting their time, and you might as well solve this at the executive level.

Yeah, so cleaning up data is annoying, and having to fight for access to the data so you can clean it up is even more annoying. The other thing to think about, one common theme I hear from the customers I speak to, is that at its core every one of our businesses is fundamentally a data business. You may think you're in the business of making drugs, or the business of logistics, or whatever, but at its core the real asset you sit on is that data, and a chief data officer needs to have access to that core asset.

Okay. So, blunder number four, and I hear this all the time. You say: okay, I understand data integration is a problem, but I've got that solved. I have ETL in place, I have a master data management system from one of the major elephants in place. Unfortunately, the answer is: no, you don't. The real blunder is the belief that traditional data integration is going to solve this issue. Traditional data integration means extract, transform, load (ETL), available from a variety of vendors: Informatica, IBM, Talend. Or a belief that master data management (MDM), also available from the usual suspects, will solve your data integration challenge. Why won't it? Next slide.

So what is ETL all about? What is extract, transform, load? Here's the way it's sold. You decide what data sources you want to integrate; that comes down from God, or somehow you decide. You build a global data model up front from these data sources; put your best person on it, and that will get you a global data model. Then, for every individual data source, you send a programmer out to interview the data set owner and figure out what he's got, how it's formatted, and how to extract it. He then builds an extractor and data cleaning routines, typically in a proprietary scripting language, and loads the data into, typically, a data warehouse. This is what's peddled by the ETL vendors. And I can tell you, from 30 years of experience, I've never seen this technique work for more than 20 data sources. Why is that? Well, it's too human-intensive. And number one, you've got to build a global schema up front, and that's way too difficult at scale. You all tried this 20 years ago, building enterprise-wide data models. They all failed, because you sent a team off to do it, it took two years, and by then the whole business had changed to something else. So I've never seen this technique work at scale. If you have 20 data sources and that's all you'll ever want to integrate, by all means use ETL.
But most enterprises I know have way more than 20 data sources. Merck, for example, which I keep coming back to, has 4,000, plus or minus, Oracle databases; they don't even know how many they have. Plus a data lake, plus countless files, and data from the web is also important. The scope of possible integration is all this stuff, way more than 20 data sources. So ETL simply doesn't work at scale. Next slide.

Once you've managed to do ETL, however you do it, you then need to run match/merge. If you're collecting Mike Stonebraker records from multiple data sources, you need to match up my source records; that is, you need to do consolidation of entities, which is typically called match: put together all the records that correspond to a single entity. Then you typically want to merge those into what's called the golden record, which has the definitive spelling of Mike Stonebraker, my definitive address, and so forth. The MDM vendors all suggest doing match/merge using a rule system. So you implement match rules, for example: two entities are the same if they have the same address. And you use rules to merge: take the most recent value among multiple candidates, and so forth. So the MDM guys all suggest building rule systems to solve match/merge.

Now, my general thinking is that you can manage about 500 rules. Rules, by the way, are "if x, then y," and they're not ordered into a program; they're just a bunch of rules. So you can get your brain around about 500. You sort of stare at it; okay, I'll give you a thousand, I'll give you two thousand. But no one I've seen has been able to build and maintain a rule base with 20,000 rules. So if you require more than roughly 500 rules to solve your problem, you're in trouble with an MDM system. Well, who needs more rules than that? GE, the multinational conglomerate, has about 20 million spend transactions that they want to classify. A spend transaction is: you spend 50 bucks taking a cab from the airport to your home. They have a prebuilt classification hierarchy for spend: everything at the top, a subset of everything is travel, a subset of travel is taxis. They started writing rules in an MDM system. They wrote 500 rules, which is what you can reasonably expect to get your brain around, and that classified 10% of their spend. What about the other 90%? They'd have to write at least 5,000 more rules, and they quickly realized there was no way they could write and maintain a rule base of 5,000 rules. So MDM just doesn't scale to large numbers of rules. (A small sketch contrasting a hand-written match rule with a learned matcher appears below.) Next slide.

Let me quickly add here, and maybe link together the first five blunders slightly. One question you might reasonably have in your mind is: are the ETL and MDM vendors full of bad engineers who just built really bad software? I would argue they're not; obviously they're full of very smart people. It's that they architected the approach 10, 15, 20 years ago, in an environment where the only reasonable mechanism for achieving the outcome was a rule-based system. Processing was relatively slow, the data was stuck in databases and relatively hard to get access to, and frankly, the overall IT strategy at the time was to see whether you could get all of your data into the world's biggest data warehouse or, God forbid, into the world's largest SAP implementation. Again, a theme we'll come back to in a moment.
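To ground that contrast, here is a minimal, hedged sketch in Python with scikit-learn. The pair features, labels, and threshold are all hypothetical; the point is that a rule is hand-coded one case at a time, while an ML matcher learns the decision from labeled example pairs, which is also how existing rules (like GE's 500) can be reused as training data.

```python
# Hypothetical sketch: a hand-written match rule vs. a learned matcher.
from sklearn.ensemble import RandomForestClassifier

def same_entity_rule(a: dict, b: dict) -> bool:
    # One hand-coded match rule: same entity if the addresses agree exactly.
    # Real deployments need hundreds of such rules, which is where they stall.
    return a["address"] == b["address"]

# The ML alternative: featurize record pairs as similarity scores in [0, 1]
# (e.g., name similarity, address similarity), label some pairs, and learn.
# The labels could come from existing rules, much as GE reused its rules.
pair_features = [[0.95, 0.90], [0.10, 0.20], [0.88, 0.15], [0.05, 0.99]]
pair_labels = [1, 0, 1, 0]  # 1 = same entity, 0 = different entities

matcher = RandomForestClassifier(n_estimators=100, random_state=0)
matcher.fit(pair_features, pair_labels)
print(matcher.predict([[0.91, 0.40]]))  # classify a pair no rule covered
```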
In any case, in that environment, the rules approach was a reasonable way to attack the problem. What we've seen since is a platform shift which has enabled a disruption in the market. And that platform shift is blunder one: the move of data to the cloud and compute to the cloud, which has opened up new possibilities. If you were to start a company today and attack this problem, you wouldn't architect a rules-based approach. The elephants in this industry are saddled with the architectural decisions of their past.

Okay. We've had a good lead-in to the next slide, which is: if traditional ETL and MDM don't scale, what do you do instead? At scale, you need to run ML. You have no choice; this is an ML problem. At scale, you cannot use traditional techniques. And as Anthony said, an easy path into ML is to just run ML. So what did GE do? They worked with Tamr. They took GE's 500 rules, which classified 10% of their data, and used them as training data for an ML system, which then classified the remaining 90%. So at scale, data integration is an ML problem, full stop. That's what you need to do; traditional solutions just don't scale. And if you have a small problem now but expect a big problem later, you are heading into deep quicksand if you use a traditional solution. So ML is the answer for how to scale ETL and how to scale MDM, and the traditional vendors don't do it because their products are legacy at this point. Next slide.

Okay, Anthony already covered this one. I hear this a lot, though not so much anymore: I invested in the world's best data warehouse, I've got everything in order, my data warehouse guys are solving my analysts' needs, life is good. Next slide. Well, data warehouses are good for something. They are good for storing structured data, lots and lots of it, from a few data sources, not from thousands. They're good at customer-facing structured data; that's what they were built for, and that's what they're really good at. They're not good at text. They're not good at images. They're not good at video. And they don't do anything about data curation. So use this technology for what it's good for; don't try to make your data warehouse perform unnatural acts. And by the way, decision support, which is what we're talking about, is going to move to the cloud if it hasn't already. If you're moving to the cloud, you get to change vendors. So get rid of the high-priced proprietary stuff if you bought into it already; that's people like Teradata. Move to the cloud, remember that your warehouse data and your apps are moving with you, and you get to choose a new vendor. Make that decision carefully. Next slide.

I would add here: the blunder is thinking your data warehouse is going to solve all your problems. I would equally add that your major ERP vendor is also not going to solve all of your problems; in fact, quite the opposite. What we see in customers is a move to a best-of-breed approach: taking advantage of the best operational application for the task at hand and running those applications in the cloud. And I think this is a really important strategy and shift for organizations today: optimizing their business processes, the way they work, the way they engage customers, the way they engage their employees, how they do business, by taking advantage of best-of-breed applications.
That strategy is, I think, one that creates tremendous business value, and it's quite at odds with the strategy of making do with the operational application from one of the big vendors.

So that's a lead-in to the next slide, which is to say: well, maybe my warehouse isn't going to solve all my problems, but five years ago I was told that Spark was the answer. So I set up a Spark cluster, and a Hadoop cluster, and that's going to solve all my problems. And that's simply not going to happen. Next slide. Hadoop, especially, is a lowest common denominator; it isn't very good at anything, and best-of-breed solutions have come up around it. Spark is newer technology and is better, but it's still not terrific at everything; Spark SQL, for instance, is not competitive with the best solutions. So, as Anthony just said, you should use best of breed, not the lowest common denominator, at least for your secret sauce, the stuff that is going to differentiate you from your competitors. There's a universal blunder here, which is the edict to use only one vendor. That means you run the lowest common denominator, and the lowest common denominator is not that good at anything. For your secret sauce, that's not such a good idea. I'll also note that Spark and Hadoop are useless for data integration, which is one of your biggest problems. Next slide.

So I hear a lot of stories from people who say: well, I'm running a big Hadoop cluster and it's empty, no one's using it, so what do I do? Well, repurpose it to be a data lake, that's all the rage, or repurpose it to be a compute engine for data integration, or better yet, throw it away. After all, hardware lifetime is three years, and you probably bought that cluster five years ago. The thing to always remember is that you've got to move with the times; being stuck in the legacy world is not a great idea.

Yep. And I would add that the ideas of distributed compute and orchestrating distributed compute, which are at the core of both systems (and I agree with Mike that Spark is the more recent and better implementation), are good ideas, good design principles, in particular in the context of data sitting in the cloud next to highly elastic compute. And they turn out to be a good foundation for thinking about machine learning. So that's a minor point.

Okay. About four years ago, Cloudera realized that Hadoop was not good for anything, and that's a big problem if you're a Hadoop vendor making most of your money off of Hadoop. Cloudera is a superb marketing company, and they said: what we want to do is switch to telling people to use their Hadoop cluster as a data lake. So basically, they switched to marketing data lakes, and therefore data lakes became the solution to all your problems. Next slide. What does that blunder really mean? Just load all your data into the data lake, and you'll be able to correlate anything with anything. More recently, Amazon and others have started calling them lakehouses; a data lake and a lakehouse are roughly synonymous. And the thing you should tattoo on your brain is this: independently constructed data sets are never, ever plug-compatible. They are just not. You are not going to be able to take two independently constructed databases, load them into a data lake, and correlate them. That's just completely not going to happen. Why not? Well, it's on the next slide. First of all, your schemas don't match.
If you're the human resources guy in Paris and I'm the human resources guy in New York, you call it salary, I call it wages. Units don't match: you use euros, I use dollars. The semantics of the salaries don't match: in New York, salaries are gross, before taxes; in Paris, salaries are net, after taxes, in euros, and include a lunch allowance. Time granularities often don't match: you have annual data and monthly data. The killer, of course, is that data is dirty. Sometimes numeric data, 99 or minus 99, turns out to mean null. If I'm using a system with real nulls and you're using a system where a specific value like minus 99 means null, and I average your numbers with my numbers, I'm going to get garbage. So the data is dirty, meaning it's missing or it's wrong; figure on average that 10% of your data is missing or wrong. Because your data is dirty, you can't just correlate it. Also, duplicates must be removed. If you're the HR guy in Paris and I'm the HR guy in New York, Mike Stonebraker could work for both subsidiaries, and my name could be misspelled in one data set and not in the other. Therefore there are no keys, and I've got to do entity consolidation, and entity consolidation is just not trivial. My favorite example is the Tamr customer who wanted to ask the question: how many suppliers do I have? He added up the suppliers from all the various data sets and got a number. After Tamr got done with the duplicates, he had one quarter of that number. So there are often large numbers of duplicates, and if you're counting companies or counting suppliers, those duplicates can make a huge difference. So you've got to remove duplicates, the data is dirty, you've got to do schema integration, and so forth. Next slide.

The net result: if you just put your data in a data lake and start doing correlations, your analytics will all be garbage. What happens is that your analysts spend 95% or 99% of their time finding, fixing, and integrating the data, and your ML models will fail if you don't do this. Next slide.

So what do you do? Well, I'm a huge fan of data lakes; if you want to correlate your data, you've got to put it somewhere. But if all you have is the data lake, you have a swamp. You need a data curation system that deals with all the aforementioned problems, and they are not trivial. Do not think you can put a junior programmer on this problem. And the traditional technology, per blunder number four, is likely to fail. So this, in my opinion, is one of your major 800-pound gorillas: how to organize data integration. You've got to put your best people on it. At Tamr, we see a lot of in-house solutions that we're being asked to replace, and chances are they're crap. So chances are whatever you built in-house is crap, and you've got to use modern technology. That does not come from the legacy MDM and ETL vendors, and it certainly does not come from your home-grown system. If you want the best technology, you've got to deal with smallish companies like Tamr. Next slide.

I would just add here that the data lake blunder is really the blunder of assuming this is a data storage problem. The swamp is obviously a good analogy, in that just consolidating your data into one environment does not solve the problem. It's a good start, and there's nothing wrong with it; as Mike says, you need it. It's a necessary but not sufficient condition for success.
And his latter point, that this is an incredibly difficult problem, a technologically difficult problem, means it's also ripe for technical innovation. In a sense, if Tamr can take credit for anything, it's for working hard, from its academic roots at MIT through to today, on working out the math of solving this problem, which turns out to be non-trivial. It's a great example of how, when you build on the new architecture, you end up with a radically different solution.

Okay, blunder number eight. Lots of you outsource your shiny new stuff to consultants. In my opinion, this is a likely company-ending blunder. Why is that? Next slide. If you're a typical enterprise, you spend 95% of your IT budget just keeping the lights on. Most of you are dug in pretty deep in legacy code, and so the shiny new stuff gets outsourced, often because there's no one available internally to deal with it. Next slide. In my opinion, this is company-ending. Number one, if all that's left in-house is maintenance, your creative people quit, so you have no good talent to work on new stuff, and you have a hard time hiring great talent even if you try; it takes great people to hire great people. Your new stuff is your secret sauce over the next decade or so, as you move to an ML-powered world. Please don't outsource it; that is long-term suicide. Instead, outsource the daily crap, the stuff you've got to do to keep the lights on. For sure outsource email, and anything else like that you can outsource. Software is your secret sauce. If you want great people, hire a few wizards; they can hire the other wizards. That is going to be your differentiator against your competitors ten years from now. Next slide. So what should you do? Start by hiring some ML expertise, outsource the boring maintenance, and keep the new stuff in-house. Okay, now we're on to blunder nine.

I'll just briefly add here: this is a tale as old as time. Every time we see a platform disruption in the technology market, the big consultancies custom-build solutions on top of it. There was a time, when I began my career at Siebel Systems, when the way to get what was essentially a CRM system was to hire a big consultancy and build it from scratch on first principles. Along came Siebel, in that case, and said: actually, this is a solvable software problem; we'll just go build a standard solution to it. And I think that's exactly what we're seeing in this market as well. Custom-building AI and ML solutions is not the strategy. Okay, next slide.

All of you should read the book by Clayton Christensen called The Innovator's Dilemma. Next slide. Basically, a lot of you are mired in the past because you simply say: I can't move on, I can't deal with disruption. Christensen analyzes this in some detail, and he calls it the innovator's dilemma: you're selling the traditional stuff, and along comes some innovation that threatens to disrupt your market. I don't have time, since we're almost out of it, to go through an example, but he goes through a whole bunch of examples in this book. It's a real dilemma, because it's very difficult to hold on to your customer base, or all of your installed base, while you're moving from the old stuff to the new stuff. But if you succumb to the innovator's dilemma and say, "therefore I can't move to the new stuff," next slide, then in my opinion you're dead in the long run.
You've got to be willing to give up your current business model and reinvent yourself. Otherwise, you're going to be out of business in the long run; you're going to get disrupted. So you might as well make the best of it. Read Christensen's book, and then act on it. Realize that you may well lose some of your current customers in the process, but you want to avoid going out of business in the long run. I don't have to remind you that taxi medallions in Cambridge were 700K five years ago; now they're 10K, going on zero. So you need to be able to reinvent yourself. Next slide. And who's going to help you reinvent yourself? You've got to pay up for a few rocket scientists. Next slide.

So who's going to help you avoid these blunders, reinvent yourself, move data integration out of the purview of individual data scientists, et cetera, et cetera? Pay up for a few rocket scientists, that is, people who are way off-scale. Your HR folks won't like it. Chances are they will be weird. I know a bunch of them; they tend not to wear shoes, they certainly don't wear ties, and they put their feet on the table. Please don't drive them away. Instead, you've got to nurture these rocket scientists, because they're the ones with the good ideas who are going to be your salvation long term. Next slide.

Okay. Now you might say: gee, Mike, I work for a company and we're succumbing to blunders two, four, six, and seven. So, next slide. If you're working for a company that succumbs to any of these blunders, then you should be part of the solution, not part of the problem. You should be fixing it. If you're not fixing it, chances are your company is going to go out of business long term, and you should be looking for a new employer. And of course Tamr is hiring, if you're looking for work. That's the end of my slides. We're four minutes from the end. Maybe Anthony has some color, or Shannon has some questions.

Yeah, with the remaining time I'd love to take any questions. What's the best way to do that?

Yes, there are lots of questions coming in, so let me get to as many as I can here and answer the most commonly asked ones. Just a reminder: I will send a follow-up email to all registrants for this webinar by the end of this Thursday. Great presentation, as always, you guys. A question coming in here: you suggest moving most or all of our data to the cloud; what types of data should remain on-premise?

On-prem data is going to be data that is buried in COBOL legacy systems from 1969 for which you have lost the source code. There will be silos of data that you just can't realistically move because it's too expensive. If it's too expensive to move, then put a big box around it, tie it up with a bow, and leave it running wherever it is, and your successor in your job will hate you. But if it's not economically viable to move it, then you're stuck. Move anything for which there's a return-on-investment case for moving.

And when you say cloud, could it also be Docker or Kubernetes? Kubernetes and Docker are container technologies in which you can put your application. They can run on-prem, they can run in the cloud; they are simply enablers. They're sandboxing technology that, generally speaking, you should embrace, but that's independent of whether you want to move to the cloud or not. Kubernetes and Docker run on the cloud, run on-prem; they run pretty much everywhere. (A small sketch of that portability point appears below.)
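To illustrate that portability, here is a minimal sketch using the official kubernetes Python client; the deployment name, namespace, and replica counts are hypothetical. The same call works whether your kubeconfig points at an on-prem cluster or a managed cloud one, and it is also a small instance of the elasticity Mike described earlier: scale up for month-end processing, scale back down afterwards.

```python
# Hypothetical sketch: the same scaling call works against any cluster,
# on-prem or cloud, that your kubeconfig points to.
from kubernetes import client, config

config.load_kube_config()  # reads the currently configured cluster
apps = client.AppsV1Api()

# Elasticity: 20 workers for end-of-month processing...
apps.patch_namespaced_deployment_scale(
    name="analytics-worker",  # hypothetical deployment
    namespace="default",
    body={"spec": {"replicas": 20}},
)
# ...and back down to 3 on the first day of the next month.
apps.patch_namespaced_deployment_scale(
    name="analytics-worker",
    namespace="default",
    body={"spec": {"replicas": 3}},
)
```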
It's a good example of design decisions you would make differently knowing that you intend to run on a highly elastic compute infrastructure. Knowing that, design principles like leveraging Kubernetes would be worth the investment versus other approaches you might take, for example.

Well, Mike and Anthony, thank you so much for this great presentation, but I'm afraid that is all the time we have scheduled. Thanks to all our attendees for being so engaged. I'll get the remaining questions over to Tim or Kate to help get the rest of those answers to you. Just a reminder again: I will send a follow-up email by end of day Thursday for this webinar with the slides and the recording. Thank you both so much.

Thank you all. Hope everybody has a great day. Stay safe out there. Thank you. Thank you, everyone, for joining us. A lot of fun. Thanks, Phil.