And welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager of DATAVERSITY. We'd like to thank you for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight. Today, William will be discussing data curation for artificial intelligence strategies. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share your highlights or questions via Twitter using hashtag ADVAnalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle of your screen for that feature. And if you'd like to continue the conversation after the webinar, you can follow William and each other at community.dataversity.net. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me introduce our speaker for this series, William McKnight. William is the president of McKnight Consulting Group. He takes corporate information and turns it into a bottom-line-producing asset. He's worked with major companies worldwide, including 15 of the Global 2000 and many others. McKnight Consulting Group focuses on delivering business value and solving business problems, utilizing proven, streamlined approaches in information management. His teams have won several best-practice competitions for their implementations, and he has been helping companies adopt big data solutions. And with that, I will turn the floor over to William to get today's webinar started. Hello, and thank you for that, Shannon, and welcome, everybody.
Welcome back to those of you tracking this series on a monthly basis; I appreciate that. Today we have a pretty important topic: data curation for artificial intelligence strategies. So I'm necessarily going to be going into a lot of today's artificial intelligence implementations. Some are pretty out there for some of you, but some of them are quite accessible, I would say. However, they're not accessible if you don't have the right data foundation, and that's really what we're here to focus on today: the data foundation for artificial intelligence. You know, a lot of people, including I think all of my clients, are undergoing what they call digital transformation. Now, that means something different everywhere, but it does mean we're shaking things up. We're bringing in some new systems. We're doing more in the cloud. And we are leveling up our data because we see it's necessary. That much, I can say, is being attempted everywhere. And I would add that what I see in terms of digital transformation, that whole idea, is that a lot of it really is about artificial intelligence. It's about getting ready for artificial intelligence and being able to incrementally do a little bit more with artificial intelligence. So I hope to give you some ideas today from industry, from my journeys around. I do believe that we are entering a new generation of need around data. And I know everybody has always said that; I certainly have before as well. But I just see that we are on the cusp of something big here, a new wave, if you will. If the last wave was big data, which some of us got under control and some of us didn't, the next wave is coming nonetheless. And that, in my book, is artificial intelligence. I know many of you believe that as well. We now have multiple touch points with customers, many more than ever before, due to the digital avenue. We have demographic and psychographic information.
We have a lot of public data sets now that we have to process. We have to do it within a set of regulations that is emerging under our feet. And we have to do it in a governed way, or we will be highly inefficient, and that will not last. We have to do it now, when there's a lot of complexity involved; maybe there's a lot of complexity in our own data architectures. And we still have to move forward, and we have to do so with agility as well. So what we need is a real transformative approach. A lot of data, let's face it, is out of control within organizations today. So we need a plan, and we need to be executing to that plan. We need to be revising that plan. And that plan needs to be multifaceted, because as I get into this, you'll see that I'm not recommending that there's one place, like a data lake, although that's important. There's not one place for the data that's going into artificial intelligence algorithms; it's going to be many places. And at the same time we're doing all this, there's a shortage of people. For example, LinkedIn estimates 900,000 data science professionals are needed, and I think we will get there at some point in the future. But we have only about 20,000 data scientists out there right now, so far fewer than we need. Anyway, here's AI in action in many different industries, and some of the leaders around AI today. They are improving financial fraud situations with fraud detection. They are improving call center experiences with chatbots, like it or not; people will soon manage 85% of business relationships without human interaction. We are enhancing in-car navigation using computer vision, increasingly getting our hands off that steering wheel in our cars. We're handling things in our supply chain with a lot more accuracy. We're automating things. And a lot of people ask, well, how do I get started? Well, let's look around and find some things to automate, because there's probably plenty of it.
So we're automating things, and we're predicting things with a higher degree of accuracy, whether it's flight delays or maintenance that might be required on the things that we own. When you're dealing with the complexity of data today, with all of this, it's not a human-scale problem. A human will not scale to the level that's necessary to get to world-class levels of efficiency and accuracy with things like I'm showing you here. This is where AI shines, where AI needs to do well. And to do that, we need the data to go into it. I'm going to show you some examples of that. Keep in mind, though, AI is not just about these applications that we're building for the market. It's also about a lot of internal efficiencies in developing those applications. You might even think about it as using AI to build AI. And this means that you can use AI to do data discovery, data integration, data cleansing, and mastering data. These sorts of data management tasks are now being outright supported with AI embedded in some of the technologies that are available today. AI also helps with visibility, security, and governance. Now, what's new with deep learning? AI has been around a while, since arguably the 50s, but nowhere near what it is today. What's kicked it into high gear is this idea of deep learning, which allows more complex problems to be tackled, and others to be solved with higher accuracy, with less cumbersome manual fine-tuning. And you know that when we make things easier, that's going to improve adoption. Certainly that's what has happened. We're also starting to get our data acts together so that we can actually do machine learning, deep learning, and so forth. AI affects the entire organization. It's literally everywhere, or it literally can be everywhere if we think about it. One of the things that I challenge my clients with is: wherever they're thinking about BI, think about AI.
Think about AI doing that thing that you were going to do with BI, because the likelihood is AI can do that thing. Now, can it produce a report? Well, yeah, but BI is better at doing that. That's not the idea, though. The idea is: what's the report for? What are you using that report for? Is it trying to drive supply chain efficiencies? Is it trying to do fraud detection? Is it in that path? Well, those are the big things that AI can actually tackle. So that's how I want us to think about AI. AI is for a lot of the new business solutions that we have been challenged with as a consulting organization, whether it's trying to determine the next best action to take, tracking customer satisfaction, or having efficient operations and innovative products. AI is the underpinning technology. And for effective AI, we have to have the right features used and trained. In order to do this, we are using data not only from inside but also from outside the organization. But you really need to have an efficient environment in order to take on this level of information. So where do we look for AI opportunities? Just a little bit more on this before we get into the data aspect of it. The products you make and the services that you offer, obviously, as I just mentioned. The supply chain for those products and services. Within your business operations: hiring, procurement, after-sale service, and so on. How about the intelligence that you use in determining and designing your product and service set? So again, AI for AI: how can we use AI to determine what to do next with our product set? And finally, the intelligence that you use in the marketing or approval funnel for your products and services: what's the next best action to take to move things along? Okay. Now, we at McKnight Consulting Group have a data maturity spectrum. We have five levels, and this has been curated by looking at our client experiences over time.
I have put AI into that data maturity spectrum. Even though it's maybe not square-on to data in some people's minds, it really has a lot to do with your maturity around data that you're using it for AI. Now, this is level four out of five, so it's kind of up there. We speculate that only about 15% or so of organizations have achieved this level. And by the way, far fewer would have achieved level five, which I won't even get into, but that's even more about AI. So the type-A organizations that are positioning themselves now to be a leader in their industry in the next decade are definitely doing a lot of the things that I'm talking about here today. A lot of the case examples come from those types of organizations. But you're actually all in on AI in your data strategy at level four. And some of you, I always say, are at my level three; you've got to be there. You've got to be sprinting to get to level three, and you'd better get there by, I'd say, the end of the year. And by the way, the thing keeps changing, right? So next year, this four might be a three. We might really be expected to be all in on AI in our data strategy by, let's say, 18 months from now. I really believe that we will. So it's time to start putting the pieces in place to get there. Now, AI data must be governed, have high data quality, and be curated. And we should have the data science abilities to absorb most or all of our data into that machine of AI. And we need it to scale. We now have a really strong value proposition for keeping history data around. And that's even circumventing some of the security concerns within organizations who have historically said, well, let's just get rid of the data when we don't need it anymore. And we have. But now, with AI stepping into the picture and the data scientists stepping into the picture, we're starting to hear more of: I can use that data.
It's informative to me, and it's really going to be great for the business. So let's keep it around, even at a slightly higher degree of risk. That's how important we're seeing this data become. High-velocity data: yeah, if you allow me to say so, it's big data. And I think big data has some distinctly different characteristics than the non-big data that we've historically dealt with. And I don't mind the term, really, big data. It's not just small data that happens to have grown. There are a lot of things about it, you know, the V's and all that. But high-velocity data certainly is part of AI data. As a matter of fact, it's really important to AI data to have that level of detail and have those smallish kinds of data movements in our data. Okay, training data curation. You're going to be setting aside some data for training data, and you have to highly curate that kind of data. You have to have a plan for it. And so AI data includes this data that you're just using to train the algorithms with. So where are you collecting this data from? There's a lot of data to collect, a lot of systems to collect the data from. Hopefully you're doing most of this already. But in my journeys, I see that while people may be collecting that data, there are probably still some more maturity cycles to go through in order to get it to the place where it's going to be ready for AI. So whether you have a data warehouse or not, a data lake or not, and a master data management hub or not, that's not really the end of the story. It needs to be at a level of maturity. And so the data to collect for AI is wide-ranging, spanning all current data. Yeah, all data. Leave no data behind. I know I've used this mantra for a while, but it's becoming increasingly important now that we are seeing that it is important to our AI algorithms to have all data. And you might say, well, I don't have a use for that data, so why would I save it?
And to me, that's only the start of the answer. Let's drill in on that a little bit and find out: why don't you need this data? Why aren't you able to do things with this data? Let's work on that. Let's work on the data science, I like to say. There's e-commerce, ERP, and CRM data. There's IoT data; this is big data: heavy industry, factory, consumer, health, aircraft. We don't all have IoT data, but a lot of us do, and a lot of us just need to really open our eyes and we can see that we actually do have this kind of data. And it is going to be important to our AI algorithms to forecast equipment performance breakdowns and any risks to health there may be. There's publicly available data and other third-party data. Be careful of overfitting data and thinking too much about the curation of data as you're bringing it in, making sure that it's going to fit a distinct downstream need. Bring that data in. There are call center recordings and chat logs, streaming sensor data, customer account data, and purchase history, which you may have already in your data warehouse. Great. I like to use this as kind of a checklist in kicking a client off in terms of AI. Do you have this data? Is it under control? Is it under management, I like to say? Because having it, like I said before, is not enough. And there are certain attributes I like to see about the data to understand that it's under management. And we might create some spikes of activity around bringing that data up while we're working on an overarching AI strategy, because things have to come together. Product catalogs and data sheets, public references, et cetera. You may be surprised to see YouTube video content and audio tracks on here, but yes, things like that. And that's certainly not meant to be exclusive of all social; there are other forms of social, especially around sentiment, social graphs, et cetera, that are interesting now to companies getting that data bed together so that they can do AI.
For example, for predictive maintenance, and we've done some of this, we need structured data and unstructured data. I wanted to show you these different dimensions. In terms of structured data, for predictive maintenance we've needed time series data, event data, and graph data. I know I'm just sort of describing the categories of data here, but all of this data has been needed. And in the unstructured arena, text data, image data, even sound data: all of this goes into great predictive maintenance. Now, you can do shallow predictive maintenance. You can just say, well, I've been scanning a part. Airplanes do this. They have millions of sensors on the plane. They certainly can't be grade A around monitoring every single one of those things, but they're getting a lot better about it. Your first level of maturity is just to be able to see some degradation in the part and understand how much longer it has to go before you really need to replace it. And then you start bringing the supply chain into it, and they're starting to do things like: well, we're going to have a break, not 10 minutes, more like a 10-hour break, for a plane at an airport overnight, perhaps, at a place where they actually do have this part. Even though we've still got six months to go, it looks like, looking at the schedule, now's the time to do it. So there are some shallow things you can do there, but then you can bring in other things, like the predicted weather that the plane's going to fly through, the predicted load of the plane, et cetera, and really hyper-optimize things like that. And it's a lot of fun to get to that level of detail with this stuff. So where do you put this data? I've been challenging you: get all this data. Let no data be left behind. So where are you going to put it? Well, cloud storage is sort of obvious. And this might be for your data lake, for example.
We are speccing data lakes now almost exclusively, I'd say, in the cloud, whether on HDFS or on something like S3. But that is the place that we are gearing data lakes toward. And without getting too deep into data lakes, that was the topic of, I believe, my February Advanced Analytics talk, if you want to go back and find out more about data lakes. But anyway, it's a place for a lot of data. It's a place for all your data, not necessarily all the data that's going on to the data warehouse. It's the place for your data scientists to get in there and apply the algorithms, such as what we're going to show you today, as I mentioned before. The data lake is very important to artificial intelligence. So if you've come to this thinking William is going to do a big reveal about some newfangled data store that artificial intelligence needs, in addition to the data lake, the data warehouse, the MDM hub, all our data marts, streaming data, graph data, et cetera: no. What I've seen happening out there, the best that I can think about it, is to have a great modern data infrastructure with all those things, and to have them at a certain level of maturity. No one size fits all when it comes to artificial intelligence. It's going to gobble, devour, conquer, whatever the word is, data from a lot of different places. You're going to have to be great about your data in order to effectively get into artificial intelligence. So back to the bullets: cloud storage for your data lake. Database management systems. I can't emphasize enough, and maybe I'll get a chance in this series to really hammer home the point, that database management systems haven't gone away. They are to be highly considered in all new workloads. As a matter of fact, the category of database management systems that we really buy into and enjoy, and that gets the default pole position, if you will, is the cloud analytic database.
Obviously that's for analytic workloads, but a lot of things are analytic workloads today. You know all of them; I'm not going to rattle them off. Cloud analytic databases for the majority of workloads. HDFS, leaning here again towards the cloud, but there's a lot of Hadoop on-site, on-prem; of course, the majority is. That's optimized for your sequential reads and writes. There is still a place for HDFS in environments. Unstructured data stores: if you have a hyper-important and voluminous workload of unstructured data, such as video, text, audio, or the many things particular to each industry that fall into the category of unstructured, you might need an unstructured data store, like a Splunk, in your environment. So that's an optional part of the reference architecture of today for a data environment. Again, of course, you've got your cloud storage and your DBMS and your HDFS, but you may have a need for one of those. Okay. And finally, just text-based serialized data: CSV and JSON for interoperability. There are obviously specialized data structures for, say, JSON data in the NoSQL category. A lot of databases, however, have great JSON capabilities where you can absorb that JSON data and treat it like regular good old columnar data. So depending upon the scale of the JSON, we're recommending one or the other there. Anyway, there's a pattern to all this AI, and it is becoming a little bit more well-worn. So it's well worth putting this on your wall, knowing it, thinking about it, and looking at it and seeing the importance of data in the pattern for artificial intelligence. Number one, hire and grow your data science. Okay, you've got to have somebody there that knows what to do with this data from an AI perspective. Oh yeah, you know how to do your BI, but AI is different.
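A quick aside on that JSON point, by the way. Treating absorbed JSON like columnar data just means flattening nested records into named columns. Here's a minimal sketch in plain Python; the field names are hypothetical, purely for illustration, and a real database would do this internally:

```python
import json

# A couple of hypothetical JSON records, as they might arrive from a feed.
raw = '''[{"id": 1, "customer": {"name": "Acme", "region": "West"}, "total": 40.0},
          {"id": 2, "customer": {"name": "Zenith", "region": "East"}, "total": 15.5}]'''

def flatten(record, prefix=""):
    """Flatten nested JSON objects into dot-separated column names."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row

rows = [flatten(r) for r in json.loads(raw)]
print(rows[0])  # {'id': 1, 'customer.name': 'Acme', 'customer.region': 'West', 'total': 40.0}
```

Each flattened row now looks like a row in a regular table, which is exactly the "good old columnar" treatment I was describing.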
And by the way, I am not one to disparage someone who has been around as a data analyst or whatnot moving into the data science area and becoming a data scientist. I want to say it's not rocket science, but I'm almost contradicting myself there. It is data science, which is pretty much up there, but I'll give people a chance. Let's give people a chance to become that data scientist, which we desperately need. So the fake-it-till-you-make-it data scientist is okay in my book, as long as you're moving up. Number two, uncouple AI from organizational constraints. Oh, yeah. How can you do this without that? I know when AI comes up when I'm working with my sponsor at the client, we have to talk about this. Maybe they're ready, but the rest of the organization isn't ready. What is the first thing a non-listener to this webcast thinks about when you say AI? They think about their job, right? They think about it automating jobs and so on. And surely there's some of that; I can't deny it, and we could have a whole webinar on thoughts about that. But I haven't seen it take on jobs yet. The purpose of it really is progress, and that can't be denied. And we want to bring people along with the process. So conform the organization to the need, the downright need, to do this to stay successful. Number three, the ideation: now you're getting into ideas, and I'm going to give you some today. Compiling your data is number four, internal and external. Labeling that data is number five. So you see, two of the steps have everything to do with data. Number six, building your model, prototyping, iterating. I like to say, quote-unquote, running contests internally to see which algorithm, which model, if you will, is going to work best. I'll get into some of the best ones here in a minute. And number seven, productionalizing, because until you productionalize, you have not done anything worthwhile.
It has to get into production and get into use in order to move corporate needles. So it's not good enough to have it sit in test and/or QA for a year; that's too long. We need to be agile about this. And then we have to think about scale, too. What we put in production must scale. Sometimes we put something in production and we have to continue to tune it to make sure it's going to scale ahead of the need. Because in my experience, these projects either take off and go gangbusters faster than anybody anticipated, or they flatline right away and eventually fail. We never sort of hit it just right, Johnny-on-the-spot. So let's keep that in mind: we might have success on our hands, so build to scale. Algorithms: there are so many. But these are three of my favorites, and what I've seen work. And actually, all the examples in this presentation, and I have more to come, use variants of these. So if you know no other algorithms, think about these. Naive Bayes is a family of algorithms that takes a look at all the features of something and determines the value of each feature in terms of, I'm trying to say this right, how strongly it corresponds to the thing it's being compared to. I probably didn't say that well, so, for example, maybe this is a better way to say it: if it's round, if it's orange in color, and it's between three and five inches in diameter, it might be an orange. It very well might be an orange. And if it's pretty high above a threshold on those three features, the AI is going to call it an orange, if that's what you've told it the combination of those features is. Okay. Ordinary least squares regression puts an emphasis on the differences in features; it really magnifies those differences by minimizing the sum of the squares of the differences. That's how it determines the importance of conforming. It actually squares the differences between observed and predicted values.
And again, it's trying to identify things. And finally, logistic regression, predicting the probabilities of the different possible outcomes. And you can go from there. Try multiple, I always say. Run contests, I always say. Now, here are some more examples. Let's go down the left side. Marketing: segmentation analysis, campaign effectiveness. Cybersecurity. These are some of the top uses of AI today. Hopefully you find yourself more or less in one of these categories, because then you get to do AI, and that's a great thing. Cybersecurity, smart cities, retail and manufacturing for the supply chain, supply flow, and customer flow as well. Oil and gas, determining drilling patterns, ensuring maximum utilization of the assets. I've been working with one oil and gas company that is monitoring the flow of the gas in the pipeline to determine its characteristics and smartly routing the gas to the places that need it, based upon whatever the orders are that have come in. So where you might start, let's say, in Florida, well, that's not a good example. Let's say in North Dakota, you might start the pipeline for some gas destined for, say, Tennessee, but you might have a more urgent requirement along the way for, say, Chicago. So that can be rerouted. Or you might find that you have some other gas piping towards Tennessee from a closer distance, and it's the same grade that we need there, and so on. It gets pretty complicated, but it's fun optimizing this stuff. Life sciences: studying the human genome, and wow, is that ever going to be a huge area in our near future. Now, on the right side, I have put some of the top enterprise data domains. Most of this, if you're familiar with the dimensional model, is dimensional-type data, but it's not meant to be exclusive to that. So sales, bill of materials, financials, stuff like that.
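To make those three algorithms a bit more concrete, here's a minimal sketch of each, using the round/orange-colored/three-to-five-inch fruit example from a moment ago. The data is synthetic and the use of scikit-learn is my assumption; the point is only how little code each of these takes once your data is curated:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical features: [roundness 0-1, orange-ness of color 0-1, diameter in inches]
X = np.array([
    [0.90, 0.80, 4.0],   # an orange
    [0.95, 0.90, 3.5],   # an orange
    [0.80, 0.10, 4.0],   # round, but not orange-colored
    [0.20, 0.30, 9.0],   # neither round nor orange-colored
])
y = np.array([1, 1, 0, 0])  # 1 = orange, 0 = not an orange

# Naive Bayes: weighs how strongly each feature corresponds to the class.
nb = GaussianNB().fit(X, y)
print(nb.predict([[0.92, 0.85, 4.2]]))  # high on all three features: an orange

# Ordinary least squares: minimizes the sum of squared differences
# between observed and predicted values.
ols = LinearRegression().fit(X, y)

# Logistic regression: predicts probabilities of the possible outcomes.
logit = LogisticRegression().fit(X, y)
print(logit.predict_proba([[0.92, 0.85, 4.2]]))  # [P(not orange), P(orange)]
```

In a real contest you'd train each candidate on the same curated training data and compare them on held-out data, which is exactly why that training data has to be so highly curated.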
Some of it is more or less transactional data. But regardless of how we want to label it, if you take the things on the left, the business use cases, you can cross-reference them to a lot of the data domains on the right. And each one's going to be a little different. I'm not going to take the time to do it now, but you should do this exercise for your top AI business use cases that are coming up. That helps inform your roadmap around getting these enterprise data domains, again, I'm going to say it, under management. And that means to a certain level of curation, to a certain level of maturity. A lot of times, to me, that means you have it mastered in master data management, or in a data warehouse or data lake, some leverageable data store. Everybody's a little bit different, but these are real leverageable data stores. So I would spend 80% of my time there as opposed to other things. I would even think about re-engineering some other things to meet and conform to great practices around those stores. Data warehouses, data lakes, MDM hubs, operational hubs, I should add that too. And there are some others, but really, it's pretty finite. So you want to have this data in there. Now let me give you some examples. If you're building a campus, if you're building a parking area, let's say, or even if you're putting up a new building, maybe you're even engineering the interior of the building, it doesn't matter: temperature management is important to all of that. And with temperature management, obviously the temperature can be read out over time with sensors. And did you know urbanized environments are three to four degrees hotter than the surrounding non-urbanized areas? So what about the heat? Well, as we get hotter, this becomes more important.
So what we want to do is cross-check people patterns on top of temperature, on top of the areas that we're measuring the temperature on, to determine what areas we might need to optimize the heat around on the outside. Different cities in Australia are doing this effectively. There are options. There's planting trees. There are different surface materials for roads and pavements that you can have. There are green roofs and green walls for, more or less, the interior. There's the density and the color of building materials. There's a lot you can do. And really, you want to do this if you're planning something new: if you're planning a new campus, planning a new building, et cetera. But there are ups and downs, pros and cons, I guess you might say, with each of these. For example, planting trees might seem like, yeah, let's do a bunch of that. Well, if you plant them too densely, some people are going to feel less safe walking around in those areas because of the cover that those trees may give. And so you have to take that into consideration. Maybe look a little bit at the makeup of the people that are walking about at the various times of day. There are various things you can do to have temperature management as great as it can get. I'm not talking, by the way, about affecting the overall temperature of the planet or anything like that, but about what you can control. The data that goes into this: obviously it's going to be great thermal data that you're going to get on readout. It's going to be that video data, and I'm going to come to that in a minute. You're going to be looking at those people patterns, so your AI is going to have to be able to pick out the people patterns and so on. And it's all the data about your options. As AI devours this data, you should be accumulating information on the various options that are available and doing the right thing.
Next: this is not an application, necessarily, but it applies to many different applications, and that's video. A lot of this is about surveillance, not just of people and cars, but of whatever's going on. And feature coding, which involves both feature extraction and a compression process, is what you're applying to videos, for whatever purpose you have. Now, core techniques have emerged in visual systems, and these are what you see here: feature generation, feature generalization, feature redundancy removal, rate-distortion optimization, feature binarization, and network compression. Think about, for example, self-driving cars and all the information they need to process in order to do an effective job. They need to be able to identify, obviously, people, but also street signs, road signs, animals, potholes, et cetera. So they need to be able to distinguish those things, those features, but also know what to do about them. So video has become paramount whenever you're doing any kind of surveillance. Now, if you have a campus and you want to know who's moving about on that campus, sometimes companies can't identify everyone by name, especially visitors and so on, but you can search to see whether a given person is present in multiple cameras and get kind of a sense of their movement pattern. So what data is important here? The video data, obviously. That's what it's all about here. Now, lastly, I'm going to give you an example of another type of data that applies, again, to many different types of applications. And that's satellite data, or aerial data that sort of simulates satellite data because it's captured from up so high. We see cameras in our streets and so on that are up pretty high, scanning down; that's what I'm talking about. A lot of parking lots are scanned this way. If you have a receiving center, like a dock kind of thing, you want to know the pattern of traffic into your receiving center well before it gets right to the dock.
And then it's all clogged up, and so forth. So you know what to do before it quite gets to the point at which you need to unload that truck, or whatever it is. So this applies to things like commercial uses, obviously, and national security. There are humanitarian reasons. There's also, you know, wanting to know about the buildup of war materiel, for example. Now, there is a public data set, if you're interested in this type of data, called YOLT2. And with that data set, there is data from parking lots in Toronto, New Zealand, Germany, Columbus, and Utah, of all places. Not that there's anything wrong with Utah, but the whole state of Utah has participated in this. And so these are public data sets that you can use to train AI. Data is collected via aerial platforms, but at a nadir view angle, such that it resembles satellite imagery. And with that, you can train your AI to let you know what the vehicle movement pattern is in the areas that you are surveilling. And accuracies are pretty high, 97% in applications of this, in urban scenes with dense car counts that resemble the training scenes. So that's one thing that's pretty key: you want the data that you trained the algorithms on to resemble your real world. That makes it all better. So finally, it's not all about data. There are some other things, and I want to fill this out for you, some other corporate requirements. Now, earlier I gave you the pattern of AI, but these are some of the requirements you have to have to go through that pattern in your organization. These are some of the skills of today, and we can debate this, some people would go different ways with some of this, but I think this is becoming a pretty good standard. The split of the necessary AI and ML between the edge, the corporate users, and the software itself is still to be determined. So you still have edge computing that can do a lot of AI today, but it's limited.
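That resemblance between training scenes and deployment scenes can be checked cheaply before you trust a headline accuracy figure. Below is a hypothetical sketch comparing the distribution of per-image car counts in training versus deployment imagery using histogram intersection; the counts, the bucket size, and the interpretation threshold are made-up illustrations, not values from any real data set.

```python
# Hypothetical check that deployment imagery resembles the training imagery,
# using the distribution of per-image car counts. All numbers are made up.

from collections import Counter

def count_histogram(car_counts, bucket=10):
    """Bucket per-image car counts and normalize to a probability distribution."""
    buckets = Counter(c // bucket for c in car_counts)
    total = sum(buckets.values())
    return {b: n / total for b, n in buckets.items()}

def overlap(p, q):
    """Histogram intersection: 1.0 = identical distributions, 0.0 = disjoint."""
    keys = set(p) | set(q)
    return sum(min(p.get(k, 0), q.get(k, 0)) for k in keys)

train_counts  = [12, 18, 25, 31, 22, 15, 28, 19]   # dense urban training scenes
deploy_counts = [14, 21, 26, 30, 17, 23, 29, 16]   # similar deployment scenes

score = overlap(count_histogram(train_counts), count_histogram(deploy_counts))
print(f"distribution overlap: {score:.2f}")
```

A score near 1.0 suggests the deployment scenes look like what the model was trained on; a low score is a warning that the quoted accuracy may not transfer.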
And so you have to take that into consideration as well in the design of your application. But good old math skills. I knew of no better word to use here, as I was thinking about this, than math, because we need floating-point arithmetic, deep statistics, and linear algebra in our data scientists for them to be highly effective. So something to bone up on, maybe, for anybody aspiring here, is your math. The use of GPUs to handle the processing load of the AI of the future. And we now see different databases that are based on GPUs. And I'm sure that many of the leading databases have plans in this area. I don't know them in detail, but I do know it takes some engineering effort to make things work with GPUs. And I don't know where everybody is on this, but there are some already out there, with high performance, obviously, that are working with GPUs today. Python. Yeah, it's easy to program, and it's good enough. It's good enough for a lot of things. And it's nice to have a hammer in your tool belt to pull out that does a lot of different things. And there are numerous libraries available. TensorFlow. Yes, it's becoming pretty much a standard library of algorithms, a computational symbolic graph for Python. R and MATLAB, other programming languages, optimized for math, with features such as direct slicing and dicing of matrices and rich libraries to draw from. And some people are going, well, what about this or that other library? Okay, okay. Yeah, there's more. There's more. But you don't want to be out there trying to design your AI from scratch when there are a lot of libraries available. Java and Scala, yeah, still around. They work well with Hadoop and Spark, respectively. We're still believers in Spark here. What else to say about this? Yeah, these are languages that work on the data infrastructure that I talked about before. So, yeah, the BI, that's all still there.
The visualization, yeah, that's all still there. But that's not going to take you to the AI promised land. It's going to be these things. You're not going to, for example, Tableau your way to AI, okay? You need programming today, and you need these types of corporate abilities. So data is the foundation of AI, and I've given you several examples here today. Self-driving cars will be a reality. Robots are going to start taking care of routine tasks in healthcare and in other industries, so staff can do more. Geological research, thermal data, video data, people data. And finally, an example from life sciences: they are going to be able to discover and deliver revolutionary therapies faster with the use of AI. It will help them discover a range of new therapies and rapidly recruit and retain patients in clinical studies. And from a data management perspective, AI can automatically discover and catalog all types of relevant data. It's important there, as well as in the applications that we're delivering to market. So that's what I have to say today. This is a developing story, really, about data curation for artificial intelligence strategies. I don't know that anybody has it all figured out, but I do know that we have to be moving today. And I do believe that if you're moving in this direction, you're not going to veer too far off course from where you may need to be down the road. And so I encourage everyone to move in this direction. And I'm going to turn it back now to Shannon, in case we have any questions. Shannon? William, thank you so much for another fantastic presentation. Just a quick reminder, to answer the most commonly asked questions: I will be sending a follow-up email for this webinar by end of day Monday, with links to the slides and links to the recording of this session. And if you've got questions, feel free to submit them in the bottom right-hand corner of your screen, in the Q&A section.
So, William, what's your opinion of automated machine learning offered by some products? Well, I'm all for it. I'm familiar with a few of the automated machine learning capabilities out there within database management systems, and I have used them. And I think there are some selective edge cases right now where they make sense, but it doesn't eliminate the need for all the things that I talked about here today. Without a specific example to address, I would just say, generally, that they should be in the mix of potential AI solutions. They should be brought into those contests that you run, again, trying to get the best solution. Unfortunately, today we're playing a bit of a guessing game with our algorithms, and that's why we still have to run the contests. I'm sure there will come a day when AI will help determine the AI of the future applications that are based on AI. I don't know how far we go with this. But anyway, for today, I'm definitely a proponent of it and would consider it right alongside everything else. How do you ensure bias-free algorithms? Say it again, please. How do you ensure bias-free algorithms? Is it possible? Yeah. Well, it has a lot to do with the data. We've learned that. It has a lot to do with making sure that you have a great stratification of the training data. What can happen, what I've seen happen, is that the data is like 80-20. It's 80% your mainline data, which you almost don't need AI for. But the 20% of the data is your edge cases, and that's where you're going to make your money. In that 20%, though, you only have 20% of the data. In other words, you don't have a lot of data. So a lot of companies have turned to embellishing that data, trying to make more of it so they can get more out of the algorithms. And then suddenly, the 20% of training data becomes 50% of the training data.
And the algorithms begin to think that that's a normal stratification of the data, and anything other than that is going to be problematic. And so you have to build out your data deliberately, so that you're putting your thoughts behind the algorithm, you're looking at it through the algorithm's eyes, and you're telling that algorithm what your world is like. It's okay to have a lot of highly nuanced, oddball use cases out there, because, again, that's where you're probably going to get the most bang for the buck. But you have to make sure that the overall volume of the data, so not only the quality of the data, but the quantity of the data, is representative of your business environment. So from a data perspective, that's the best thing, I think, to make sure that the algorithms become bias-free. And by the way, if the questioner is getting at other kinds of bias in AI, like, wasn't there a Twitter bot that was created that started cursing and taking on characteristics that we would rather not have in society, and so forth? I don't know how to stop that entirely, because it's more or less reflective of reality. You have to take the output of the algorithm, and then you have to look at it through those lenses of bias. And there are different norms that, obviously, we want our business to lock into, beyond whatever you may have given the algorithm to begin with in terms of, like, profit.
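The stratification problem described here, where augmenting a 20% edge-case slice inflates it to half the training set, can be checked with a few lines of code. This is a hypothetical sketch: the class labels and proportions are made up, and the 10-percentage-point tolerance is an arbitrary assumption.

```python
# Hypothetical check that training data still matches the real-world mix of
# cases after augmentation. Labels, counts, and the tolerance are made up.

from collections import Counter

def proportions(labels):
    """Fraction of the data set belonging to each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def stratification_drift(real_world, training):
    """Largest per-class gap between real-world and training proportions."""
    keys = set(real_world) | set(training)
    return max(abs(real_world.get(k, 0) - training.get(k, 0)) for k in keys)

# 80/20 mainline vs. edge cases in the actual business environment...
real = proportions(["mainline"] * 80 + ["edge"] * 20)

# ...but heavy augmentation of edge cases pushed the training data to 50/50.
train = proportions(["mainline"] * 50 + ["edge"] * 50)

drift = stratification_drift(real, train)
print(f"drift: {drift:.2f}")
if drift > 0.10:  # arbitrary tolerance
    print("warning: training mix no longer represents the business environment")
```

Augmented edge-case data can still be used; the point is to weight or sample it so the proportions the algorithm sees stay close to the proportions the business actually experiences.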
Okay, so another way to make sure that there's less bias is to build those norms into the algorithms to begin with, so that what you get coming out of the algorithm is going to be reflective of your values, and something you'll be able to act upon straight away. Versus what most people are doing today, which is, well, it's hard to get to that level of detail, so I'm not denigrating it, but what you get as output is obviously going to be geared towards whatever input you gave it. So if you said maximize my profit, well, that's what the data is going to do. So then you have to apply other lenses to it, and you can't forget to do that. So I guess that's the way to make sure that it's bias-free. I love it. And in your response on bias, you kind of started leading into this next question, which is: would you consider the social ethics and business culture while recommending the adoption and implementation of AI? Oh, wow. Yeah, that's huge. That's huge. I mean, we could have a whole session on that, but like I said, I haven't seen it erase a lot of jobs today. But my personal outlook for the future is that it will do quite a bit of that, and it will affect just about every job out there. And then we get into the future future, 50, 100 years down the road, and what's it all going to mean? Are we going to be able to sustain the AI that we're starting to put in place today? I'm not worried about AI taking over and deciding that we're useless and blowing us all away or anything like that, but it's going to change society. It really will. And there are some definite good things about it, such as in the medical field, where, like I just ended the session with, we're going to be able to access treatments more rapidly and with more accuracy, and be able to improve the human condition. Now, will that be able to be spread to all humans? I doubt it at the beginning, so it kind of depends on the way of the world.
Yeah, well, I don't think it's just about job replacement, but more about the social ethics and business culture of AI, like whether an AI that learns to swear should swear. An example of data ethics. Data ethics is becoming a hot topic, and we've seen an example of AI where, if it's below freezing out there, should a smart building keep its doors locked and keep out people seeking shelter, right? So, do you consider those social ethics and business culture rules while recommending the adoption and implementation of AI, or do you just start with fundamental data rules? I mean, I think you've got to get started, and you've got to get started as soon as possible, with an agile spike in terms of what you can do, but you have to understand what you're doing. And, like I just said in the prior answer, when the output comes out, you have to understand what it's really saying, and whether that's something you want to act on. So there may be some iterations of this that you have to go through before you have something that you feel good about, you know, going into production with, that meets all the criteria. Fortunately, there are a lot of AI ethics startups. There are companies out there trying to develop standards, to put the guardrails up, and to help companies, because, like you alluded to, Shannon, you know, we're going to put our heads down and we're going to go forward. And what are the ethical implications of this? And what about the bias that can creep in? So that almost has to start happening on the outside, I think, in an entrepreneurial fashion, and then some of that can seep into the bigger organizations doing this, and they have to be receptive to it. And from what I've seen, they are, and they know that they're stepping into a minefield here. They know they have to. They know it's going to be profitable, but they also seem to want to, you know, adhere to norms, and be bias-free, and understand the downstream implications of it as best as they can.
And some third parties are stepping into that space, so I welcome that. I love it. That just gets my inner geek going. Me too. So what is the best way to instill confidence among business users about the use of AI, for example, the removal of biases from the outcome? Well, I think the internal users of AI, if you will, are less concerned about the bias that it may impose upon the marketplace, except, you know, in terms of them as consumers, and more concerned about the changing of their jobs. And for this, we implement a program of organizational change management, and we encourage the uptake of these programs. In other words, you do not roll out, actually, let's back up. You don't roll out analytics programs. You don't roll out master data management that's changing workflow. You don't even roll out big data science programs, necessarily, without organizational change management today. You really shouldn't, because you are changing roles, and people are getting kind of blown about in the wind out there right now, with rapid changes in technology that many people have never seen before in their lives. And, what can I say, one of the definite characteristics of being able to succeed in the world today is being adaptable, being flexible, being progressive about change. And that's not going to change. But what can change is creating awareness of these facts inside of our organizations, and letting people know in advance: you know, we're going to be changing, here's why, and here's what it means for you. And in many cases, if you think it through, jobs will be upsized, and people's abilities will be enhanced rather than necessarily on the decline. So nothing's going to change the fact that people have to change. People have to change on their own, but there's a lot that companies can do in terms of organizational change management to make sure people are given as much opportunity as can possibly be afforded in this situation. And I love it too.
So, what would follow AI? Like MI, BI, to AI, for example. And "I am not needed," to throw in a comment on your last answer. If you have any additional questions, feel free to submit them in the bottom right-hand corner of your screen there. Well, let me go back to this and let me reread this question. What would follow artificial intelligence, like MI, and then BI, and then AI? Well, a lot of I's there. Can you come back? I don't see the question. So can you come back with that a little bit? Yeah, it's in the chat section, versus the Q&A. So, what would follow? Would it be like machine learning, machine intelligence, business intelligence, artificial intelligence? And going back to the "I am not needed in this world," you know, quote unquote. Let me see if I can see that. What would follow AI? Okay. "I am not needed in this world." Oh, boy. Okay. So this is a big question. This is a bigger question than data curation for artificial intelligence strategies, right? This is about the future of humanity, and how far will AI go in terms of that? How far will AI go in terms of changing how we fundamentally are? And if you look back through history, we haven't stopped ourselves in terms of change. I think we always seem to be living in the best of times, but, you know, there are always steps backwards on that journey. And we're going to do this, by the way. But if we don't do this right, there's going to be a step backwards for a lot of people. And I don't know where that's going to go. I don't have my Nostradamus hat on right now, but it could be pretty painful, you know, for a lot of humanity, if we don't have things for people to do when they don't have to work 40 hours a week. And there's a lot of, you know, speculation about wages, about a kind of free wage, say a minimum wage for not working.
There's a lot of speculation about virtual worlds that will still stimulate people, that they can go into, you know, in lieu of having to do mundane tasks. But to me, I think that's kind of at the high end of the possibilities, at least in the short term. So this is something that we'll have to wait and see play out. I love it. So that brings us pretty close to the top of the hour here. William, thank you again so much for another fantastic presentation. And just a reminder to everybody: I will send a follow-up email by end of day Monday, with links to the slides and links to the recording of this session. Also, if you'd like to continue the chat that's been going on with each other, or if you'd like to follow William, you can go to community.dativersity.net. Thanks, everybody. I hope you all have a great day, and we hope to see you next month. Thanks, William. Thanks, all. Thank you all. Thanks, Shannon.