Hello and welcome. My name is Shannon Kempe, and I'm the Chief Digital Manager of DATAVERSITY. We would like to thank you for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight, sponsored today by ChaosSearch. Today William will be discussing 2022 trends in enterprise analytics. Just a couple of points to get us started. Due to the large number of people who attend these sessions, you will be muted during the webinar. We will be collecting questions through the Q&A section, and if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag #ADVAnalytics. If you'd like to chat with us or with each other, we certainly encourage you to do so. You will find the Q&A and chat icons in the bottom middle of your screen. Just to note, the chat defaults to sending to just the panelists, but you may absolutely change it to chat with everyone to enable networking with each other. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me turn it over to Courtney from ChaosSearch for a brief word from our sponsor. Courtney, hello and welcome. Thanks, Shannon. Nice to speak with you. It's a pleasure to be on today with both you and William, and with everybody who's taken time to participate in the session. ChaosSearch is a sponsor of today's program, so I'm going to spend a few minutes at the top of this discussion, before we get into the analytics trends that I'm sure many people here have tuned in to hear about, talking a little about how we at ChaosSearch are trying to solve for those same trends and make analytics totally accessible for the enterprise in 2022.
So, hopefully my screen share is coming through. Who is ChaosSearch, and what do we do? We help modern organizations know better by activating the data lake for analytics. The data lake, a topic I know William will cover today, is something many people here have had positive, negative, and varied experiences with, and we are very focused on helping the modern enterprise activate it. How do we do that? We have a data lake platform that indexes our customers' cloud data, making it fully searchable and enabling analytics at scale, with the result being huge reductions in time, cost, and complexity. Over the next few minutes, I'll help you see how that actually works for some of our customers. I'm sure this chart is something many people look at and say, yes, this is true of my existing environment. So what is it? The data analytics challenge that data-driven organizations face today is a tremendous demand on IT, infrastructure, and DevOps teams to make data of all types accessible to users for analytics all the time. The promise has been: can we put all of that in a single repository, manage the data growth, and get better insights? That's the promise. But anybody here who has dealt with a more traditional data lake has seen what actually happens. You have silos of information based on the sorts of users that need to access it, the type of information, and the systems the data is coming from. The work that ensues to make that data accessible, the pipelining, the migration, the toil, as we call it at ChaosSearch, ends up preventing both the users and the organization as a whole from really tapping into and deriving insight from this volume and variety of data. What's the end result?
A big gap in access as well as in time to insight. At ChaosSearch, we ask this question: what if you could analyze any and all of your data? Isn't that nirvana? When William talks about the trends in advanced analytics, they may all be focused on doing exactly this. What if you could analyze any and all of your data in an automated way, at massive scale, reducing your time to insight and your cost by up to 80%, without changing the tools and platforms your users rely on to derive answers? You would know better. You would have insights at scale and immediate time to insight. Your resources would be easier to manage, and you could truly see the world the way you want to. So, quite simply, and we'll go through this in rapid fashion, what are we actually doing for customers? For organizations that have placed their data in cloud object storage such as Amazon S3 or Google Cloud Storage, the ChaosSearch data platform operates right on top of that storage, allowing organizations to access their data directly from S3 for analytics at scale with no movement, no transformation, and no extra work. Let's say you are the end user. We have a lot of customers today who have chosen ChaosSearch for their log analytics needs. Let's say you're using Elasticsearch at scale, but you are constrained when it comes to retention. You can only look at a small subset of your data for, say, container analysis, understanding traffic on your website, or any of the heavy log analytics that have become so imperative for organizations heavily deployed in the cloud.
Introduce ChaosSearch, which sits right on top of your S3. Users connect to ChaosSearch through an Elasticsearch API, so DevOps and SecOps users are able to access their information at scale with no change to how they do their jobs every day. And what does that mean? You are actually able to benefit from one unified data lake. At ChaosSearch, we do that not only for log workloads but also for SQL workloads, and we will do it for machine learning workloads this year. So it is a single unified representation of your data, pulled right from your object storage. You don't need to copy it, and you don't need to store it elsewhere. ChaosSearch doesn't own that data. We are virtualizing it and presenting it to users in a way that allows them to do more, more quickly. How do you do it? It's very simple: insights at scale. Store, connect, analyze. We tell all of our customers the same thing. Step one: store your data in S3. Many people have S3 or GCP; that's your baseline. Step two: if your data is in S3 or Google Cloud Storage, you can connect to ChaosSearch in less than five minutes. You connect it to your bucket, you click to index, and you create a view. And step three? Using your tool of choice, Kibana, for example, you come in through that tool and analyze on demand. So if your data sits in cloud object storage in either Google or S3, you can activate ChaosSearch in five minutes or less, and you can do that from our website if you choose. The point is that it's meant to be very simple. But what does it mean from a benefits perspective? What we really care about, and what the question about the data lake has been for years now, is this: everybody gets the concept of a data lake.
Of course you want to place your data into a lake where multiple users can access it, but the reality has been hard. Look at a traditional log scenario where you have multiple Elasticsearch clusters, which we're showing here on the left. You're limited in terms of retention. You're siloed in terms of how that data is represented. You have downtime when you try to access this information and move it around at scale. And the silos themselves prevent you from having a single view. With ChaosSearch, it is as simple as what is shown on the right side here: you get rid of all of that complexity and benefit from a single data lake where both scale and retention are unlimited. In that scenario, you can see how eliminating a lot of the work that happens today, and making this a much more streamlined experience for all of your users, allows you to save both time and money. One of our customers is Blackboard; some people on this call may be very familiar with them. They are an online educator, and when COVID hit two years ago, the demands on their service expanded dramatically and they needed something else to handle their logs. As Joel has discussed in webcasts, and as you'll see here, the time needed to manage the logs that had become everyday occurrences, so they could see and understand what was happening across all of their product lines, became impossible. With ChaosSearch, they get that single unified view of their logs without the work. They're seeing more, they're getting answers more quickly, and the management burden has really gone away. And with that, oh my goodness, it's almost nine past.
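To make the idea of "indexing cloud data so it's fully searchable" concrete, here is a minimal, generic sketch of an inverted index over log lines, in Python. It illustrates the general technique only and is not how ChaosSearch's own indexing works, which isn't described here. The object keys and log lines are hypothetical, and reading from S3 is replaced by an in-memory dict.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9_]+")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())

def build_index(docs):
    """docs maps a document id to its text; returns term -> set of ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # .get avoids adding keys to the defaultdict
    return result

# Hypothetical log lines keyed by (object key, line number).
docs = {
    ("logs/app-1.log", 1): "ERROR payment-service timeout after 30s",
    ("logs/app-1.log", 2): "INFO payment-service request ok",
    ("logs/app-2.log", 1): "ERROR auth-service invalid token",
}
index = build_index(docs)
print(sorted(search(index, "error service")))
```

The point of the structure is that a query touches only the posting sets for its terms rather than scanning every line, which is what makes large log volumes searchable.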
I would be happy to answer any questions at the conclusion of the call today, but we are incredibly excited about what William's going to share on trends and topics that are really relevant to this audience, because at ChaosSearch we are passionate about making analytics of all types accessible at scale to end users and really simplifying that experience. So with that, I'll turn it over to you, Shannon, to turn it over to William. Courtney, thank you so much for kicking us off with this great presentation, and thanks to ChaosSearch for sponsoring and helping to make these webinars happen. If you have any questions for Courtney, feel free to submit them in the Q&A section of your screen, as she will be joining us for the Q&A at the end of the webinar. And now let me introduce our speaker for the series, William McKnight. William has advised many of the world's best-known organizations. His strategies form the information management plan for leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading databases, data lakes, and streaming and data integration products. And with that, I will give the floor to William to get his presentation started. Hello and welcome. Hello, and thank you, Shannon. Thank you, Courtney. Great hearing all about ChaosSearch, and I think it does play into some of our trends here. It's time for the trends. So welcome to 2022. I trust that the right slide is presenting right now; if not, just let me know, and I can push the magic swap-displays button if need be, because you never know. But anyway, welcome to 2022. I'm excited, and I hope you're excited as well, knowing that you've chosen wisely in your career to be here in data in some way, shape, or form, and that it's growing and doing more innovative things.
As a matter of fact, in putting this presentation together this year compared with last year, I noted to myself how much more progressive this year is in terms of the things that are happening and the things we need to keep up with. So in the spirit of keeping up and growing our enterprises appropriately, let's dive in. I've been introduced. We have strategy, implementation, and training options here at McKnight Consulting Group. I'm going to pull you in just a little bit more; I hope you don't mind. Why are trends important? Why should you pay attention to them? I think it's imperative to see trends because they affect your business. Your users will eventually, if not very soon, care about these things in your offerings, services, and products, and in their own products and services. And we're trying to help them, of course. So it's about planning, dealing with change, and picking your winners. I have my trends here; others have their trends. My perspective, as I think most of you know, is data. My perspective has been enterprise data for 25-plus years now, and it's about making that data into a significant corporate asset for you. So I'm looking at the trends that help with that goal. And I want you to be a leader, not a follower. It's very important that organizations have data leaders who understand that data is important, how important it is, and who can continually articulate that to everybody who needs to hear it within the organization, and that's pretty much everybody. I want to help you grow your business ideas. I believe, and I've said this many times, that the great business ideas and the great initiatives are going to come from the perspective of data and analytics as we go forward. And it's people like you and me who understand this and can be significant movers and shakers inside companies as a result. So, as I said, I want you to be a leader.
Leaders of tomorrow can advance maturity while also solving business issues. So pick and choose among the trends as you need to. They're not all going to apply to everybody, of course, but you want to start working them into your initiatives. Every organization is looking over its initiatives now. Sometimes it's for what they're going to do in 2022; sometimes it's multi-year initiatives. Either way, those initiatives are going to need to be architected. The default, seemingly easy thing to do is to do it the way we've always done it, forget about trends and these new possibilities, and assume nobody will know that we didn't move forward. Well, if you've been around a while, you know that eventually gets smoked out, and the organization will favor people who actually think ahead and architect for efficiency and for functionality. Those are a couple of things you really get from trends, and that's very true of this year's trends. There are a lot of efficiency trends in there and a lot of new-capability trends, and every organization is going to need to go over them and work them into its winning approaches. Don't worry if you're not 100% right about your winning approaches. Don't worry if you picked Hadoop for your data warehouse a couple of years ago and now feel kind of stuck. You think you're stuck, but you're not really; there's always a way out. It's just inefficient to pick non-winners. So I'm going to give you my best shot at it, and I want you to think about it as well. By the way, there's no budget for staying on trend, so if you're waiting for that, it's not going to happen. You also don't get a whole lot of credit for staying on trend. In an enterprise, we get credit for meeting enterprise goals.
Okay, quickly, last year's trends. Here they were. I admit as I look over these that I didn't phrase them all as trends you could easily give a letter grade to. "Remote work continues," for example; that was one of the easier ones. So let me go through them quickly and see how I did. Some of these will carry forward in some way, shape, or form into 2022. Remote work, certainly, led by cloud capabilities. There was a strong tech rebound; so far, so good. Leading organizations are increasing their focus on AI and ML: yes, absolutely. Model deployment takes center stage in those organizations: I believe that it did. More edge AI: "more" is kind of a fungible word, but I do think there was more, and I think we're in the process of seeing a lot more. That's going to be the first trend I share with you, so I'm going to revise that one a little bit. Explainable AI: that didn't really go anywhere. I was thinking that some of the algorithm providers would be helping us understand what went into their models, not to a deep degree, but to the degree necessary to meet requirements coming from outside entities to be able to explain the AI we're using inside the enterprise. So, no bueno on that one. Strong data lake adoption: oh, yes. Everybody wants a data lake. Sometimes they think they want a data lake when they really want a data warehouse, and it's on S3. That's okay; that's a type of data lake. New technology stacks shifting from only data warehouses, lakes, and ETL to data fabrics, AI, and pipelines: I'd give that maybe a B, because for new technology stacks that is true, but I don't see a lot of people going backwards and redoing their data warehouse ETL. For new work, it's full steam ahead into pipelines and so on.
I don't see a lot of "let's do this with ETL" unless it's a small departmental kind of thing. Strong DevOps adoption: yes, there has been strong DevOps adoption, at least in terms of people saying they have strong DevOps and bringing some of the science and practices coming out of that domain into the organization. MLOps: not as strong. We're probably still not there yet. Automation as a top AI mover and shaker, at least at the beginning of your AI journey: yes, it is. Open source adoption: I'd say that was sort of flat. Still strong, but it didn't really tick up as far as I could tell. Kubernetes adoption: that's an easy one. Yes, that happened. And "we are at the start of general AI": that's just an opinion, not really something to be graded, but I believe it is true, and I'll probably drop a few points about it into this year's trends. Speaking of this year's trends, let's get into them. The first one, as I promised, is about edge AI: edge AI and edge computing will dominate architectures. Embedded databases have become a popular use of database technology. Aggregating all the uses of databases would show that embedded databases are just as popular as the more extrinsic approach, given all the connected devices we have out there, projected to reach 75 billion by 2025. Enterprise embedded databases can be found, for example, in mobile airline applications for online check-in, boarding pass retrieval, flight status checking, and so on. It's all being empowered by improvements in chips at the edge. Improved chips mean we don't have to put just a little data in files at the edge with a little bit of processing.
We can put full databases, potentially even graph databases, at the edge, and we can put not just light processing but artificial intelligence processing right there at the edge, using some of the chips coming out from companies like SambaNova, Graphcore, and so on. In an edge computing system, the data and applications are brought close to the point where they're needed. However, rather than connecting that distributed system to an on-prem environment, it is often most pragmatic to connect it to a cloud infrastructure, so that all these distributed sites can be linked together with an enterprise computing environment. I think this is a strong trend for the year. Look for it in your architectures, and, as I mentioned, look for graph-shaped data to be out there at the edge. All right, data scientists. Data scientists will start doing more data science than data cultivation. I see this turning this year, for a few reasons. One is that our data environments are that much more mature. A year more mature, we've largely gotten past our phase-one data lakes; they're now up and running, meeting business goals, and so on. Of course, we have our legacy everything that's still applicable and working, like our data warehouses, our CRM, and so on. Data scientists are also getting more tools, and that means the amount of actual data science is going to go up. We all know that half of a data scientist's job used to be cleaning data, integrating disparate storage systems, and finding the storage capacity and processing power needed to put AI and machine learning models into production. In 2022, we expect some relief, because, for one, the environments are more mature, as I mentioned, but also because more machine learning tasks are becoming automated.
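The embedded-database idea from the edge trend, such as an airline app keeping check-in and boarding-pass data on the device, can be sketched with Python's built-in sqlite3 module, which runs in-process with no database server. The schema, the `synced` flag, and the idea of a later upload to a cloud backend are hypothetical assumptions for illustration, echoing the point that edge sites are often linked to cloud infrastructure.

```python
import sqlite3

def open_local_store(path=":memory:"):
    """Open an embedded, in-process database (a file or memory, no server)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS boarding_pass (
               passenger TEXT PRIMARY KEY,
               flight    TEXT NOT NULL,
               seat      TEXT,
               status    TEXT NOT NULL,
               synced    INTEGER NOT NULL DEFAULT 0  -- 0 = not yet sent to the cloud
           )"""
    )
    return conn

def check_in(conn, passenger, flight, seat):
    """Record a check-in locally; this works with no network connection."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO boarding_pass (passenger, flight, seat, status, synced) "
            "VALUES (?, ?, ?, 'checked_in', 0)",
            (passenger, flight, seat),
        )

def boarding_pass(conn, passenger):
    """Return (flight, seat, status) from the local store, or None."""
    return conn.execute(
        "SELECT flight, seat, status FROM boarding_pass WHERE passenger = ?",
        (passenger,),
    ).fetchone()

def pending_sync(conn):
    """Passengers whose rows still need uploading to the (hypothetical) cloud backend."""
    rows = conn.execute(
        "SELECT passenger FROM boarding_pass WHERE synced = 0 ORDER BY passenger"
    )
    return [row[0] for row in rows]

def mark_synced(conn, passengers):
    """Flag rows as uploaded once the cloud side has acknowledged them."""
    with conn:
        conn.executemany(
            "UPDATE boarding_pass SET synced = 1 WHERE passenger = ?",
            [(p,) for p in passengers],
        )

conn = open_local_store()
check_in(conn, "A. Traveler", "XY123", "14C")
check_in(conn, "B. Flyer", "XY123", "15A")
print(boarding_pass(conn, "A. Traveler"))  # ('XY123', '14C', 'checked_in')
mark_synced(conn, ["A. Traveler"])
print(pending_sync(conn))  # ['B. Flyer']
```

Keeping a local "not yet synced" marker is one simple way to let the device keep working offline while still feeding a central cloud environment later.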
It's no longer such a hardcore hammer-and-chisel kind of thing, where there's so much detail work to get to something that actually makes sense in an organization. I believe this is where different tools are going to shine, and I think ChaosSearch is in here as well, with its ability to index data lakes and so on. So something to look forward to: data scientists actually doing more data science in 2022. Now here's an easy one, I think: wide adoption of containerized data. We expect to see more Kubernetes-ready distributed database platforms that have addressed the challenge of stateful persistence. That's the issue that's been holding back containerization: stateful applications, which need persistent storage, have been left on legacy infrastructure in production, and I believe that will abate in 2022. For example, Cockroach Labs offers this capability through a distributed SQL database architecture that functions like a single logical database. I also see advances in security around containerized data. All of this points to development in many shops, at least, being done with containerized data. It's also incredible how many tools there are, mostly open source, to empower Kubernetes developers and operators with powerful container orchestration, so this is becoming less of a hammer-and-chisel approach as well. We're seeing support for building containerized applications, containerized data, and so on. So I expect development to be containerized, and containerized largely with Kubernetes. This one's obvious: it's giving rise to serverless. There's a tendency to spin up clusters, be they for big data, data warehousing, or machine learning, on a per-task basis: quickly getting clusters up and running for something, testing something out, shutting them down, and so on.
That's happening more and more. Call it serverlessness. This architecture is the very substance of, for example, the Cloudera Data Platform, and it's also being leveraged by Google for Spark on Kubernetes in Dataproc. Ultimately, it's enabling new workloads and making us more efficient, and it makes architectures more efficient because you can port your development to new clouds or wherever you want to go in the future. Now, here's one that not everybody has even heard of, but I think it's going to take off in the new year: synthetic data. We've been held back in AI modeling by a lack of data, and I believe that going forward, synthetic data will be a requirement for building the enterprise. The enterprise cannot be built without AI capabilities, and AI requires a tremendous amount of high-quality, labeled data. We all know this; AI and data are, of course, connected to a high degree. For synthetic data, a lot of startups are emerging for structured data, unstructured data, and so on. This is all going to improve and enlarge the data lakes out there, which is where most of this data is going. Most self-driving cars, for example, are built with synthetic data produced in simulations. Some of the startups, if you want to look them up, are MOSTLY AI and YData, and there are a few others that I'm very hopeful about. Data fabric: I'm being careful with my words here. Data fabric is going to see an uptake. Let's start with what it is. It's a data management architecture that can optimize access to distributed data and intelligently curate and orchestrate it for self-service delivery. This is a strong trend. It's kind of like the old data virtualization, only instead of database to database, it's architecture to architecture. And I think we're seeing a lot of companies say, look, this is how we've grown up.
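Returning briefly to synthetic data: as a rough illustration, this Python sketch fits a Gaussian to each numeric feature, separately for each label, in a tiny labeled dataset and then samples new labeled rows from those fits. This deliberately naive technique is an assumption for illustration only; it treats features as independent and ignores correlations, whereas commercial synthetic-data tools use far richer generative models. The dataset is hypothetical.

```python
import random
import statistics
from collections import defaultdict

def fit_per_class_gaussians(rows):
    """Estimate (mean, stdev) of each feature, separately for each label.

    rows: list of (features_tuple, label). Needs at least two rows per label,
    because statistics.stdev requires two data points.
    """
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append(features)
    params = {}
    for label, feature_rows in by_label.items():
        columns = zip(*feature_rows)  # transpose rows into per-feature columns
        params[label] = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return params

def generate_synthetic(params, n_per_label, seed=0):
    """Draw labeled synthetic rows from the fitted per-class distributions."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    synthetic = []
    for label, feature_params in params.items():
        for _ in range(n_per_label):
            features = tuple(rng.gauss(mu, sigma) for mu, sigma in feature_params)
            synthetic.append((features, label))
    return synthetic

# Hypothetical labeled data: two numeric features per row, labeled "ok" or "fault".
real = [
    ((1.0, 10.0), "ok"), ((1.2, 11.0), "ok"), ((0.8, 9.0), "ok"),
    ((5.0, 50.0), "fault"), ((5.5, 52.0), "fault"), ((4.5, 48.0), "fault"),
]
params = fit_per_class_gaussians(real)
synthetic = generate_synthetic(params, n_per_label=1000, seed=42)
print(len(synthetic))  # 2000
```

Six real rows become 2,000 labeled rows, but the synthetic rows can only be as faithful as the assumed distribution, which is why validating synthetic data against real data matters.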
We've grown up building siloed architectures, but there's something we can do to make them work together, at least to some level, without bringing them all together in one big architecture, which we know will never happen. And that is data fabric. So I think 2022 is going to see significant growth and interest in data fabric solutions. However, I also believe it's going to become a pretty hollow term. I remember that for quite a while, data warehouse was a pretty hollow term, and data lake is just coming out of that phase. Back in the day, we could talk with people about data warehousing or data lakes, be on completely different planets in terms of what we meant, and still have a conversation and think we were meeting minds. I think that's sort of what's happening with data fabric. All the vendors are glomming on to it. A lot of people inside enterprises say that's what they're doing, but they're not really moving the needle much on the goals of a data fabric: the orchestration, the access to all data, and so on. I know it's hard to predict something as nebulous as this, but I think true data fabric is going to see an uptake. Moving right along: AI-enabled applications. Not just AI tools, but AI applications. Businesses are going to expect vendors to deliver comprehensive AI-enabled solutions for line-of-business teams and departments instead of focusing on developer tools and technologies. Some people might say it's early to see the industry make that turn, that change of direction, but I think it's time, and I think we're going to start to see AI-enabled applications. These can impact every area within an enterprise.
It remains to be seen which domains within the enterprise AI will focus on, but it will focus on automation and deep, complex analysis of big data for immediate action. So we're going to see organizations shifting their investments away from AI tools and into customized solutions built with AI. Data catalogs: data catalogs are going to cross the chasm in the data stack. What chasm am I talking about? The chasm of "should we include them or not?" Are they really part of the stack or not? We've got our ETL, our data warehouse, our BI tools. I mentioned before that the whole stack is changing and metamorphosing into more pipelines, and data catalogs are really important there. There are some players in the little box down there that I think are well positioned for the needs of 2022, and I think data catalogs in general are starting to be accepted as a necessary part of an enterprise's stack for an important workload. You could also throw things like AWS Glue in there. So, data catalogs. Data quality: let me talk about data quality for a minute. Data quality is probably as important as it ever was inside organizations. Some organizations treat it with the importance it deserves and some do not, and I think you very clearly pay for that. I gave a talk in this series about data quality just two talks ago, or maybe it was last month. It's still important, but there's more. Now that we're into pipelines and data fabrics, and some of us are doing data meshes, it's more complicated. We have data catalogs to try to track everything, but what about taking action on what we're seeing inside these new stacks and pipelines? That's data observability, and I think the people who are in data quality today are going to morph into the people who are in data observability tomorrow.
They're going to be responsible for data quality as well as the different things we see here: auto-discovering and adapting data quality rules, proactive monitoring and anomaly detection, unified scoring and personalized alerts, data reconciliation, and masking. All of these are done with data observability, most likely with a tool, and Collibra, in my opinion, is positioned pretty well here for the future. So, data observability: a strong trend. Here's another strong trend: streaming analytics. Streaming analytics in general, yes, but there is a definite big upswing in streaming analytics with IoT. We've got our live device log data going into Kafka, maybe Pulsar, maybe something like RabbitMQ, and then into stream processing. A lot of us are using Spark, but there are other alternatives as well. From there it moves into our data lake and eventually the data warehouse, and I don't mean "eventually" as in a long time; it's pretty rapid in terms of what needs to happen. The number of devices connected to the internet is growing at a rapid pace. This is known as IoT, of course: all those physical objects, or things, that are connected, and it's expected to grow exponentially this year. As more devices are connected to the internet, the volume of data generated will also increase, and we absolutely need streaming analytics, not old-school ETL, to handle this type of data as well as some other types. I'm making a prediction specifically about IoT-type data because it's a strong trend and streaming analytics is a must for it. You have real-time data, and you have to take appropriate action on it before a problem becomes critical. Some of this ties into my edge architecture trend from the beginning of the talk. Moving right along: sensors and automation will drive data volume. We all know data volume is growing, and there doesn't seem to be any abatement coming.
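Proactive monitoring and anomaly detection on streaming IoT data can be illustrated with a rolling z-score detector. In this minimal Python sketch, a plain list of readings stands in for a Kafka or Pulsar consumer; the window size, the 3-standard-deviation threshold, and the readings themselves are assumptions for illustration, not recommendations.

```python
import statistics
from collections import deque

class RollingAnomalyDetector:
    """Flag a reading that lies more than `threshold` population standard
    deviations from the mean of the most recent `window` readings."""

    def __init__(self, window=20, threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold
        self.min_history = min_history  # don't judge until we have some context

    def observe(self, value):
        """Return True if `value` looks anomalous relative to recent history."""
        anomalous = False
        if len(self.history) >= self.min_history:
            mean = statistics.mean(self.history)
            std = statistics.pstdev(self.history)
            if std == 0:
                anomalous = value != mean
            else:
                anomalous = abs(value - mean) / std > self.threshold
        self.history.append(value)  # simplification: anomalies also enter the window
        return anomalous

# Hypothetical device readings arriving one at a time.
readings = [20.0, 20.5, 19.8, 20.2, 20.1, 19.9, 20.3, 45.0, 20.0]
detector = RollingAnomalyDetector(window=20, threshold=3.0)
flags = [detector.observe(r) for r in readings]
for r, f in zip(readings, flags):
    if f:
        print(f"ALERT: reading {r} is out of pattern")
```

Because each reading is judged only against a bounded window, the detector uses constant memory per stream and can act on a value as it arrives, before a problem becomes critical.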
AI is going to dominate factories and begin to serve in many roles in society. I'm not saying this will happen 100% in 2022, but we're going to start to see more AI-augmented teachers, cooks, pharmacists, law enforcement officers, athletes, and other professionals. We'll see universal translation, and hundreds of sensors are going to be installed in our clothes, our homes, and our overall environment. All these sensors and automation will be a top factor driving overall data volume in this new year. Okay, I've tried to stay away from industry-specific predictions, but this one was just a little too juicy. I had to put it out there, and I think it's something we can all relate to even if we're not in healthcare: medicine is going to make a major leap on neurological disorders, leading to a DNA revolution. What might we do with that DNA? Will it be the next medium for information? Not in 2022, it won't. Or will quantum computing be the next medium for information? Not in 2022, but down the road, and we might start to see some early green shoots around that with DNA. But more to the point, AI will treat neurological disorders like Alzheimer's, Parkinson's, spinal cord injury, blindness, and deafness. Robotic prosthetics may become stronger and more advanced than our own biological limbs. No, I'm not going to chop my arm off for one of these things, but we can all know that if it does happen in the future, we might be relatively okay. There is not a single psychological trait that does not show genetic influence. The ability to genetically modify cells to produce a therapeutic effect, that is, to add a corrected gene into the genome in order to treat disease, is what's really going to be happening in 2022. I'm not a big expert on this, and I'm not trying to say I am, but I do track CRISPR pretty closely, and I'm just fascinated by the possibilities there.
So look for that in a hospital near you and in the computing environments of the future. Now, if you've been following my presentations here month after month, you know I talk about Sophia. You know I like Sophia. She's the young and accomplished humanoid robot from Hong Kong-based Hanson Robotics that already boasts a very impressive list of firsts, and she just added a new one: she's the creator of the world's first AI-generated piece of NFT art. Yeah, I threw a lot in there, didn't I? She's a humanoid and she's doing NFT art, and that's actually a kind of self-portrait. This art may not be my cup of tea, but the video art and painting sold in an online auction for $688,000 as an NFT, so there you go; apparently there's something to it. I think we're going to see more AI moving hard into the design of things, like the design of whiskey. Whiskey, for example, comes with a lot of variables, right? What casks are going to be used, what the casks held before, what the ingredients are, how long it's going to sit there, and so on. AI has actually created 70 million different recipes for whiskey, and it can then curate those into what it thinks are going to be the most popular and of the highest quality, based upon the cask types you may have on hand. So it's getting into a lot of things: it's getting into music design, it's getting into art, as you can see, and spices. I'm just thinking of different things I know AI is moving hard into. And by the way, this is as bad as the tech is ever going to be; it's just going to be a whole lot better as time goes on. This brings up my opportunity to let you know that we're having an NFT here at Advanced Analytics: you can get a picture of me giving this presentation. I'm only kidding. They do seem to be everywhere, though, don't they? I'm going to stay away from it because
that's not enterprise analytics, but I think it's an interesting area as well. And this is kind of off topic, but I do think this will be the year of deepfakes. We're going to start to see deepfakes, not know what to do with them, and be fooled, faked out if you will, by many of them, and we'll see what kind of chaos that causes. Keep a skeptical eye out as you go forward in 2022. AI will also be conversational, so we'll get to talk to people through AI, maybe historical figures, or just people for companionship, and so on. I won't go too far there; I'll actually come back to that point in another minute or so. And that design from AI is going to extend to tech and software, to what we do. Google said that a chip that would take humans months to design can be dreamed up by its new AI in less than six hours. The AI has already been used to develop the latest iteration of Google's Tensor Processing Unit chips, and the tech giant's engineers noted that the breakthrough could have major implications for the semiconductor sector. So AI is designing things like that as well. I'll throw in another technology here, and that's AutoML, something I feel gets high marks for the new year. AutoML is machine learning designed to create other machine learning models, helping you to automatically select the algorithm, something that today is largely a human decision. So at some point in the future, will AI be writing most of the computer code out there? Are we at the start of that? I think we are. AutoML is one of the technologies enabling that, but there are others, and, as you see there, they're now able to build code, explain code in English, suggest improvements, and write code. The Defense Advanced Research Projects Agency's (DARPA's) Probabilistic Programming for Advancing Machine Learning program is developing new technologies that
improve machine learning. Both DeepCoder and AutoML use machine learning to produce executable code based upon entire knowledge bases, and the bases they use can generate not only the entire data hierarchy for a project but also the entire user interface and the middle layer. I hope I haven't made any of you sad by saying that; there's a continual need for STEM out there. It's just going to morph and grow. Moving right along here: AutoML cements itself as the future of ML. AutoML tools enable self-service data science: using no-code and low-code tools, businesses can build, train, and deploy data models for deep analysis and insight generation. Some examples: Tableau has made this concept of business science a priority. Business science is like self-service data science, enabled by augmented intelligence and machine learning. Qlik acquired Big Squid for its AutoML capabilities, Alteryx made AutoML the centerpiece of its May 2021 platform update, and we're just going to see more and more of this coming in 2022. It's not going to be as robust as a trained data scientist in 2022, but it's so useful and such a great starting point that I think we're going to see more use of AutoML in 2022. Now, my last prediction here is GPT-3. Maybe you've heard me talk about GPT-3 before and call it a trend before, and I'm still calling it a strong trend; there's so much to go on with GPT-3. It was trained on hundreds of billions of words, including books, so essentially it can do anything you might think AI should do with words, with text. It can't do all of that as well as it eventually will, clearly it can't, but it can do a lot of different things. I talked about the training: it calculates how likely one word is to appear in a text given the other words in the text. This is known as the conditional probability of words, and it's really good at that. So, for example, if I'm
going to the gym and I go to get my [blank], GPT-3 can intelligently fill in that word. It probably would not put the word "piano" in there, right? It might put "shoes" or "towel" or something else that's appropriate. Microsoft has an exclusive license to GPT-3, and I think that is one of its strong assets. The public can still use it to receive output (I'm actually in the beta program), but only Microsoft can control the source code. Now, as I yell from the mountaintops here about GPT-3, I do note the Wu Dao 2.0 model, which was created by the Beijing Academy of Artificial Intelligence and developed with the help of over 100 scientists. What makes this pretrained AI model so special is that it uses 1.75 trillion parameters, while GPT-3 uses 175 billion, so that's quite a difference: ten times as many. I haven't touched it, but I am tracking it, and it's being used to simulate conversations, understand pictures, write poems, and even create recipes, a lot of the things GPT-3 does as well. So this is all in the area of natural language processing, NLP, as you see there. It will continue to evolve to understand speech rhythms along with all of our idiosyncratic human speech patterns: ums, uhs, and words with mixed meanings. It will continue to learn which meanings apply, making it much more reflective of human speech and much more able to direct queries and resolve concerns. We see NLP implemented in our enterprises now in the help desk, in the call center, in contract management, and a lot of different things like that, anything really that has to do with text. I think we're going to start to think about NLP, and maybe specifically GPT-3, in 2022. Okay, so we've had a little journey through trends for 2022. You may agree or disagree, and that's fine. I hope you give it some thought, come up with your own list, and go forward into those things; I think you'll find it really exciting to do so. There's more maturity in moving imperfectly than in merely
perfectly defining shortcomings. We can all point at things and say, well, that's wrong. Well, there are reasons it is the way it is, so let's be a part of the change, right? So build your credibility: tout these, or your preferred approaches for 2022, whatever you think is going to be necessary for your organization to meet its goals. Don't talk yourself out of having a new beginning; have an open mind. No plateau is comfortable for long, and any resistance you find is probably not about being a leader, and probably not about getting to the end game; it's probably about the journey. So be sure you address the journey as you start to talk about these winning approaches for 2022 in your environments. And just a quick recap here: edge AI; containerized data with Kubernetes; synthetic data; avoiding miscommunication, like around the data fabric, which is a winning approach (you might consider the data fabric itself a winning approach as well, but my main point there is let's be on the same page about what we're talking about); AI-enabled applications; data catalogs; data observability; streaming analytics; AI design, with AI complementing human design; AutoML; and GPT-3 as an example of NLP that I think will be strong in 2022. And with that, I will turn it back to Shannon, and I look forward to your questions. William, thank you so much for this great start to the 2022 new year; I really appreciate it. And thanks again to Chaos Search. If you have any questions for Courtney or William, feel free to submit them in the Q&A portion of your screen, and we'll answer the most commonly asked questions. Just a reminder: I will send a follow-up email by end of day Monday for this webinar with links to the slides and links to the recording of the presentation, along with anything else requested. So a question came in earlier: is there a tool that helps to query or create multi-dimensional queries from a data lake? I'm happy to take a pass at that, Shannon. Yeah, absolutely. I guess, Patricia, maybe I am happy to
follow up separately. Depending on the dimensions of the query itself, there could be other solutions, but I would certainly be remiss if I didn't say that this is a big part of what Chaos Search is trying to do, and what we consider to be multi-model analytics, right? We can help you run log queries, so text-style queries, or SQL queries on the same data set in your lake, in one single platform. So you could be using Looker to run a SQL-type query and represent that data as such, and use Kibana to run a log query: same data set, no isolation of those sources, a single result. If that helps, great, and if it doesn't, feel free to follow up with me via email. And I don't know, William, if there are other multi-dimensional query providers you wanted to talk about. Well, I agree with everything you said. There's obviously a host of tools you can use to access data in a data lake, but nobody goes as far as Chaos Search in terms of capabilities, that I'm aware of. And I'll just add that when you're sitting on the cusp of architecting an environment, bringing on a new tool, or bringing on new capabilities, think about these trends I just talked about. Think about how far things have come in the past couple of years, or even one year, and you can guess it's probably going in a similar direction, fueled by AI. So next year we're probably going to be talking about some trends that are all about AI and all the capabilities you're going to need as an organization. So think beyond your immediate capabilities. Think beyond what that person in your virtual office was jumping up and down on the table about yesterday, because once they get that, they're going to need more, right? We've seen this pattern over and over again. So be sure you get capabilities that scale enough when you do things like needing dimensional data in the data lake, and so on and so forth. I love it, and
everyone's so quiet today; that was the only question that's come in. I'll give you all a moment to type any additional questions into the Q&A there. You know, William, before the webinar we were talking about how we at DATAVERSITY are seeing a lot of jobs booming around metadata and data modeling, because companies are standing up machine learning and just starting those programs. Are you seeing anything similar in the companies you're working with, in terms of a continued focus on metadata and data modeling? Yes, in terms of what I'd call the oldies but goodies. All these trends, by the way, should not take away from the need for solid data quality work, solid data modeling, solid data management work, solid data governance, all those things. It's always tricky to give a presentation like this, and I don't want it to seem like I'm glossing over all the good things we've learned over the years, which still remain true. So I'm glad I had this opportunity to say data modeling is still important, and metadata is still important. I mean, we talked about the growth of data, right? We talked about how data is going to move with the data fabric, and we talked about the data pipelines. That's moving data, but it's moving that data to places that need to have great modeling and metadata above them. I'm just sort of glossing over that today, but we can get down into those weeds and see that it's as necessary as anything for the success of the overall project. Courtney, anything you want to add to that? No. I always find these sorts of sessions to be eye-opening, when you have a moment to just think about the variety of solves for the plethora of analytic challenges out there and the sorts of really compelling things that are happening. So William, thanks for taking us through that, and Shannon, thanks for hosting us today.
We are very happy to be part of this DATAVERSITY community, and we look forward to talking to you all again soon. I love it. While we were talking, we had a few more questions come in, which is great, and I love this heavy hitter: thoughts about the data ethics implications of self-service data science? You know, I'll take a first pass at that. I think there's been some pushback along the lines of "we just don't know what the algorithms are doing," so I think we have to move a little more carefully. Different organizations are treating that differently, and some organizations, if regulation ever stepped in, would be more or less in a world of hurt when it comes to this stuff, because they've really banked on algorithms they know nothing about, and they don't have the AI expertise to mount any kind of defense. But you asked about ethics, and here I'm talking about regulations, which is more business-oriented. I think the biggest ethical issue around AI that we're dealing with is biased data, and this is where some of that generated, synthetic data can come in really handy. I talked about that as a strong trend, and I think it's also a trend that helps make sure we don't fall into any ethical holes when we do artificial intelligence. It is important to have representative data. It's important to have a lot of data, and we could define "a lot" (we would need some time on that), but you know what I mean: a lot of data that represents your entire constituency and doesn't just gear itself to one part of it or another based upon what it knows. We cannot let the majority data, if you will, which may represent the majority of our consumers out there, have its characteristics be the only characteristics we care about. So anyway, there's going to have to be some knowledge there that comes out of AI. I love it, and
how AI and machine learning will influence and enable data quality? Yeah, I think, as with a lot of things, we're starting to see that some things can be automated, but we're not at the point where I'm comfortable saying that data quality can be completely automated, not at all. You still need data governance coming up with custom rules for the shop. You still need to apply them with a biggest-bang-for-the-buck kind of approach, grow that over time, score your data, and hopefully get it better over time. That's still the approach. But we see that machine learning can do some of the rather ordinary things, I'll say, that have to do with data. I'm doing one right now: correcting addresses through a third party. That's not exactly AI, but some of the matching is AI. And wherever data quality has in the past been relegated to a data governor who had to get involved manually, I think that is starting to abate, because we're starting to understand what they're doing to the data. Don't tell me you're using your judgment; tell me how that judgment plays out, because that's what I want to capture and get automated. But you know what? If I'm not told that, but I can see the pattern of that activity over the course of time, artificial intelligence can see that pattern and take it forward, and that's just as good. So I think that's one area where machine learning will definitely be helping data quality in 2022. Awesome. Courtney, anything you want to add to that? No, not on that one; I'm concurring with William. Love it. All right, some additional comments came in; I love it. Well, that does bring us to the end. Oh, we've got one more question coming in here, and I'm going to throw it in since we have just a couple of minutes: from an ESG goals perspective (environmental, social, and governance), any thoughts on trends that would facilitate an organization's
ability to indicate and document such, to meet the scrutiny of regulators, stakeholders, and consumers? That's similar to the ethics question, or at least to how I addressed the ethics question: we've just got to see more explainable AI. I thought it was a trend last year, and I think it's still a trend, but maybe not as strong a trend as before. I'm kind of gun-shy about calling it a trend, since I didn't see it happen. We'll see more of it for sure, and we'll definitely see a lot more of it if regulation moves in that direction. If you're in Europe, you already have more of this than we do here in North America, so there you go; it's going to be somewhat regional as well. I love it. Well, thank you both so much for this great presentation, and thanks to our attendees for being so engaged. I really love it, and love you guys, as always. And thanks to Chaos Search for sponsoring today's webinar and helping kick off the series in the 2022 new year. I appreciate your help, enjoyment, and engagement, as always. So thanks again, everybody. As a reminder, I will send a follow-up email by end of day Monday for this webinar with links to the slides and links to the recording of this session. Hope you all have a great day. Thanks, everyone.