Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of DATAVERSITY. We would like to thank you for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight. Today, William will be discussing assessing new databases: translytical use cases. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel, or if you'd like to tweet, we encourage you to share highlights and questions via Twitter using the hashtag ADBanalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just note that the Zoom chat defaults to send to just the panelists, but you may absolutely change this to network with everyone. To open both the Q&A and the chat panels, you can find the icons for those features at the bottom of your screen. We'll send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar.

Now let me introduce our speaker for this series, William McKnight. William has advised many of the world's best-known organizations. His strategies form the information management plan for leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. William is a leading global influencer in data warehousing and master data management, and he leads McKnight Consulting Group, which has twice placed on the Inc. 5000 list. And with that, I'll give the floor to William to get his webinar started.

Hello and welcome. Thank you, Shannon, and welcome, everybody. Welcome to this edition of Advanced Analytics. I am happy to be bringing you this edition live from New York City, away from my home base today. But that doesn't matter; I have my trusty microphone, and away we go. So today we're going to be talking about assessing new databases: translytical use cases. I'm going to focus on use cases today. So I'm going to talk about all these new real-time use cases that we're trying to maybe force-fit into older architectures. And there may or may not be a better way for that. There may or may not be a way of the future coming that incorporates both transactions and analytics. We will drill down on that topic today. You may have heard the terms HTAP, for hybrid transactional analytical processing, or HOAP, hybrid operational analytical processing, or maybe you've heard some other "-lytical" term. I had to pick one that would probably bring in the most people, so I chose translytical, but it's all the same; all these terms are trying to address this new emergence of databases that can handle both. You may also, I'm sure, have heard of event-driven. There's a lot of overlap between an event-driven architecture and a translytical database, so we'll get into that a little bit as we go along. So I thought today that I would highlight our offerings to vendors, because we get a lot of vendors in the audience for this, which is great. I will just shorten this up by saying that we offer some real competitive education to vendors in all data spaces, whether that's a complete teardown of the competition or educating you with workshops about your competition. All right.
So here are the two sides of data processing today, OLTP versus OLAP. I just wanted to be sure we're all on the same page in regard to this, because most of us have lived in one or the other world. And maybe, or maybe not, we regard the other world as being inferior or less interesting or what have you. But I want to be sure that we're all at least somewhat up to speed with the differences here.

OLTP, online transactional processing: this processes business interactions as they occur. So it might be the airline reservation, the checkout at the grocery store, or processing a claim in health care, something like that. OLTP supports limited querying. Why is it limited? Because the OLTP system is busy, typically doing all it can do to support those transactions. And there's a focus here on insert, update, delete: getting the transactions in and updating transactions as changes are made through other transactions. Insert, update, delete on individual transactions. Low latency and high throughput are what is needed in OLTP. It also needs to have ACID compliance; that stands for atomicity, consistency, isolation, and durability. I've talked extensively about this before, but it has to do with making sure there's integrity in the transactions. And then usually the OLTP system has what looks like a normalized data model of work. There are a lot of tables, and every table has only columns in it that depend on the primary key, in addition, of course, to the primary key column or columns. So this tends to involve a breakout into a lot of tinier tables than OLAP with its dimensional modeling. OLTP is where we have to start the business. Without transaction processing, there is no business.

Now OLAP, online analytical processing: that is what's going to bring a lot more value to the business, and that's where you can differentiate. You're not just processing transactions, but you're bringing analytics to bear on those transactions. And this can be more complex analysis. A lot of times, because of the limitations of OLTP databases, we offload processing from OLTP to OLAP. So all the things we'd like to do with the data, we're doing downstream, not in real time of course, but in the OLAP databases that we push the data to. Typically this data is modeled dimensionally, where you have a so-called fact table in the virtual middle of the model, surrounded by dimension tables that often have multiple hierarchy levels packed within a single table. It's a beautiful thing, and many of us on this call, I'm sure, have spent our careers doing that and doing OLAP. There is light data modification from source, but the point is that data modification from source is certainly available. I think there could be more of it for data quality, but that's sometimes a tough proposition, to be changing data, period. At the least, maybe we don't change data, but maybe we bring some additional analytical columns to bear in our OLAP beyond what appears in OLTP. These are complex queries that run against OLAP, and frequently they are long-running. Nobody likes long-running queries, but let's face it, a lot of them are longer than we would like, and they tend to be longer than OLTP, which is more or less single-record retrievals. So those OLAP queries can span terabytes to petabytes of data. And there tends to be a large data accumulation. Where are you accumulating history? Are you accumulating it in OLTP, or in OLAP, in a data warehouse or data lake?
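To make that modeling contrast concrete, here is a minimal sketch, using SQLite purely for illustration; the table and column names are hypothetical, not from any system discussed in the webinar. The first block is the OLTP-style normalized shape built for insert/update/delete of individual transactions; the second is the OLAP-style star shape built for large aggregations.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# OLTP: many small normalized tables; columns depend only on the primary key.
con.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, list_price REAL);
CREATE TABLE order_line (
    order_id INTEGER, line_no INTEGER, customer_id INTEGER,
    product_id INTEGER, qty INTEGER, amount REAL,
    PRIMARY KEY (order_id, line_no)
);
""")

# OLTP work is insert/update/delete of individual transactions, low latency.
con.execute("INSERT INTO customer VALUES (1, 'Ada', 'Austin')")
con.execute("INSERT INTO product  VALUES (10, 'Widget', 9.99)")
con.execute("INSERT INTO order_line VALUES (500, 1, 1, 10, 2, 19.98)")

# OLAP: a fact table surrounded by denormalized dimensions
# (several hierarchy levels packed into one table).
con.executescript("""
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product(product_key INTEGER PRIMARY KEY, name TEXT, category TEXT, department TEXT);
CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER, customer_key INTEGER, qty INTEGER, amount REAL);
""")

# OLAP work is large scans and aggregations rather than single-row lookups.
monthly = con.execute("""
    SELECT d.year, d.month, p.category, SUM(f.amount)
    FROM sales_fact f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.year, d.month, p.category
""").fetchall()
print(monthly)
```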
Data warehouse and data lake, by the way, are OLAP constructs, unless we're talking about some sort of very different operational-hub type of thing. The data warehouse is probably the quintessential OLAP database, and it's a term that in the past few years especially has become very frequently used, maybe to describe things that some of us may not say are a real data warehouse, but whatever; that term does apply over in the OLAP arena. So the data warehouse, yeah, what about that? Why doesn't it just work for this new breed of transactions? Well, I'm not saying the data warehouse is dead, but I am saying it's dying. I am saying this notion of having to copy all the data around to a separate store is now really starting to die out as a need. I think we're going to need a data lake to capture all that historical data, the data lake being kind of cheaper storage, the kind of storage that you are running really long-running data-science-type queries on and so forth. Interestingly, though, watch your terms out there. Watch your terms, because one database vendor uses S3 as their primary store, applies its own block storage to that, and calls what they can deliver a data warehouse. Another vendor does the exact same thing, applying their block storage to S3, and they call theirs a data lake. It's the same thing. So anyway, be careful out there with terminology. I think I say that every presentation.

Capability requirements. So what do we need? We need analytics on live data, recent data, and historical data, not just on historical data, not just ignoring current context. Real-time analytics is what we need, calculated from across data domains, many data domains. We need some pre-calculated data, though. Pre-calculated data is a necessary evil in data architecture today, because for the data that we want to bring to bear on a real-time operation such as real-time analytics, we don't have the time to calculate it all on the fly. So pre-calculation is necessary. Of course, pre-calculation is not the greatest thing in the world either, right? Because it's not going to take into account right-up-to-the-minute, right-up-to-the-second data, I should say. So where you go pre-calculated versus where you let things be live and maybe impact performance, wow, that's an art form. That's why you have a data architect in place to make big decisions like this. And if you're putting in place some new systems, you might make a hundred of these decisions along the way. A lot of fun, but these decisions are critical to performance and critical to functionality.
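Here is a minimal sketch of that pre-calculated-versus-live trade-off, with made-up order events and a hypothetical nightly rollup; it isn't from the webinar, it just shows what the data architect is weighing: the batch path is cheap to read but blind to anything that happened after last night's run, while the live path is current but paid for at query time.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical order events: (customer_id, amount, timestamp)
now = datetime(2022, 9, 15, 12, 0)
orders = [
    ("c1", 120.0, now - timedelta(days=3)),
    ("c1",  80.0, now - timedelta(hours=20)),
    ("c1",  45.0, now - timedelta(seconds=30)),   # arrived moments ago
]

def precalculate(as_of):
    """The batch path: roll up spend per customer as of last night's run."""
    totals = defaultdict(float)
    for cust, amount, ts in orders:
        if ts <= as_of:
            totals[cust] += amount
    return dict(totals)

# Pre-calculated last night: cheap to read, but blind to today's activity.
nightly = precalculate(as_of=now.replace(hour=0, minute=0))
print("pre-calculated:", nightly.get("c1"))      # 200.0 (misses the latest order)

# Calculated live at transaction time: current, but paid for at query time.
live = sum(a for c, a, _ in orders if c == "c1")
print("live:", live)                             # 245.0
```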
Live analytics is what we need. We need our live analytics to be usable operationally: not just calculate them live and store them away, but calculate them live and use them right away. And so we're moving to this world where we cannot predict everything that's going to happen. We can utilize more data than we can pre-calculate. So we are considering, as we move along here, that every situation out there in our business is unique. And I'll drill in on this more, because this is a very important concept for us to understand as we think about our architectures. As we can use more data, bring more data to bear on our transactions, we're going to need more data stored and usable. A seamless platform is what we need. We need to meet the operational SLAs. That's number one, isn't it? If we're not meeting operational SLAs, all the rest is unimportant.

So I'm talking a lot about analytics here. It's a term that we in the industry throw around a lot, and I don't think we're all on the same page with it, so let me define what I mean by it. Analytics is the process of utilizing data to enhance business processes. Analytics are deep, not simple, knowledge. They have depth. And we could talk all day about what depth is here; I won't do that. But it's deeper than, okay, let me look up this customer's address. It's more like, let me research this customer's complete history and everything going on around him or her at this moment and determine what the next best course of action may be. That's deep. That's not a shallow lookup. So there are what we call analytic projects, projects that are really focused on this and get their own budget. And then there are analytics that we add to projects. And sometimes we add them too late, or we under-scope them, or we understate their importance. But they're both the same in regard to needing these analytics to be brought to bear in real time on operations.

So where do analytics come from? They come from batch and they come from real time. And this is that art form that I was talking about earlier. Batch-oriented analytics are the ones that have broad context, the ones that drive set reactions with set action options. For example, a driverless car: when a driverless car sees a stop sign, it should stop at the stop sign. There's really no ifs, ands, or buts about it, and this applies to most people; yeah, this should probably apply to all people. These rules are relatively static and don't change. They are rules where you can say, every time, do this. Now, real-time rules get more complicated. They're more dynamic, and they take into account the immediate context. You can never tell or predict what the immediate context around a transaction will be, not when you consider the maybe 100 variables that we could possibly get our hands on at the moment of that transaction. This is activity that we cannot do in batch. This is unique activity. Now you may say, well, there's not a lot of uniqueness to what we do. You know, somebody's in the store, they're buying whatever, and we just sell it to them, and that's it. That may be true today for you. But the way things are going, there are a lot of variables around that transaction that could be brought to bear to make the right next best action for that customer, for the company. So say one of these autonomous cars is running low on gas. And I say low; I should use a more descriptive term: an eighth of a tank of gas. The gas station is 20 minutes away. Home is 30 minutes away. But tomorrow is not as busy; maybe I can get gas in the morning. Maybe I'm running a little tired right now. So there are a lot of things that can be brought to bear on the decision of whether to pull in and get gas or not. The batch stuff is, again, anytime A happens, you do B. But real time is when you bring to bear a lot of different variables.
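Here is a toy sketch of that distinction, with made-up context variables from the low-fuel example; it isn't anyone's production logic, just the shape of it: the batch rule is static and context-free, while the real-time decision weighs whatever is known at this exact moment.

```python
# Batch-style rule: static, context-free; every time A happens, do B.
def batch_rule(sign: str) -> str:
    return "stop" if sign == "stop_sign" else "proceed"

# Real-time rule: the same question ("pull in for gas?") weighed against
# whatever context is available at this exact moment.
def refuel_decision(ctx: dict) -> str:
    if ctx["tank_fraction"] > 0.25:
        return "keep driving"
    # Low fuel: can we safely defer? Weigh range, schedule, and fatigue.
    can_reach_home = ctx["range_minutes"] > ctx["minutes_to_home"]
    if can_reach_home and ctx["driver_tired"] and not ctx["busy_tomorrow"]:
        return "go home, refuel in the morning"
    return "refuel now"

print(batch_rule("stop_sign"))
print(refuel_decision({
    "tank_fraction": 0.125,     # an eighth of a tank
    "range_minutes": 60,
    "minutes_to_home": 30,
    "driver_tired": True,
    "busy_tomorrow": False,
}))
```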
So, the benefits of real-time analytics. Speed is number one. Speed means that you're not considering things after the fact; you're considering them in real time. You're not hampering the transaction unduly so that it is impaired, customer service is impaired, the network is impaired, things like that. You don't want that. It helps you seize opportunities. And then there's the customer experience benefit, where businesses can anticipate problems and streamline operations. I wish my website provider would have anticipated the problems with my website this week so it didn't go down for a day. But anyway, that didn't happen, so maybe they could benefit first from some real-time analytics. There's also operational excellence. Real-time analytics allows organizations to gain a clear view of the business and understand what needs to be done to address potential operational issues. And it's all about that deeper understanding. So when there's a need for deeper analytics to make a business decision, real-time analytics can help compare real-time and historical data to inform the decision.

Now, I'm going to get to some solutions to these real-time analytics challenges, but do know that most of the real-time architectures are focused on, and rightfully so because that's kind of the problem area right now, real-time ingest. Real-time ingest. Without the ingest of the data, we don't have a lot.

So there are some translytical use cases, and we're here to talk about these use cases. I'm here to raise your awareness that if you're working on one of these use cases, you need a real-time architecture, and that real-time architecture needs to bring analytics to bear on the operation. I won't read them all; you see them here, and maybe you can find your way in there. Of course, there are plenty more. The key words that I listen for to know if I'm in a translytical situation, which is becoming more and more common: real time. Real-time analytics. Operational excellence. How can you have excellence in your operations without analytics? Just transacting? I don't think that's real excellence. Operational analytics. Real-time data warehouse, a little bit of an older term, but nonetheless, when I hear that, you might be in this category. Real-time analytics essentially means the data is available for analysis almost immediately once it is collected. It's the way of the future. The way of the future is not to have a second store where we have to copy everything. The way of the future is more or less virtual integration that happens fast enough and brings all real-time data to bear on the situation in real time. So in other words, all possible options are considered for a transaction, and the right course of action is chosen based upon all the variables in play. And by the way, let's say you're considering 20 analytics for a given transaction. 90% of those transactions may only use one or two, but the higher-value ones are going to get into more and more. And over the course of time, no matter what your transactions are, you're going to learn to use all 20. You're going to learn to blow past that 20 into many more.

So let's talk about some examples. These are some examples from practice. And the first one is sort of general and applies to many of them, and that's the next best offer, next best touch. I've kind of already alluded to it. This is the need to incorporate not only analytics from last night, but also today, the last hour, and the last second into whatever you're rendering on the screen about that customer, about that transaction. And maybe you're not even screen-rendering it; maybe, more for the future, it's going straight into the right operation, the right next best business process. You need to incorporate not just the user's data, but all users' data. Because yes, I may not have purchased on one of these so-called promotions that you have.
But there may be an explosion of purchases by people like me of that promotion in the past 24 hours. That would be information that you want to bring to bear. Plus you're correlating my next best offer with people like me, and maybe somebody just moved into that category of being, quote unquote, like me. This is very dynamic. This is not stuff that you can do in batch and be effective with anymore. And eventually it's only AI that's going to be able to operate at the needed scale. So this next best offer, next best touch point is a theme that you'll see throughout these examples.

So this is a real example from the financial market. They had billions of API requests daily. They needed 5 to 10 millisecond average response times. The data they wanted to include was real-time and historical stock prices, cryptocurrency, forex, commodities, currencies, premium data, pretty much you name it. And front-office traders needed the real-time analysis. There's no "let's take this in batch, let's wait for the warehouse to operate overnight." Even a real-time data warehouse is still waiting on something to be pushed and calculated and fed back. With real-time analytics, a lot of the data is left in place, but the architecture is able to reach out to all the places where that data is and bring the right information to bear on the transaction. Premium data in this example comes from a growing community of curated partners, such as Wall Street Horizon, which has corporate events; fraud factors; Audit Analytics; ValuEngine, which does forecasting; Stocktwits, which is an investors' social media platform; and much more. So there's a lot of data to be brought to bear in the day-to-day of that company.

Healthcare: as it moves to genomic medicine, the amount of data that needs to be considered to make the right health decision is going to jump enormously. We're into virtual business now, telehealth, and AI doing the triage of us as we have health conditions. AI, diagnostics, robotics, automating lab work. So what needs to be considered in real time for this healthcare company, for the healthcare industry really? Well, recalls. Do we really want our provider to be waiting 24 hours to know about a recall and act on it? How about outbreaks? COVID-like outbreaks? Yeah, I want to know about that. The latest findings in research, in field work, and so on. And just simply the pandemic footprint: what is that looking like now, and how does that factor into whatever healthcare I might be providing to the individual? Human beings have roughly 20,500 genes in DNA housed in each and every one of the trillions of cells that make you who you are. What causes what action? It's complicated. There's a lot going on there, and so batch analytics are still going to be very much needed in healthcare as we get sort of general learnings, but you can see the depth to which even those batch analytics are going, not to mention the real-time part of the equation. Putting those two together, healthcare has a long way to go and a lot of opportunity.

A retailer: better and personalized product recommendations, similar to the next best offer, and continuous and automatic retraining of the ML engine. I highlight that because that is something that we need to understand as we get into machine learning for all the things that we do: those engines need to be continuously and automatically retrained.
We need to get to that full 360-degree view over business operations to improve customer satisfaction, and that 360-degree view is not something that's calculated in batch. It includes the now; it very much has to be up to the minute.

Now, this is a metaverse company, and they're incorporating chairs, vests, scent generators, and better directional sound systems, avatars as full virtual agents, and some people have gotten surgical implants too, for things in the metaverse. Anyway, the metaverse is all about simulation, and these avatars are able to act within entirely defined parameters as our agents, our companions, and some may even be considered co-workers. It's all real-time. Real-time actions, kind of like a video game. All real-time actions based upon context, based upon what is going on at this moment and considering what we have learned about the entire concept. So in the metaverse, this could be a metaverse city that needs to be managed, it could be one of the metaverse attractions that needs to be managed, etc. You're going to have a full parallel life in the metaverse, and this is where NFTs and crypto are going to take off. Bitcoin may displace the US dollar at some point as the primary form of global finance, etc. The metaverse is going big, there it is, and yeah, it's all real-time.

Transportation is becoming real-time; it's becoming driverless and autonomous. Floating or vertical warehouses delivering packages: that's how packages are going to start being delivered. Urban transportation from city centers, Airbus drone-like pop-up concepts, are going to be happening more and more. I understand here in New York City I can grab a helicopter over to the airport. I might do it sometime just for the fun of it; I don't think I want to work my schedule such that it's necessary, though. It's pretty packed. But what kind of real-time information do you want to bring to bear on systems like that, transportation systems? Well, traffic comes to mind, as I look out over some real traffic here, and weather comes to mind. The current weather and the patterns are constantly changing. This is real-time information that transportation companies need to be on top of and need to be delivering analytics around.

Now, this one is sort of a general one that has to do with any number of our clients, or companies out there that you know of, that are using cameras and audio recording. Cameras will be abundant; they are getting very abundant. Of course, that's all real-time information that's being taken in. So what does it do with the real-time information? It's back to what I was saying earlier: is it always going to do the same thing based upon what it's seeing? Or are there times when it's going to want to consider who it's looking at, what the context is, what the weather is, who's around him or her, what's around uniquely at this moment in time? A person's profile will become more and more evident. So if we walk into a furniture store virtually, before you say anything, the store will know your name, employment status, your buying history, and credit rating. And maybe where you've been today and the clothes you're wearing. And how about your criminal history, your consumer history, and your marital past? It's only a matter of time before data brokers begin drawing from online dating profiles and social media posts as well. And how about our DNA information, such as what 23andMe, etc., have collected? We will eventually allow all of this for the convenience that it offers.
And so that's a lot of information to bring to bear on a lot of possibilities for what to do with what that camera is looking at. Eventually someone might be able to point a phone at you, or look through their special contact lenses, and see a bubble over your head marking you as unemployed or recently divorced. We'll no longer be able to separate our work selves from our weekend selves; instead, our histories will come bundled as a pop-up on strangers' screens. We're already doing some of this: some devices that we talk to will record and upload our conversations, like Amazon Echo, and there's the talking Hello Barbie doll that sends those things wirelessly to a third-party server, where they are analyzed by speech recognition software and shared with vendors; read the fine print. Even our thoughts could become hackable. I'll stop there. My point is, it's a lot of information to be bringing to bear on the moment. And if you can't act in the moment with this information, it's almost useless. Especially when you're talking about cameras and audio recording, because they're moving, right? They're not standing still waiting for the batch analysis to happen so that the next right best action can happen to them.

What about manufacturing? On the manufacturing floor there are real-time dashboards showing everything that's going on. There's a variety of data sources, and when they ingested data, they had to recapture essentially the entire data set. This is where they were before they moved to a translytical architecture. They were cross-matching survey results at the team and individual level, and they were very concerned with their NPS score, as are many companies. Processes that formerly required 10 steps were streamlined down to just one when they fixed their architecture and moved to a true translytical architecture. This company previously had to run their advanced analytics offline, and if you looked at the dashboard and wanted to drill through, the waiting times were too painful, so that just wasn't going to happen. It has to be done automatically; it has to be done in real time. This company in the past, in order to provide those analytic insights, was moving data into SPSS. Remember that? That was slow. But with a translytical approach, they can now slice and dice data in real time in regard to NPS and instantly understand the validity of a data correlation. So every little thing goes into the NPS score, and they need to know how and what things, and take action in real time.

Now, asset management. Asset visibility was necessary here. They needed one place to discover all the assets in the environment, and this was constantly changing. And they needed instant context around risk, vulnerability, threat assessment, and threat detection. They can't wait for the data warehouse to come up with these things. And when the data warehouse comes up with these things, it's all general; it all applies to everybody in every situation that we can think of or put our data arms around. Instead, it should be open-ended: here's all the data in the universe that we have access to; use it and do the right thing. And that's what we're moving to. This company had a hundred billion events per day from devices, firewalls, IoT, etc. They originally were using a PostgreSQL database, and over time the time-based dataset got too large for Postgres to handle. At that point the team migrated this dataset from 400-plus PostgreSQL databases into a huge Elasticsearch cluster moving forward.
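We aren't told that team's actual mappings or index layout, but the usual pattern for a time-based dataset like that is to route events into time-bucketed indices. Here is a minimal sketch, assuming the elasticsearch Python client, a cluster on localhost, and made-up field names:

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Hypothetical security/IoT events; field names are illustrative only.
events = [
    {"device_id": "fw-042", "event_type": "deny", "ts": datetime.now(timezone.utc)},
    {"device_id": "cam-007", "event_type": "motion", "ts": datetime.now(timezone.utc)},
]

es = Elasticsearch("http://localhost:9200")

def actions():
    # Route each event to a daily index so old data can be dropped or tiered cheaply.
    for e in events:
        yield {
            "_index": f"events-{e['ts']:%Y.%m.%d}",
            "_source": {**e, "ts": e["ts"].isoformat()},
        }

ok, errors = bulk(es, actions())
print(f"indexed {ok} events, {len(errors)} errors")
```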
Security surveillance: this is going to be similar to the camera example I had before. This company had a goal to view all sites in a single cloud-based package and offer analytics from video data. Here again we see unstructured data, which makes the process of analytics all the more complicated. You may think of some analytics that you want to have in a situation like this, but do you really want to calculate them in batch for everybody? Or do you want to calculate them on the fly for only those situations that could use the analytics? That being the case, you want to move to more of a real-time analytics approach that works with the data where the data is and is able to bring it all together in one place. The biggest change here was scalability with their OLTP database; trying to scale that for analytics too doesn't usually work.

Finance, or rather embedded finance, which is what this company does: that's when non-financial companies offer their customers access to credit through their technology platform, and this is happening more and more. So they started with something easy to prototype: ingest data, do basic reports. That required replica sets, which had to be done automatically. There were performance constraints on writes to the PostgreSQL database; they had to do a bulk load of the data. The replicated data needed to be re-ingested, and dashboards refreshed only once every 24 hours, leading to a serious and unacceptable lag in data freshness. Now, for those of you out there saying, well, you know, that's good enough, nobody's rattling my cage to do anything more frequent than 24 hours, so I'm going to work in that mode: well, think of the possibilities of what could be done in less than 24 hours. Think of the possibilities of what could be done with up-to-the-minute data. You may think about it and not be able to come up with anything, and that's okay; that's going to apply to some situations. But as time goes on, that 24-hour period is just not going to be acceptable for bringing insights to bear. We're going to have to do it on a more frequent basis.

Esports: this will be my last example, my last use case for you. They needed to offer real-time and historical live-streaming data to analyze trends and performance across all genres, games, events, and channels, kind of like a video game, right? They needed to work with thousands of time-series data points in complex multi-gigabyte aggregated queries; analytic speed is a top priority, and they need to understand spikes in viewership.

Now, you can transpose your company into one of these slides, I am sure, and come up with similar needs. Well, all these needs of all these companies, including the one you're looking at here, come down to a translytical need. So let's address the translytical need. Data architecture needs for translytical workloads: you need fast streaming ingest, millions of events per second. LinkedIn has 80 million events per second. Now, we're not all LinkedIn, I know that. I know that. But we're somewhere in there, and we're all kind of moving in that direction; all mid-sized and up companies are wanting to get a hold of, or at least utilize, all of their events. Low latency is what you need. You need high concurrency, perhaps thousands of concurrent users. You need unlimited storage, pipelines, not ETL. And you need transactional consistency, so ACID, the ACID properties. You need parallel, high-scale streaming data ingest and immediate availability of that data.
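For the streaming-ingest requirement, here is a minimal producer-side sketch, assuming the kafka-python client, a broker on localhost, and a hypothetical "transactions" topic; the point is simply that events are published as they happen rather than collected into a nightly ETL batch.

```python
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    linger_ms=5,          # small batching window trades a little latency for throughput
    acks="all",           # durability: wait for in-sync replicas to acknowledge
)

for i in range(10_000):
    event = {"event_id": i, "source": "pos-17", "amount": 9.99, "ts": time.time()}
    # Keying by source keeps each device's events ordered within a partition.
    producer.send("transactions", key=event["source"].encode(), value=event)

producer.flush()   # downstream consumers or stream processors see the data immediately
```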
Well, there are different needs and different ways to skin this cat, and that's what we'll talk about here. Now, this is a data architecture that is not fit for translytical. You might look at it and go, wow, I'd like to have that architecture; we're a little messier than that. That's okay. I'd like you to have that too, if you're not up to this level, right? You have nice low-latency sources that go through stream processing or Spark processing to a data lake, which is also your staging for your data warehouse. You also have batch input from transactional databases into your data lake, which is feeding your data warehouse. The data warehouse and the data lake work together in a lakehouse environment, which starts its queries at the data warehouse level and reaches through to the data lake; the data warehouse also will do some calculations and analytics, if you will, and those will be passed on to the data lake, where all the history data is kept. Now, in this architecture, people tend to be on one side or the other: either on that pre-analytical, or I should say operational, side of the line, or on the right side, which is more your analytical side of the line. And yeah, you see there's a big old wall, a big old Chinese wall, between these two worlds in this architecture.

Now, one-database solutions: trying to do analytics with operational databases, or trying to put together multiple databases to power up the applications with analytics, is what we've been doing, what we've been trying. So a question to think about here: is it more that analytics need operations, or that operations need analytics? Which way does the data need to flow? It's really analytics that are trying to do operations today and failing, and also MySQL, PostgreSQL, those sorts of things, are trying to do analytics and operations in many environments, and we're failing at that too. Now you might say, well, I'm not failing, you know, I'm succeeding. Well, what is the depth of the analytics that you're really doing with that sort of an approach? I would say it's probably less depth than what it will be, or what it could be even today. So that's something to think about: not just "am I meeting user demands," you know, or "are people pounding on my door for more." That is not a good barometer for whether you're doing enough. Think about what the possibilities are and make sure you're striving toward them. So yeah, the wall between.

Now, some of us have been around long enough to remember when this was in vogue: the operational data store. This is when a source system, usually one, fed a database that's kind of between that source and the data warehouse. It's not just a staging area, but it was a staging area upon which queries were run, because those queries couldn't wait for integration at the data warehouse level, and they really didn't need the integration. They just needed to fix the problem of the transactional database not being able to take on the query. So we created a database in the middle, basically an ODS, right? Usually these were, again, single source. This was our attempt at real-time access in the past.

Okay, what about the data lakehouse? We're hearing a lot about the data lakehouse. A lot of vendors are adopting this terminology, sometimes begrudgingly because only one company came up with it, but nonetheless the concept is applied with many vendor products now.
Data lakehouses are great, and I'm not saying no to a data lakehouse approach, of course, but I am saying that on the face of it, it's not enough to do translytical: they do not support transactions, they do not enforce data quality very well, and their lack of consistency and isolation makes it almost impossible to mix appends and reads, and batch and streaming jobs. Most major data platforms have converged their messaging around this, so you're going to hear a lot about it, and I'm just putting it out there to say, hey, it's great, and this can be a component of your translytical approach, but it doesn't solve the operational problem that you're still going to have.

What about NoSQL? Well, some NoSQL vendors are veering into translytical territory. However, on the face of it, NoSQL for operational big data is a great thing, but it's really for operational big data: operations, not analytics. So what NoSQL gives you is more data model flexibility; you don't have to put a schema on the data first, you can load it right off. Faster time to insight from data acquisition. They relax ACID so you can get that data in. There are low upfront software and development costs, fault-tolerant redundancy, and linear scaling to web scale, so you're not worried about scaling with these NoSQL databases. And indeed, a lot of transactional databases are being replaced by NoSQL databases. That is great, but that is only part of the answer to the translytical problem, because the analytical side of things isn't here so much; these are not set up for great, complex analytical queries as much as they are for getting the data in. So there's a lot of data that's fit for NoSQL, but not completely for translytical.

Event-driven architectures. I mentioned LinkedIn has, what, 80 million events per second. Do they need to store all of that, or do they just need to process it? I think they just need to process it. A lot of my clients have way more events than really need to be stored. Now, they might store some data on a selective basis; there might be a selective feed to a data lake. But this is sort of translytical, because you can perform the analytics that you need in some of these architectures, and you're certainly getting the detailed data flowing through and you're processing on that. The question becomes, what is the depth of processing that you can do on events? The more depth, the more you're into analytics, the more you would be into translytical architecture needs. And I'll show you some examples from Azure and AWS here in a bit, their answers to event-driven architectures, and you might look at that and say, well, that's what we need: a Kafka connector, a real-time pub/sub messaging platform, and edge computing to kind of spread the computing around and also get things closer to real time out on the edge. So I guess the point of this, really, is as you're thinking about this, make sure you're storing only that which you need to be storing. I know that sounds kind of different coming from me, coming from maybe a data-driven person, but I've seen the light with some of these companies and all their events, and done some calculations around the cost of storage, even the ability to store it today. Many things will change over time; all that data will be interesting all the time. We're not there yet; that's probably a decade out. But for now, my clients are storing selective data to a data lake that maybe the data scientists can do some deeper research on: summaries of data, spot levels of data, maybe all the data for all the events for a given store, so that I have a store that I can look at as I develop improvements to my business, to my operations. So that's important.
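Tying the NoSQL point to that selective-storage point, here is a minimal sketch, assuming a local MongoDB instance; the collection, the event shapes, and the rule for what to keep are all illustrative. Differently shaped events land without an upfront schema, and only the slice worth retaining is persisted:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client.ops.events

incoming = [
    {"type": "page_view", "user": "u1", "url": "/pricing"},
    {"type": "purchase", "user": "u2", "store": "s-204", "amount": 49.0, "items": 3},
    {"type": "heartbeat", "device": "sensor-9"},          # high volume, low value
]

# Store selectively: keep purchases tied to the store we are studying,
# let routine heartbeats flow through without being persisted.
keep = [e for e in incoming if e["type"] != "heartbeat"]
if keep:
    events.insert_many(keep)        # no schema declared up front

print(events.count_documents({"store": "s-204"}))
```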
So what about single-product architectures? How about a single table with storage for both transactions and analytics? Well, what does that look like? I thought there was one way to store data. No, there could be more. This is going to give you fast insert/update/delete and query, a simplified data architecture, reduced data movement, and true translytical in one product, because it has a row store and a column store. Now, we're mostly familiar with row stores out there, and if you haven't done anything but CREATE TABLE, it's probably a row store. Now, some of these analytical databases default to column stores, so you may not even know it, but some of them are column stores. So what's the big deal here in terms of how the data is stored? Why does it have to be stored two ways? Well, let's talk about the column store; maybe, if you're not aware, some light bulbs will go off here. SingleStore, and I'm using them as an example here, uses two storage types internally: an in-memory row store and a disk-based column store. Here's a look at the column store, and I show the possibilities here. This is a customer table, and you can group up many columns, or one column, in consecutive storage; that's what a column store basically is: all of a column's values are stored together. First name: all the first names are stored together. Last name: all the last names are stored together. Maybe that's not a good example, because you might want to store both of them together, but however you do it. And by the way, you see all the possibilities here; there are even more than this for this small set of columns, so it's important that you design this well. Possibility one is basically a row store, because all five columns of this table are stored together; that's what a row store does, so no benefit there. But possibility two is a full-on column store, where each column is stored individually and consecutively. So the customer name, customer city, customer state, all of them are stored separately, and they can be glued together, or pieced together, in a query if you need multiple columns, which frequently you do, based upon their position within, in this case we're going to call them, containers. All right. Now, in this architecture, readers don't need to wait on writers. Each version of a row is stored as a fixed-size structure; variable-length fields are stored as pointers. This is all according to the table schema, along with bookkeeping information such as the timestamp and the commit status of the version.
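Here is a toy sketch of those two possibilities in plain Python; it isn't any vendor's actual format, it just shows why the layout matters: the row store keeps whole records together for single-record OLTP work, while the column store keeps each column's values together so an analytical scan touches only the containers it needs and reassembles records by position.

```python
# Same customer data, stored two ways.
rows = [
    (1, "Ada",  "Austin",  "TX", 120.0),
    (2, "Bo",   "Boston",  "MA",  80.0),
    (3, "Cruz", "Chicago", "IL", 245.0),
]

# Possibility 1, the row store: each record's five values sit together.
# Great for "fetch customer 2", i.e. OLTP-style single-record work.
def lookup(customer_id):
    return next(r for r in rows if r[0] == customer_id)

# Possibility 2, the column store: each column's values sit together
# in their own container, glued back together by position when needed.
columns = {
    "id":    [r[0] for r in rows],
    "name":  [r[1] for r in rows],
    "city":  [r[2] for r in rows],
    "state": [r[3] for r in rows],
    "spend": [r[4] for r in rows],
}

# An analytical scan reads only the one container it needs (and like
# values sitting next to each other also compress well).
total_spend = sum(columns["spend"])

# Reassemble a record by position if a query really needs several columns.
def row_at(pos):
    return {col: vals[pos] for col, vals in columns.items()}

print(lookup(2), total_spend, row_at(2))
```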
So who does this? Oracle kind of does this; they have a dual-store approach. Obviously SingleStore, which I'm using as an example here. Now, interestingly, Snowflake has announced combining transactional and analytical with Unistore, and I think it's significant that a company like Snowflake would make that announcement. Why are they doing that? Well, they see where the future is. And so what I encounter a lot is customers trying to do analytics with operational databases, which doesn't work, or trying to tie together multiple databases to power applications with analytics, in kind of a kludgy way. We also replace a lot of the first-generation operational databases such as MySQL, Postgres, and RDS, and also augment data warehouses to power real-time analytics.

Now, I promised Azure and AWS here. Okay, the Azure real-time environment is what you see here. Notice the bi-directional arrows coming from that top box, between the Azure ML managed online endpoint and Azure Machine Learning, the data lake, and into Cosmos DB, which is an operational DB. Now, this is two databases; they are separate, but they're tied together because the machine learning is feeding both appropriately. So what have you got in here? You've got Microsoft Azure and the Microsoft Intelligent Data Platform. You've got Azure Kubernetes Service, which I have recently done a benchmark on, by the way; let me know. Azure Cosmos DB, Synapse Link for Cosmos DB, Synapse Analytics, Synapse Pipelines, ADLS Gen2, Azure ML, Power BI, and Microsoft Purview for data governance. All these pieces are in this architecture, making it all event-driven, making it all real-time, and solving a lot of the problems of translytical. It's a bit to put together, though, I must say. Similar things on AWS: Amazon Elastic Kubernetes Service, called EKS, DynamoDB, Glue, Redshift on S3, SageMaker, and for data governance there's usually a third-party marketplace solution or a partner solution like Alation or Collibra. Notice again the bi-directional arrows from the machine learning box to the data lake and the data warehouse; this is how those two are powered with real-time information, which is necessary to bring to bear on the operations, which are supported by DynamoDB in this architecture. Of course, this is all in AWS; you can replace components ad nauseam, as you will, and most of you do, somewhere or other.

Now, you can have a single product for the solution to translytical. There is a difference between one vendor with one product and one vendor with two products; obviously these are single-vendor solutions. SingleStore is also a single product. Oracle, which I alluded to before. Snowflake, with its Unistore, we have yet to see; I think the ink is not dry on whether that's a single-product or a two-product solution. Sondra. Azure, which I showed you; AWS, which I showed you; Google, which has a similar picture to Azure and AWS, frankly. A little bit more on Oracle: they have that dual-store approach, rather than a single store like SingleStore itself. You need the Oracle Database license, the Diagnostics and Tuning Packs, the Oracle RAC option, Exadata for the columnar compression and performance (we talked about how important columnar was in all of this), and the Partitioning and Active Data Guard options. Snowflake, again, has announced combining transactional and analytical with Unistore, which, again, I find significant.

Now, you can tweak traditional architectures, maybe by putting a cache in place on the data lake, something like what you see here with Redis, and maybe changing the operational databases to NoSQL. You're getting closer. Data lakes, however, if you're counting on them for your real-time analytics, can be difficult to manage and govern due to their size and complexity, and they can be difficult to extract data from regularly due to the variety and volume of data that they contain. So this is an approach that some have done. And when it comes to multi-vendor architectures, the list of possibilities is endless, but these are some that people have tried to deploy, or are deploying, for their real-time needs today: Spark, Cassandra, Elastic, Druid, MongoDB, MySQL, Redis, DynamoDB. These are just combinations that we see out there in the marketplace.
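Since a cache shows up in several of these tweaked architectures, here is a minimal cache-aside sketch, assuming a local Redis and a hypothetical, slow data lake query; the key names and the TTL are illustrative. The operational path reads from the cache and only falls through to the lake on a miss:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def query_data_lake(customer_id: str) -> dict:
    # Stand-in for an expensive scan over the lake.
    return {"customer_id": customer_id, "lifetime_value": 245.0}

def customer_profile(customer_id: str) -> dict:
    key = f"profile:{customer_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)            # fast path, meets the operational SLA
    profile = query_data_lake(customer_id)   # slow path, taken on a cache miss
    r.setex(key, 300, json.dumps(profile))   # keep it hot for five minutes
    return profile

print(customer_profile("c1"))
```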
But still, organizations are sometimes reluctant to attempt to analyze real-time data: the analytical workload will hamper the performance of the operational work, and that has to be the priority. As I mentioned, in a nutshell it is a fear that we have to get over and we have to architect around, and hopefully you've seen some possibilities here today.

We did a benchmark; you know, we do a lot of benchmarks. We did a benchmark of a single database competing in both the analytical and the operational benchmarks: TPC-H, TPC-DS, and TPC-C. And we found that this database, which supposedly does both, was actually better than both of the pure-play data warehouses that we analyzed. So we did three different analyses: two of them were different operational and analytical databases glued together, as in, you know, something we see in the field quite a bit, and one was the one-database solution, one of these SingleStore databases. We also found that in TPC-H it attained a geometric mean better than both of the pure-play data warehouses. In TPC-DS, one of the analytic databases was superior, but was it superior enough, for the difference in performance that you're going to get and the difference in manageability that you're going to get? I don't know; that's what you need to decide. That's why all of these decisions that you have to make, that you're going to be making for the rest of your career and my career, are all going to be sitting right there on the peak of the teepee, where it can go one way or the other, and that's where our governance of these decisions needs to come into play. So we need to keep our game up so that we can make these decisions right. They are long decisions; I mean, there are decisions that are going to greatly limit your possibilities, and we don't want to do that. And we also looked at the costs here, and we found that the single-database costs were less. So given the vast superiority in transactional processing and the high competitiveness in analytical processing, the efficiencies of one database across the spectrum of enterprise needs should be considered.
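Since the TPC-H comparison above is expressed as a geometric mean of query times, here is a small sketch of how that figure is computed and why it's used; the runtimes are made up, not from the benchmark. The geometric mean keeps one pathological query from dominating the score the way an arithmetic average would.

```python
import math

# Hypothetical per-query runtimes in seconds for two systems.
times_a = [2.0, 4.0, 8.0, 1.0, 16.0]
times_b = [3.0, 3.0, 3.0, 3.0, 60.0]    # one pathological query

def geomean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Arithmetic mean lets the one slow query dominate; geometric mean rewards
# being consistently fast across the whole workload.
print(sum(times_a) / len(times_a), geomean(times_a))   # 6.2  vs 4.0
print(sum(times_b) / len(times_b), geomean(times_b))   # 14.4 vs ~5.5
```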
In summary, when it comes to translytical databases, you've got options. First of all, recognize when you're in a situation where it is translytical; applications are moving translytical. The lines blur between operational and analytical; that hard Chinese wall that I showed before may or may not be relevant anymore. Analytics are deeper than simple knowledge; they have depth. The need for real-time analytics drives the need for a translytical architecture. There are examples in every industry; we went through, I think, nine examples here today. Traditional architectures do not meet the requirements. There are multiple-vendor, multiple-product, same-vendor, and single-product options here, and some of where you go with this, appropriately, may come from where you are today. Single-product solutions combine a row store and a column store, and I showed you the importance of the column store to analytical queries, and my comment about not forgetting the one-database solutions out there. And that leads me to the end of the formal part. I welcome your Q&A as I turn it back over to Shannon, while you look at the upcoming topics in Advanced Analytics.

William, thank you so much for another great presentation. Just a reminder to everybody, and to answer the most commonly asked questions: I will be sending a follow-up email by end of day Monday for this webinar, with links to the slides and the recording. So diving in here: William, can you comment on tools like stream analytics at the edge, or real-time time-series engines like Azure Data Explorer? Can you comment on those?

Yeah, I mean, we see a very strong move towards streaming in all forms of, you know, data integration. Streaming is the ultimate in terms of getting the data into the architecture ASAP, and so we're definitely looking at streaming every place that requires data integration today, just to be sure that it meets the needs of today and tomorrow. So these stream solutions that were mentioned are great and valid, you know, given what I just said about where streaming is going. I am floored sometimes at the popularity of streaming and where it's going, so much harder and faster than probably any of us are considering out there. I did go to the Confluent conference a couple of weeks ago and talked to a number of vendors; there are so many new entrants in that space, and it's a great way to get information into the architecture. The question mentioned edge. We are all about edge computing, and there are great debates going on, I'll put it that way, in regard to what can be done at the edge. Of course, what can be done has to consider what data can be kept there and what processing can be done there, and the good news, I think it's good news, is that the possibilities are increasing. I think I talked about this a little bit last time, what we can now do at the edge. We can now consider a lot of analytics at the edge. We may or may not be able to calculate all the analytics and have all the data at our fingertips out at the edge, but that's where you've still got batch analytics: as long as they're getting fed to the edge, you've got analytics going at the edge. So there are a lot of riches that can be had there; there are a lot of possibilities.

Perfect, and that brings us right to the top of the hour. Again, William, thank you so much for another great presentation, and thanks to all of our attendees. And just again, a reminder: I'll send the follow-up email by end of day Monday with links to the slides and links to the recording. Thanks, everybody. I hope you all have a great day. Thanks, William. Thank you. Bye-bye.