Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DataVersity. We would like to thank you for joining this DataVersity webinar, The Death of the Star Schema, sponsored today by Incorta. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just note that the Zoom chat defaults to sending only to the panelists, but you may absolutely switch that to network with everyone. If you have any questions, we will be collecting them via the Q&A panel, or if you'd like to tweet, we encourage you to share your questions via Twitter using hashtag DataVersity. To find the chat and Q&A panels, click those icons in the bottom middle of your screen to activate those features. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar.

Let me now introduce our speakers for today: Claudia Imhoff, Pallavi Mishra, and Nick Jewell. Claudia is an internationally recognized expert on analytics, business intelligence, and corresponding architectures. She has co-authored five books and more than 150 articles for technical and business magazines. She is also the founder of the Boulder BI Brain Trust, a consortium of internationally recognized independent analysts and experts. Pallavi is a sales engineer at Incorta. She has three years of professional consulting and project management experience, strategizing and leading enterprise solution implementations, building prototypes, and providing software solution demos to potential clients to drive sales, business process optimization and automation, robotic process automation, and IT risk management. Nick is a senior director of product marketing at Incorta. He previously held roles across product architecture and delivery at a number of startups and leading financial services firms, acting as a technology evangelist with analytics communities, data leaders, and the wider public. And with that, I will give the floor to our speakers to get today's webinar started.

Hello and welcome. Thank you, nice to be a part of this event. Hi, and my thanks to DataVersity and my thanks to Incorta for inviting me to the party. It's an interesting title, and I hope we will explain it in thorough detail so that everybody understands what we're talking about.

So let me start with my first slide. It's the 70s through the 2000s. Yeah, the wayback machine. So what's going on? Relational databases are king, without a doubt. They're good for highly structured data; we know that. It's also the rise of the star schema, as a design function as well as a physical implementation. It took away a lot of problems; we'll talk about that in a moment. Relational databases were simple, they were reliable, and they had decades of history behind them. They were good for small to medium data sets, and of course with later changes they could go much larger than that as well. I love this quote from Michael Lesk: how much information is there? There may be a few thousand petabytes of information in the world. And I love the last part of it: the typical piece of information will never be looked at. Okay, not exactly true today, but let's move on.
Let's go to the next slide: the rise of the Internet, the 1990s and the early 2000s. Wow. You know, I give Doug Laney credit for coming up with the term big data. I'm not sure it's as relevant today; data is data. It just so happens that we have large, large volumes of it, and we're really interested in almost everything we can possibly get our hands on, not just traditional data. But, you know, we'll see. The volume has certainly surged in our databases. The velocity: every click, everything, high-velocity transactions, all of the digitalization and the multiple channels that we have. And of course there is the variety. Structured data is still there, and it's certainly very useful, but now we're starting to get into weird formats. It's variably structured; I don't think it ever gets to truly unstructured, because at that point it becomes chaos, but you get the idea that the format of the data is not the nice, neat, tidy stuff of our operational environment. So we've got an awful lot of data that has come up, and we have to do something with it.

So let me go to the next slide and we'll take a look at what we're trying to do here. We have four things for today: the life of the star schema, then of course the title of the event, the death of the star schema, then the benefits of eliminating star schemas, and how we get started. Again, I hope you will enter your questions. I hope you will question me on things that you either don't like or don't want; this is an open environment and I welcome your comments.

Let's get going with the genesis of the star schema. It's the 80s. We're talking Devo, U2, The Who, and maybe even Genesis in there, and Phil Collins, all kinds of wonderful music. And it's also the genesis of the star schema. It started in the 80s with Ralph Kimball, a brilliant man, who came up with a great design to address the bottlenecks, if you will, of relational databases. The data warehouse era begins in full force: integrated data coming from multiple sources, mostly operational in nature, mostly structured data. Its sole purpose was decision support, to run analyses, to get reports, to understand what's happening in our environment. Traditional DBMSs were used everywhere, with the Codd and Date rules for data design formalization, for example the most efficient way to store the data. But unfortunately that Codd and Date design, that hyper-normalization, was pretty sucky when it came to multidimensional queries. If I wanted to look at sales in a particular store, of a particular product, at a particular date and time, I had a massive set of joins that I had to do. And that's not something the RDBMSs were particularly good at. So the star schema came into being, a brilliant database design that pre-formed, pre-joined all these things together. It mirrors the business's requirements: that query I just described, for example, could be designed into a star schema, and the business community could ask many questions, as long as the domains were correct in the fact table. Then everything was hunky dory, and I could quickly run partial-key selects on anything and get reasonable response times. It was a brilliant design, no doubt about it. On to the next slide: the star schema. We're now talking about the physical instantiation of that multi-join process, not the design but the actual star schema itself, with significant data denormalization. We had our fact and dimension tables.
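To make that concrete, here is a minimal sketch of the kind of query a star schema is built for. The table and column names (fact_sales plus store, product, and date dimensions) are invented for illustration, and DuckDB is used purely as a convenient stand-in query engine; this is the general pattern Claudia describes, not anything shown in the webinar itself.

```python
import duckdb
import pandas as pd

# Tiny illustrative star schema: one fact table, three small dimensions.
dim_store   = pd.DataFrame({"store_key": [1, 2], "store_name": ["Store 42", "Store 7"]})
dim_product = pd.DataFrame({"product_key": [10, 11], "product_name": ["Widget", "Gadget"]})
dim_date    = pd.DataFrame({"date_key": [20240101, 20240102],
                            "calendar_date": ["2024-01-01", "2024-01-02"]})
fact_sales  = pd.DataFrame({"date_key": [20240101, 20240101, 20240102],
                            "store_key": [1, 2, 1],
                            "product_key": [10, 10, 11],
                            "sales_amount": [120.0, 80.0, 45.0]})

# The classic star-schema query: join the central fact to its dimensions,
# filter on a "partial key" (a store and a product), and aggregate.
print(duckdb.sql("""
    SELECT d.calendar_date, s.store_name, p.product_name,
           SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_date    d ON f.date_key    = d.date_key
    JOIN dim_store   s ON f.store_key   = s.store_key
    JOIN dim_product p ON f.product_key = p.product_key
    WHERE s.store_name = 'Store 42' AND p.product_name = 'Widget'
    GROUP BY d.calendar_date, s.store_name, p.product_name
""").df())
```

In a fully normalized Codd-and-Date design, the same question would fan out across many more tables and joins, which is exactly the pain the pre-joined star layout was invented to avoid.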
It didn't always look exactly like a star, in some cases more like a coronavirus, but it was basically a center table, the fact table, with all of the dimensions hanging off of it, and it gave us tremendous performance for multidimensional analysis, as long as the analytical processes or the data didn't change. Now that's the rub, isn't it? Let's go to the next one. The difficulties start to develop when we have change, especially changing dimensions. Like I said, as long as the analytical processes or data never change we're fine; well, that's not going to happen, they do. They're unpredictable, they're always changing, they're fluid. We add new dimensions, we add new facts to the fact table, and so forth. Analytical environments become nightmares of complexity, these star schemas. If we had to add a dimension, instead of tearing down the old star schema, adding the new dimension, and reinstalling it, many times what we did was simply create yet another star schema with the new dimension in it. Changing dimensional data is very difficult to deal with; it always has been. You know that, I know that, we've all had to deal with it. The maintenance tends to skyrocket. The need for new dimensions is constant; we're always thinking of new ways to slice and dice the data. And the need for new, mostly redundant star schemas skyrocketed as well, and believe me, those star schemas are not small amounts of data. They are a lot. And then of course, because it's somewhat slow and somewhat difficult to revamp, you don't want to tear down and rebuild, you just build onto it, and so forth. We lose a lot of flexibility and we lose a lot of agility. Yes, that's me in the middle tearing my hair out. The business community, however, is not amused.

So let's talk about the death of the star schema on the next slide. There must be a better way. Yeah, there's my three-legged stool, so let's talk about what's going on. Hooray for technological advances. We now have a lot of things that we can do. We have cloud storage; that's the first leg of the stool. Cloud storage can expand and contract, right? If we don't need it, then we don't pay for it, and that sort of thing. We have in-memory, which has been a tremendous boon to analytics; in-memory capabilities are the second leg of the stool. And then the third one is the new query engines, and boy oh boy, there are a bunch of those, and they have been a godsend to those of us building these analytical environments.

Okay, so let's start with that first leg: columnar storage in the cloud, that is, data stored in the cloud in formats like Parquet, for example. It does reduce the cost, and the elasticity of the cloud is greatly appreciated. It's a new and different way of thinking about data storage: when I need it, I get it; when I don't need it, I don't want it. We get data storage orchestration over different storage tiers: I can store data in RAM, in SSDs, the solid state drives, or in HDDs, the spinning disks themselves, and depending on how frequently I use the data and how fast I need the performance, I'll pick the appropriate storage mechanism. And there's the optimization that improves performance through better IO usage by query engines and columnar and in-memory storage. So, yay, the first leg of the stool is pretty solid. We like that.
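As a rough illustration of that first leg, here is a tiny Python sketch of columnar, compressed storage. The file name and columns are invented for the example, and it assumes pandas with pyarrow installed; the point is simply that a columnar file lets a query touch only the columns it needs, which is where the IO savings come from.

```python
import pandas as pd

# A small, made-up sales table standing in for data landed in cheap storage.
sales = pd.DataFrame({
    "order_id":    range(1_000),
    "customer_id": [i % 50 for i in range(1_000)],
    "amount":      [round(10 + (i % 97) * 1.5, 2) for i in range(1_000)],
    "region":      ["east" if i % 2 else "west" for i in range(1_000)],
})

# Columnar and compressed on disk (or in cloud object storage).
sales.to_parquet("sales.parquet", compression="snappy")

# Reading back only the two columns a query actually needs: better IO.
subset = pd.read_parquet("sales.parquet", columns=["region", "amount"])
print(subset.groupby("region")["amount"].sum())
```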
All right, next one. The second leg is in-memory. The cost of memory has plunged, and that's a good thing; that's what this slide shows, the historical cost of computer memory and storage. Oh boy, at the beginning it was outrageous; we didn't want any of it, it just cost too darn much. But now data can actually reside in memory rather than on disk. It optimizes performance tremendously for queries by eliminating requests for disk-stored data; memory is much faster. We do like that. It also improves scalability with the decreased cost of memory, so we can scale up, we can scale out, and find new and different ways of doing all kinds of analytics on the data. That's the second leg.

Then the last one, of course, the new query engines. This is a very exciting area right now. These are what make star schemas irrelevant in many ways. Not the design, please don't get me wrong, but the actual physical instantiation of the star schema. These engines provide real-time joins, and they are incredibly fast, even between complex tables. They allow us to have virtual star schemas. They create needed aggregations on the fly, and if we need to keep an aggregation, it can be stored in memory, right? That builds a lot of flexibility in terms of the queries that can be resolved. Keep in mind that if we spool up a virtual star schema, it's much easier to change than a physical instantiation that I had to tear down, rebuild, restore, reload the data, and so forth. We can now do things that were just not possible in the 80s and 90s and even the early 2000s. Star schemas become all about business design, not necessarily database design.

Let's go to the next slide, and that's the death of the star schema. I really wanted to title it the Death Star, you know, the Star Wars one, but that didn't quite work. So with all three legs in place, the star schema is replaced easily. Data can be more quickly ingested and integrated, and the ETL process gets a little bit simpler by removing a lot of the star schema creation and maintenance from it. Data from a lot of complex tables can now be quickly joined and presented; it's not just for simple, standard operational data. For example, a fact joined to a fact is almost impossible; if you've tried to do that in your relational DBMS, good luck with that one. It takes massive performance, massive IO speed, and everything else, so we would just build another star schema rather than try to merge two facts together. So with the improved performance we discussed, we can now get rid of physical star schemas. Fortunately, Nick is going to talk about what Incorta does to help with that situation. I hope this is giving you something to think about, and of course prompting some questions as well.

But let's go on to the benefits. If I don't have to have physical star schemas, what are the benefits? First of all, let's talk about a star-schema-less environment. What does that mean? It means I have a built-for-purpose environment based on users' needs and their queries. I don't have one type of star schema anymore; I can have all kinds of things. It can be very individualized: someone can spool up their own individual star schema for their particular needs. And then there are the industrial-strength ones, the ones used by everybody; those are the very common star schemas.
I can have reusable star schemas: as opposed to creating yet another star schema, maybe I can add on to an existing one, or if I have to build another one, I can easily do that without making it physical. That's so important to our data scientists and advanced business analysts. I can have experimental ones: let me see if this ever happens. Oh, wow, yes, it does happen, and it happens quite often. And then of course the last one, artistic things that I hadn't even thought about; somebody else will think of them, and they can do their own bit of creativity there. I think the star-schema-less environment gives us all so much more flexibility than we've had in the past.

The next slide on the benefits: like I mentioned, the flexibility and agility, the ability to do whatever I want whenever I want, returns to the data warehouse environment. Users can ask impromptu questions; nothing is based off a hard-coded, physical star schema. I can ask questions with unlimited dimensionality. They can use much more complex and detailed data, all while receiving very reasonable response times. So how about the maintenance? Those of you who, like me, have struggled with maintaining physical star schemas know how hard it is to keep them up to date. It's kind of an interesting new world here: design sessions are reduced because we don't have to think of everything all at once, hoping to God this one star schema will solve all problems. ETL gets simplified, and of course maintenance is reduced. All good things, all things we want.

The last slide on the benefits: data storage requirements are reduced. Star schemas can get to be very large; a fact table can have six, seven, eight different dimensions, and if you just do the math on that you can see how big it can become. So if we don't have that, data storage gets reduced. Columnar storage compresses the data; wow, that works. There are no indexes, something that used to be absolutely needed. Isn't that a wonderful change in this environment? Developers are freed up to do much more valuable work than maintaining star schemas. They can focus on increased availability, for example, or on volumes of new and different sources of data. There's always something new and different that they can do, and they're much more excited about this as well. They get to do new and different things, maybe focus on things that are more critically needed: how is the quality of the data? What about the data around the data? Do we have a data catalog that we can now implement, because we finally have time to do that? Re-evaluating the star schemas can also uncover a lot of previously unknown errors, things that we've been relying on that perhaps weren't quite right. The creation of a star schema gets very complicated, and sometimes we get things a little crossed, a little bit wrong, if you will.

All right, let me finish up my piece of this and then I'll turn it over to Nick. I want to focus on some of the getting-started pieces. Maybe you've got a traditional data warehouse, perhaps a legacy warehouse. If so, then here are some steps to use in migrating to a star-schema-less environment. Step one: don't just forklift stuff over into the cloud. Please don't do that. This is a chance to brush the dust off, look at what you've got, figure out what you need, what you can improve, and more importantly, what you can get rid of. So that's a really important first step: evaluate your ETL processes.
Understand what's going on in your ETL processes. A lot of things have been added unnecessarily, as workarounds, or simply because it was expedient and we could do things faster. Where are the star schema bottlenecks? Where do you have problems? Where are the slowly changing dimensions just killing you? Decide which of the star schemas are particularly burdensome. You know which ones they are in terms of their creation and their maintenance, and target those for migration first. Can we get rid of the physical instantiation? Can we lighten up the ETL load? Can we make it much more flexible simply by understanding which ones are troublesome and redoing them?

Let's go to number two: group the selected star schemas by the business problems they solve. We know how to do this. Here are the ones that are all about, oh, I don't know, customers. Here are the ones about our stores. Here are the ones about products, and so forth. If we group them together, then we can prioritize those problem star schemas by their criticality, their maintenance difficulty, and the requests for updates. Don't lose the requests for updates, because they'll become critical too. Each grouping can then become its own project, and we can start to rebuild those star schemas in a virtual fashion. It becomes accessible very quickly and it gives you a very clear path forward. So group them together, make sure you understand what's going on with them, analyze them for things that aren't needed anymore, and then start there. Hopefully that gives you some idea of where to go in terms of the steps forward.

Number three: begin analyzing the detailed data from which the star schema was developed. Star schemas tend to have aggregated data in them, so can we look at the detailed data that went into the aggregate? That data can add even more flexibility and agility. If we go back to that detailed data, a lot of things that get glossed over when the data is aggregated come back into view. So maybe if we can look at the detailed data in a little more, pun intended, detail, then maybe we can understand what's going on a little bit faster. You may discover errors. I have; as soon as you start going through the ETL processes, you will find that there are things that are wrong. It's also a quick win for developers and business users. They get a shiny new environment that works as well as the old one, if not better, and they have more flexibility. So life is good.

Okay, let's go to number four then: choosing your migration path. Move the set of star schema data for each business problem into the new environment according to the priority schedule itself. That's a quick win; that's an easy thing to do. Get the data in, and yes, the detail might be better than the aggregated data. I'm not saying the raw data, because it has to go through the ETL processes themselves, right? We do have to clean it up, fix it up, whatever it is, and then pop it into our database. But in this case we stop there. We may not have to go to the actual physical star schema; we'll make those virtual.

And then number five: expanding the data acquisition horizon. Now that we've got this more structured data under control, maybe there's data that you thought was beyond your capabilities, data that was outside of the operational four walls. But now maybe you can bring that data in.
Data volumes, query performance, and time to deliver are not as big a problem as they were in the past, so now we've got something to look forward to: a shiny new environment. All right, moving on. Number six: life is good. Reducing the burden of star schema design, creation, and maintenance does free up time for developers, and that's a good thing. They can focus on many different things now, maybe new sets of data, maybe reducing the backlog of analytical requests, maybe helping the data scientists find the data that they need, and so forth. It's always a good thing to free up our developers to be more creative and more productive. The seventh one is, you know, not too many of us are in this situation, but if you are in a green field, you're starting over; you're not forklifting a data warehouse, you're simply starting again. Well, maybe you can understand the business users' needs from a fresh start, go beyond those needs and embellish, because you've got flexibility that we didn't have 10 or 20 years ago. You still need to determine how much ETL and data quality processing is needed, what level of detail you need to go to, how it's going to be documented, and so forth. And this is where Nick is going to talk about a new approach to analytics in the next session. So by all means, think about it. If you can start from a green field, lucky you; if you can't, then think about the six steps that I just gave you.

All right, let me go to my last slide and then I'll turn it over to Nick. Advances in analytical technologies: they're terrific. Wow, they're wonderful. It's time to study them, time to rethink how we build things, time to understand what these new technologies can do for us. You still need the star schema design phase; don't get me wrong. Star schemas are still critically important in terms of their design. That is still a mandatory step to understand what the business users need and want. And yes, you do still need a repository of analytical data. It might be at a lower level of detail, it might be in a different format; hopefully it is not in a physical star schema. And yes, you do still need ETL, or some form of data integration, and of course quality processing, but hopefully you'll need less of it if we don't have to maintain the physical star schemas as well. Finally, you do still need to perform maintenance on the stored data, but again, there's less of it. The new technologies don't require indexes, for example, and they use simpler data schemas as well. So keep an eye on all these things, and solve many of the very difficult problems that we've known for decades by bringing better, faster, more flexible decision making into your organization. I hope this has helped a little bit and that you understand where I'm coming from and what I'm talking about. It's certainly a different world out there, and I think it's time for all of us to take a look at what new technologies can do for us. And speaking of new technologies, Incorta is certainly one of them. I'm now going to turn it over to Nick, and he can talk specifically about Incorta.

Thank you very much, Claudia, and thank you to all of you in the chat room today; I'm seeing some passionate conversation. If you have any questions that you'd like to put to Claudia or any of the speakers today, please use the Q&A window and we'll try to answer them at the end of the session.
Let me introduce myself quickly here. My name is Nick Jewell. I'm new to Incorta, but I've been in the analytics space for a long time, and I've definitely got a story or two to tell around star schemas. I remember when I was working in financial services, we built a pretty cutting-edge environment for modeling financial exposures. This was probably just after the financial crisis, so there was lots of interest in trying to use advanced analytics to understand the risk associated with loans, particularly across different sectors and different customer profiles. I was in charge of the analytics delivery, and we used star schema designs to serve up the means to analyze this portfolio, all the way down to individual corporate assets. I think in the end we had around 15 major fact tables covering every aspect of the bank's exposures, from the lowest level all the way to monthly aggregates, and surrounding this, probably between 40 and 60 dimensions that gave you all the context for the actual business. In traditional star schema design, you plot each dimension and how it gets used in matrices like you see here on the right-hand side of the screen. Basically, a lot of work: probably 24 months from start to finish with a team of 10 to 15 data engineers, business analysts, system testers, and project managers to deliver this from the underlying source systems into a data warehouse, and then into star schema data marts. And over time, the bank built extra layers into the process, with data lakes being used to stage the source data, lots of work to get it into the format the warehouse needed, and then onwards into those analytic layers. It's pretty common to find organizations with so-called modern data architectures that follow a similar process: they extract source data into raw landing zones, they refine it in enterprise-level data warehouses, and then they produce subject-matter-aligned marts using the star schema approach that we've been talking about. And it's when you take this thousand-foot view that you start to see the cracks showing up.

So we've reached a state where multiple layers get developed, each with its own cottage industry of supporting tools and technologies, armies of data engineers and BI developers working to move the data into different shapes and different forms, each layer resulting in a significant loss in data integrity, with data engineering decisions stripping away upwards of 90% of the data. And we end up with the creation of multiple copies of data in silos across the organization. So, how about simplifying the whole process? What if you could take the business application data as-is, ingest it, enrich it, and deliver it so that users can work with multiple levels of that data, from fundamental transactional levels upwards, as quickly as possible? This means you get access to operational analytics at that most granular level in the same platform as your typical sales or marketing analytics, where people might be slicing at the level of product category, location, or other features. So again, by taking the data as-is and landing it in the cloud, we're making everything available for analysis, not just a final 10% of the data but the whole data set, and effectively future-proofing the data process for end users, so that whatever questions get thrown at this data, everything from the underlying business applications is present at the most fundamental level, without the need to step back, write new scripts, or load or move data.
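As a rough, purely hypothetical sketch of what "query the data as it landed" can look like mechanically, here are a few lines of Python using DuckDB over Parquet files. The file names, columns, and join are invented, and this is just one generic way to illustrate joining raw landed tables on the fly rather than pre-building a physical star schema; it is not Incorta's implementation. It assumes pandas with pyarrow and DuckDB are installed.

```python
import duckdb
import pandas as pd

# Pretend these are two raw tables landed straight from a source system
# (hypothetical names and columns, written out only so the example runs).
pd.DataFrame({"invoice_id": [1, 2, 3], "customer_id": [10, 10, 20],
              "amount": [250.0, 125.5, 980.0]}).to_parquet("invoices.parquet")
pd.DataFrame({"customer_id": [10, 20], "customer_name": ["Acme", "Globex"],
              "region": ["east", "west"]}).to_parquet("customers.parquet")

# A modern query engine can join the raw files directly: no ETL step,
# no pre-built star schema, just a join resolved at query time.
print(duckdb.sql("""
    SELECT c.region, c.customer_name, SUM(i.amount) AS total_billed
    FROM 'invoices.parquet'  AS i
    JOIN 'customers.parquet' AS c USING (customer_id)
    GROUP BY c.region, c.customer_name
    ORDER BY total_billed DESC
""").df())
```

The point of the sketch is simply that the "shape" an analyst wants is produced at query time from the full-fidelity raw tables, rather than being baked into a physical data mart up front.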
And here is where we get to the real magic. You start with the data in its raw source system format. If you're using data lake terminology, you might call this bronze data, but definitely without requiring all that engineering effort to build the star schema up front. Your analysts start engaging with the data directly, using self-service analytics in a browser or their favorite BI tools, so that they become instantly productive. The platform here becomes a simple facade to those end-user tools: it gives you all the power and all the performance, but doesn't require someone to learn a new interface or a new way of working.

Let's talk about the transformation this can bring, even just in the way that data gets made available to end users. For analysts, the benefits are twofold. Firstly, questions get answered faster, and we'll see a little bit of how this is achieved through the data architecture in a moment. Secondly, new questions can be turned around significantly faster than with legacy approaches. Business queries that once took weeks now get answered in minutes, because the whole data set is ready to be analyzed. So we get flexibility for the business, and we get considerably lower maintenance costs for the data and IT teams who manage and support that back end. And frankly, it's incredible what changes when you get to reconsider those three legs of the stool that Claudia just talked about. From an Incorta perspective, here's how we take each of those three legs and build a modern architecture that underpins this dramatic uptick in performance.

Down at the bottom is a cloud storage layer. This is where we're able to scale both storage and compute very efficiently. Over to the left is our suite of connectors that enable access and ingestion from all your business application sources, so your NetSuites, your SAPs, your Salesforce instances, and we land this into the cloud in that Parquet data format, an open standard that effectively allows us to use that cloud-based file system as a database in terms of storage and access. Now, as Claudia has already mentioned, Parquet, an open data storage format, has really emerged as a critical part of modern data architectures, both in terms of capability (it's a columnar format, very efficient at working with and compressing large data sets) and in terms of flexibility in how the data gets consumed as part of a wider analytics platform. So, for example, the ability to use Spark for large-scale advanced analytics and machine learning, or simply replacing and migrating legacy application functions from languages like PL/SQL over to Python using PySpark. Now, another major benefit of this architecture is that it essentially does away with that entire data pipeline, that ETL industry, that ends up consuming so many traditional data projects and ultimately only really exists so that analysts can run queries on top of summarized tables. So how is this achieved? Well, our architecture includes a metadata layer above the data storage that maps the relationships between all the business-related data tables: basically, all the joins that would usually be needed to connect the little islands of data in the main business source systems.
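To give a feel for what a metadata layer of join relationships might look like, here is a small, purely hypothetical Python sketch in which table relationships are declared once and a "flat" query is assembled from them. This is only an illustration of the general concept described above, not Incorta's direct data map or its actual API; all names are made up, and DuckDB and pandas are again used as stand-ins.

```python
import duckdb
import pandas as pd

# Hypothetical raw tables, as landed from a source system.
orders    = pd.DataFrame({"order_id": [1, 2], "customer_id": [10, 20], "amount": [99.0, 45.0]})
customers = pd.DataFrame({"customer_id": [10, 20], "customer_name": ["Acme", "Globex"]})

# The "metadata layer": relationships declared once, not re-coded per query.
RELATIONSHIPS = [
    # (left table, left key, right table, right key)
    ("orders", "customer_id", "customers", "customer_id"),
]

def flat_view_sql(base_table: str) -> str:
    """Build a query that follows every declared join out from the base table."""
    joins = ""
    for left, left_key, right, right_key in RELATIONSHIPS:
        if left == base_table:
            joins += f" LEFT JOIN {right} ON {left}.{left_key} = {right}.{right_key}"
    return f"SELECT * FROM {base_table}{joins}"

# Analysts query the data "as if it were flat"; the joins come from the metadata.
print(duckdb.sql(flat_view_sql("orders")).df())
```

The design idea being illustrated: because the join knowledge lives in one shared map rather than in every report or ETL script, new questions reuse it automatically instead of triggering another round of data preparation.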
What's great about this direct data map is that the end user can traverse and explore their data as if it were flat, without having to do the complex data prep that's usually needed. You get the same performance as if you'd flattened the data, but without the days and weeks of behind-the-scenes movement of the data. Since we're talking a lot about the evolution of data architecture here, let me put out a definition from the Data Management Body of Knowledge, or DMBOK: data architecture defines the blueprint for managing data assets by aligning with organizational strategy. So when you need to work with business applications that have extremely complex underlying data models, and we're talking here about ERP systems, maybe from SAP, or the structure of systems like Oracle's E-Business Suite, we're talking potentially about over 50,000 tables and all the related joins between them. If you wanted to pull up an invoice from just one of those systems, under the covers there could be 30 to 40 tables that feed that invoice, so elements like the shipping address or bill-to details all come from lots of smaller individual database tables in the back end, and they're all required to produce this clean-looking final invoice. Now, even if that data had been converted into a star schema, you still have multiple joins between an invoice fact and all the surrounding dimensions that feed the structural details, not to mention a healthy industry of ETL developers, database administrators, and BI report developers burning time and burning money to get you those results.

So we've worked really hard to demystify all of that complexity in the form of Incorta blueprints, which basically zero in on the raw tables that are needed from the source system, along with helper tables, associated business logic, and especially the understanding of how all these tables are actually connected together, so that teams can be confident immediately that they're getting the data needed for analysis and operational reporting. Along the way, the data is enriched, converting pretty dry operational status codes in ERP systems into more meaningful attributes, all designed for end-user consumption. Users can start with a predefined data visualization that's also included in the blueprint, but then they get to drill into all the details. The concept of a blueprint in this data architecture really jump-starts the process of moving from raw data to insights in as short a time frame as possible: you install a blueprint, and you're working with that transactional data end to end in the same day. The patterns of data architecture have changed dramatically over the decades, driven by fundamental changes in technology, by the appetite of users to access and consume ever more detailed views of their business, and by an ever-growing demand for analytical data. We've shown how the fundamental components of an architecture like this, the ability to ingest raw data from business systems, the need to enrich and adapt data, the need to curate and present critical elements, and then those twin forces of analysis and collaboration, have all undergone huge change because of these forces.
And here, I think, we've presented a modern data architecture that can be delivered in a single platform, ensuring velocity as well as agility through this end-to-end process. So in these few slides, I've given you just an introduction to the way that Incorta recognizes the challenges with star schemas and, in fact, the challenges of much of the way that traditional analytics moves data through pipelines and actually limits what analysts can work with in their roles. I think we've also seen that by taking each of the three legs from Claudia's stool, these forces of columnar data, in-memory storage, and new query engines, we can actually build platforms that completely flip the table on older designs and older architectures. At this point, I'd love to introduce Pallavi Mishra to take the reins for a short demo that's really going to bring some of these concepts to life using the Incorta cloud platform. Pallavi, over to you.

Thank you, Nick. Hello everyone, I'm really excited to go ahead and show you the power of the Incorta platform. I can see everybody is really excited to see how and what we can do with it. As you can see on my screen right now, Incorta is a modern web-based product that can be used from any web browser. Once I log in, I come to this Content tab, which gives you access to all the different dashboards and folders that you might have built for your use. Now, for the purpose of today's demo, we are going to see how we can ingest large volumes of data and work with it within Incorta. I have moved over to my Schema tab, and you can think of a schema as something similar to what a schema means in databases: a collection of several tables and the relationships between them. I'm going to use my accounts receivable schema for this demo. This accounts receivable schema consists of several tables coming out of our ERP system, and it has all the AR-related data in it. One thing I'd like to highlight over here: if you look at my accounts receivable schema, it consists of 2 billion rows of records, 14 different tables, and 31 joins amongst all these different tables. Now, the thing to highlight is those 2 billion rows of records. That's a pretty massive volume of data that we are referring to, and we will be able to work with it on the fly. To show what I mean, let's have a look at the ER diagram for this particular schema. An ERP system can consist of hundreds of different tables, and those hundreds of tables can have, say, thousands of joins between them. Incorta has this intelligence built in, which brings those joins as-is onto the platform. As you can see on my screen, we have not done any kind of transformation, built a star schema, or made any kind of modification to the data. The data has been brought onto Incorta's platform as-is, as it would exist within the source systems, so we're clearly maintaining full data fidelity on the platform. Not only is this accounts receivable data tied out to the different tables within the ERP system, you can see that we have also linked it to Salesforce data and to some customer churn ML models that have been built using Incorta. This is done in order to give you full flexibility to do cross-system reporting by linking data from various systems and bringing it all onto one platform.
If I go back out from here, you can see this customer transaction lines table, which comes from my ERP system and consists of 471 million rows of records. That's a pretty massive volume of data we're looking at, and it has all been mirrored on Incorta's platform. Having looked at the complexity and the volume of the data, let's now put on our analyst's hat and see how we can use this data to answer some of the business questions that you might have. That can all be done from the Schema tab, just by going to Explore Data. Now I have come from my back-end data ingestion view into the Incorta Analyzer, which gives me the ability to build insights and make sense of my data. Keeping time in mind, let me go ahead and bring this over here. If you look at my data panel, I have brought all the different tables that exist within my accounts receivable schema onto this one Incorta Analyzer screen. To begin with, I just want to build a simple insight that gives me a detailed view of my customer transaction lines table. If you remember, this was the largest table we saw in the accounts receivable schema, and it consists of roughly 470 million rows of records. Let me just get a total over here; this total gives me a nice summary of all the different measures that exist in the table, down at the bottom. I'm going to call this my details table, and let's go ahead and save it. I'm going to save it on my dashboard, and let's name it AR Transactions. Right off the bat, you will see that we have moved from seeing how much data we have in the back end to a nice visualization on the front end that gives me a sneak peek into the large volume of data that we have. Let's save that out over here. And now we have our AR Transactions dashboard ready, which gives me a look at all the records that I have in my customer transaction lines table.

But we understand that on the front end, when you are dealing with the business, you might have to answer certain business-specific questions. To do so, let me go ahead and add another insight to this dashboard, and this time I want to build a pivot table. The reason we chose a pivot table is that pivot tables happen to be computationally intensive, and most reporting tools struggle when they're dealing with a large volume of data and have to do aggregations on top of it. For this particular insight, I want to analyze my revenue as well as my quantity ordered, and let's slice it by sales channel and party type. The concept of party type is similar to the concept of customer type in ERP systems. And let me add a dimension of month name to it. So now you can see how, in a matter of a few seconds, I have gone from that huge volume of data in the back end to a really computationally heavy pivot table on the front end. When you are dealing with such huge volumes of data, you might have some dirty data or garbage data that doesn't make sense for the business, so we can go ahead and filter it out to only show the data where my month is not null. And just to make this a little more pleasing to the eye, I'm going to sort it by month number. Now this pivot table is ready, giving me a summary of my revenue amount and quantity ordered by sales channel and party type. And just to talk about the complexity we are dealing with over here.
If I check my query plan, you can see that to render this one particular insight on the front end, I'm pulling data from five to six different tables, across multiple joins, and from more than half a billion rows of records. But all of this is done in real time. All this calculation is done in real time, because we have not stored the summary data or this aggregated data in some kind of table in the back end; we are working with the raw data coming out of the source system. So you really get that sub-second query performance, even while working with computationally heavy things like a pivot table on the front end. I'm going to go ahead and save it to my dashboard. You also have the ability to interact with the dashboard on the front end, so let me arrange my dashboard the way I like, with a nice summary at the top and a details table at the bottom.

And before we close out, let me add one more insight, in which I'd like to build an advanced map where I visualize my revenue by different countries. You can see that within a few seconds I was able to see how my business is earning revenue in different countries. If I want to add another layer to it, I can also add my revenue amount and visualize it by the different states. So I move my state field over here, and to mix things up a little bit, I'm going to change this visualization type to a heat map. We can see that the United States is earning the most revenue, and if I go further, you can see how the different states are contributing revenue within the United States, telling me that I am earning more on the east coast as compared to the west coast, and so on. Let me name this Maps, and we'll go ahead and save it out to my dashboard.

Now we have a high-level representation of my revenue by different countries and states, and also by sales channel. One of the powerful things that Incorta offers you is the ability to interact with your dashboard on the fly. Suppose you're interested in knowing how your business is doing in the month of February for your office stores. You have the ability to drill down from that high-level summary to a very specific view and filter all your insights with just one click on your dashboard. Now I'm getting a view showing that I'm earning revenue of 45 million for my office stores in the United States, and if I go further, you can see there are three states in particular that are driving this revenue. The really cool thing here is that if I scroll down to my transaction details table, which, remember, consisted of half a billion rows of records, I have filtered down from half a billion rows to only 2,000 rows, and these 2,000 raw transaction lines are the transactions that are actually building up this revenue of 45 million for the sales channel. If you look a little to the left, you can see that my revenue column ties out to the single penny that has been brought into my business to give me this revenue of 45 million for my office stores in my party type of organization.
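For readers who think in code rather than dashboards, here is a rough, purely illustrative Python equivalent of that aggregate-then-drill-down pattern. The column names and toy numbers are invented; this only mimics the logic Pallavi demonstrates (summarize on the fly, then filter back to the raw rows so the detail ties out to the aggregate), not how Incorta executes it.

```python
import pandas as pd

# Toy stand-in for a transaction-lines table (column names are hypothetical).
lines = pd.DataFrame({
    "sales_channel": ["Office Stores", "Office Stores", "Online", "Online"],
    "state":         ["NY", "MA", "CA", "WA"],
    "month":         ["February", "February", "February", "March"],
    "revenue":       [25_000_000.0, 20_000_000.0, 7_500_000.0, 3_200_000.0],
})

# The "pivot table": aggregate revenue by channel and month on the fly.
summary = lines.pivot_table(index="sales_channel", columns="month",
                            values="revenue", aggfunc="sum")
print(summary)

# The "drill-down": one click on a summary cell is conceptually just a filter
# back to the raw rows, so the detail always ties out to the aggregate.
drill = lines[(lines["sales_channel"] == "Office Stores") & (lines["month"] == "February")]
print(drill)
print("Ties out:", drill["revenue"].sum() == summary.loc["Office Stores", "February"])
```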
Just to conclude, what we saw through this particular demo is not only the efficiency and performance of Incorta while dealing with massive volumes of data coming out of complex source systems with huge numbers of joins, but also the simplicity and ease with which an end user or a data analyst can build dashboards and do reporting on top of billions of rows of records and cross-system data. Another cool thing about Incorta that I'd like to highlight is the ability to drill down from a very high-level summary, say a map visualization or a pivot table, to that single-penny-level transaction, which helps you tie out and perform whatever reconciliation or data validation you need. All of this was made possible within one single platform by Incorta, because we are not building any kind of star schema or modifying the data from the source; we are bringing the data in as-is to maintain the full fidelity of the source system transactional data. With that, I'd like to close out this demo. Shannon, over to you.

Thank you all so much for this fantastic presentation. I love it, and I love all the chat going on, lots of passionate, as Nick mentioned, passionate comments. To answer the most commonly asked question, a reminder that I will send a follow-up email by end of day Thursday with links to the slides and links to the recording of the session, along with anything else requested. So let me dive into the questions here. Eliminating the star schema: how do you handle a consistent, quality reporting environment with just raw data from different sources?

I'll happily take that one first, Shannon. When it comes to actually making sure you've got quality data from all of these different places, in the demo we just saw it coming from an EBS environment, but obviously you can have many more environments joining together. I think, first of all, it's the visibility over that data: when you have null values, you can see them immediately presented on the screen. So for an analyst, that's the best approach; you get to see, and get to be transparent with, your data sets. Behind the scenes, as Pallavi worked through in her demo, it's entirely possible to enrich and augment that data, so we can track null values, we can capture null values, we can map against master data to actually fix the values at that stage. But the key here, I think, is transparency across all of that data. Anything else you think we should add on that one? No, I think you were right on point, Nick. I love it. Claudia, anything you want to add? No, I think you guys have covered it. Awesome.

So, where did the 100% to 25% to 10% figures come from? I don't see, in particular, the 75% loss from source to integrated data. That was the headline catcher for you there, Monty, if I'm reading your question right. But I want to say it's based on the reality that you start with 100% of your data in all these source systems, and the BI teams, the ETL teams, the project management teams make decisions along the way that affect the fidelity of the data. You do make assumptions; going back to my time building star schemas in financial services.
We had to aggregate the data to a monthly snapshot, because with the technology of the time, bringing through granular data just wasn't possible while meeting the business needs. It's very common to have to make those filtering decisions, selection decisions, aggregation decisions, and you quickly end up providing the business a solution that's a fraction of the potential data that's available. Fantastic.

So, how do you handle access to specific areas based on data governance needs? Can you speak more about that and provide examples? If it's okay, I think Pallavi, as somebody who's a practitioner within Incorta, could probably best answer that one. What do you think, Pallavi?

So, when you talk about data governance, it can be put into a lot of different buckets. Because you are bringing your data from the source system, Incorta gives you the ability to enforce data security, where you can decide which users you want to share which kind of data with. You have the ability to lock things down or have security implemented at the record level, so that your data is completely secured and you are giving the right kind of access only to the people who need it, not exposing everything to everyone in your entire organization. If you're talking about data quality governance, you certainly have, as Nick mentioned, the ability to filter your data down to the right data, change data types, and check whether the data has null values or not. So by building a lot of checkpoints at every step from ingestion to analysis, you can put data governance guardrails around your data while you're using it and working with it. We do not rely on source system security; within the Incorta platform itself we have built enterprise-level security, which means you can secure from a high-level access point down to the very lowest level of record-level security. When you bring data over and build a schema, you have that functionality available to you, where you can decide who should have access to your data and who should not.

All right, so how are we performing data quality validation on the ingested data from the source? I think, Shannon, that's probably similar to the very first answer that we gave. The data comes in with full transparency, obviously accommodating any security or visibility rules that you might have. We import those from source systems if needed, and we can build them inside the platform as well, but it comes down to transparency. As Pallavi showed in her demo, you can start off with high-level aggregates and start to ask questions, drilling down all the way to those individual transactions, which would be incredibly complex to unwind if you were using a legacy or traditional BI product, because, as you saw, that was 2 billion rows across certain tables, joined across 30 or 40 tables for some of the more complex queries. So again, it's transparency.

I love it. Lots of great questions coming in, but I'm afraid I'm only going to try to slip in one more here. Is the concept similar to data virtualization, or, excuse me, is there a proposed model like data vault for the underlying raw data on top of which a virtual star schema can be built? Oh, that's a brilliant last question, thank you. I was hoping Claudia would have gotten some questions, but don't worry, next time, Claudia.
The idea of data vault as a data modeling structure is absolutely compatible with Incorta at the end of the day. You saw the architecture: we bring the data in, we host it in the cloud in Parquet format, we do direct data mapping to ensure that analysts get a flat, very high-performing view of that data, and everyone gets to ask questions faster. So data vault as a data modeling structure is perfectly compatible. I think the absolute sweet spot is really unwinding the complexity of business systems like Oracle ERP, SAP, NetSuite, and so forth.

I love it. Well, Claudia, I think you saw in the chat too that you have a lot of fans out there who are here for you. This is great; thank you so much, everyone. It's such a great topic, and it's really sparked a lot of discussion, a lot of thought. And again, just a reminder to everybody: I will send a follow-up email by end of day Thursday with links to the slides and links to the recording. Thanks to all of our speakers, thanks to Incorta for sponsoring today, and thanks to all of our attendees for being so engaged in everything we do; we always appreciate it. I hope you all have a fantastic day. Thanks, everyone.