Here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, How to Use the Semantic Layer on Big Data to Drive AI and BI Impact, sponsored today by AtScale. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions through the Q&A panel, or, if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag #DATAVERSITY. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just note that Zoom defaults the chat to send to just the panelists, but you may absolutely change that to network with everyone. To find the Q&A or chat panels, click the icons in the bottom middle of your screen. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar.

Now let me introduce our speaker for today, Dave Mariani. Dave is the founder and Chief Strategy Officer of AtScale. Prior to AtScale, he was VP of Engineering at Klout and at Yahoo, where he built the world's largest multi-dimensional cube for BI on Hadoop. Dave is a big data visionary and serial entrepreneur. We love it. Dave, hello and welcome.

Thanks, Shannon, and thanks, everybody, for joining us. I'll take it away. Today we're going to be talking about semantic layers, and stay with it, because I'm going to show you a live demo of what a semantic layer actually looks like and how it works, not just the theory. It won't be just slides today. Please ask any questions you like in the chat, and we'll address them at the end of the presentation.

First, let's talk a little bit about what a semantic layer is. If you google "semantic layer," you're going to come up with this pretty good definition, actually, from Wikipedia. What I like about this definition is a couple of things, and you can see them highlighted in bold; not my highlights, Google's highlights. I like the fact that they call a semantic layer a business representation of corporate data. I also like the fact that it says it helps end users, I like the word "autonomously," and I like "common business terms." So what does a semantic layer do, in short? It's a business representation of your data, using standardized, consistent, business-friendly terms. That's very different from the traditional approach, and from today's cloud-based data platform world, where we really force users to become data engineers to get access to data and make data-driven decisions. We think that's just not right. We think everybody should be able to make data-driven decisions, and in order to do that, we need that single control plane, the semantic layer, that lets you use any tool you want. Whether it's a BI tool, an AI or ML platform, or an API called from an application, you should be able to get the same business-friendly view, regardless of where the data sits.
Whether it's in a cloud data platform like Snowflake, a data lake like S3, or an on-prem data warehouse like Teradata or Oracle, it shouldn't matter. What a semantic layer does is provide a logical layer that sits on top of the data, which means anybody can access it with the tools they already know, like Excel or Tableau or Power BI; you'll see them all today. And it allows the enterprise to control who gets to see what data. So governance is built into the semantic layer, as well as that business-friendly view, which drives consistency: revenue is revenue, and gross margin is gross margin. And because the data is no longer bound by data pipelines, you get the agility and speed to introduce new data platforms without creating expensive pipelines or training users on how to access the data.

So what does a semantic layer look like in real life? You're going to see this in today's demo, but first of all, it looks and feels like the English language, like the business language. You're going to see things like order quantity and sales amount, organized into measures and dimensions and hierarchies. And it's going to be accessible: as a consumer of the semantic layer, you're not going to deal with tables, joins, or data warehouse and database connections. That's left to the semantic model. The semantic model, defined on top of the raw data, gives live access to tools like Tableau (this is how the semantic layer looks there), or Excel, where we're talking about live pivot tables and cell-level functions, or Power BI. All the same common metrics, same definitions, regardless of how you consume them. And it's not just BI tools: you get the same power through Python if you're accessing it from a Jupyter notebook or an AutoML platform, or if you want to program against it using ODBC, JDBC, or XMLA.
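A quick illustration of that programmatic access, since it comes up again later: to a consumer connecting over ODBC, the model exposes business terms rather than tables and joins. This is a minimal sketch, assuming a hypothetical ODBC DSN named semantic_layer and a published model called internet_sales; the names are illustrative, not AtScale's actual interface.

```python
# Minimal sketch: querying a semantic layer over ODBC using business terms.
# The DSN "semantic_layer" and the model/column names are hypothetical.
import pyodbc

conn = pyodbc.connect("DSN=semantic_layer", autocommit=True)
cursor = conn.cursor()

# The query is written against the model's business vocabulary (measures
# and dimensions), not against physical tables or hand-written joins.
cursor.execute("""
    SELECT product_category,
           SUM(order_quantity) AS order_quantity,
           SUM(sales_amount)   AS sales_amount
    FROM internet_sales
    GROUP BY product_category
""")

for category, qty, amount in cursor.fetchall():
    print(f"{category}: {qty} units, {amount:,.2f}")

conn.close()
```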
So what are the components of a semantic layer? What does it do? Well, starting with the top and the bottom: you've got to be able to connect live to it. A key part of a good semantic layer is a live connection to your data. We're not talking about data extracts, and we're not talking about cubes and physical aggregations of that data; doing that means latency, and it means you're looking at a subset of the data. A proper semantic layer connects you live to all the data exposed through a semantic model. The key here is that your subject matter experts, the people who understand the data, create those models. It can work in a hub-and-spoke fashion, where a domain expert is responsible for creating the semantic model and everybody else in the organization consumes it. So it's nicely compatible with this hub-and-spoke analytics style, or with what a lot of people call data mesh, where you have decentralized access to your analytics and data.

It's all powered, and should be powered, by a multidimensional engine, and that multidimensional engine should be virtualized. What do we mean by that? Well, a multidimensional engine is really powerful because it gives you cell-level access for computation. That means you can do the complicated business calculations the business actually requires: not just year to date, that's easy, but moving averages, parent-versus-child comparisons, all the kinds of calculations that are very hard to do in SQL alone.

A good semantic layer also needs to be not just as fast as the underlying data platform it's talking to; it's got to be faster. And faster means speed of thought, meaning queries come back in two seconds or less. Why? Because if it's not that fast, people are going to create data extracts, like Tableau Hyper extracts, or do Power BI Premium imports. That creates another copy of the data and another opportunity for data to drift: drift in time, and drift in semantics. So a good semantic layer needs to deliver instantaneous performance, and I'm going to show you how we do that with the AtScale semantic layer later in this demonstration.

Then we talked about governance and being that control plane. It means we've got to make sure the data is accessible only to the people who need to see it. If I'm in the East, I can't see the West; that's an example of row-level security. If I'm in HR, I get to see the full Social Security number; if I'm in finance, I see it masked. If I'm in finance and I'm an insider, I get to see the revenue fields; if I'm in marketing, I can't. We call that data object security. And it's got to be tied to the user: when somebody logs into Tableau or Excel or that Jupyter notebook as themselves, they need to be the person running the queries on the data platform. Then finally, on connecting to data: you've got to be able to connect to data wherever it lives, and you can't move it. A good semantic layer leaves the data where it landed, whether that's a cloud data warehouse, a lakehouse, a data lake, or a traditional data platform.

So how do people use semantic layers? What are some of the use cases? Well, a lot of people are moving to the cloud, and when they do, they want the instant performance they used to have with, say, a cubing architecture. Or, once data gets to the cloud, it gets big, a lot of users start hitting it, and it's not fast enough for speed-of-thought analytics. So one use case is cloud analytics optimization, from a speed perspective as well as a cost management perspective, because we all know the stories of moving to the cloud, getting that first cloud bill, and the wheels falling off. A semantic layer really helps you save cost as well as deliver speed. It also makes a great enterprise metrics store, that single source of truth, so everybody is speaking the same language and using the same business terms. You can also use the semantic layer to bridge your data science and business teams: your data scientists can generate new predictions and new features and write them back to the semantic layer, while your business and application teams consume them, provide feedback, and compare those predictions against historical results. The semantic layer can be the fabric that joins those two teams together, because those teams are siloed today. Just like the data is siloed, the teams are siloed; people are siloed. So the semantic layer brings together data and technology, as well as people.
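One claim above deserves a concrete picture: calculations like moving averages that are "very hard to do in SQL alone." In raw SQL, every analyst hand-writes the window logic per query; in a semantic layer, the measure is defined once and requested by name. Here is a hedged sketch of the raw-SQL side, with a hypothetical monthly_sales table:

```python
# Sketch: a trailing three-month moving average, hand-written in SQL.
# Table and column names are hypothetical. In a semantic layer, this
# definition would live in the model once, so Tableau, Excel, and a
# notebook would all ask for the measure by name and get the same answer.
moving_avg_sql = """
SELECT order_month,
       SUM(sales_amount) AS sales_amount,
       AVG(SUM(sales_amount)) OVER (
           ORDER BY order_month
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS sales_3mo_moving_avg
FROM monthly_sales
GROUP BY order_month
ORDER BY order_month
"""
print(moving_avg_sql)
```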
And if you're looking to modernize your OLAP, if you like measures and dimensions, speed-of-thought queries, and the power of the semantic layer, and you're using tools like SQL Server Analysis Services or Cognos or Business Objects or even MicroStrategy, you can use a semantic layer to replace those technologies with modern cloud infrastructure.

So this is where a semantic layer fits in the stack; here I'm showing the AtScale semantic layer. It sits between your consumption side and your data storage and data platforms. On the consumption side, you've got business analysts using BI tools, data scientists using Jupyter notebooks and AutoML platforms, and application developers writing code who want to embed analytics in their applications. The semantic layer becomes the logical layer that every query hits, which means we can govern access, provide query acceleration, and provide consistent metrics across the spectrum. And your data could be in a cloud data warehouse like Snowflake or Redshift or Azure Synapse or BigQuery; it could be in a data lake that we access through Spark or Databricks or Presto; or it could be in SaaS applications like Salesforce and ServiceNow. Regardless of where your data is, you can expose it, and your consumers don't need to understand where it's coming from. It's all transparent to them, which means if you want to move data around behind the scenes, you can do that with zero impact on your data consumers. And the data catalog is really key: the semantic layer needs bi-directional communication with the data catalog, so it can share its models and semantics with the catalog, making those semantic models findable and searchable, and so the catalog can define where business terms and naming conventions come from, letting the modeler use those conventions when creating semantic models.

So without further ado, I'm going to switch gears and actually give you a demo of what a semantic layer looks like, in two parts. First, I'm going to show you how we create a semantic model on top of your data; today I'll be using Snowflake data. I'll show you how we create that model and publish it. Then I'm going to show you how we consume it in different platforms: Tableau, Excel, Power BI, and a Jupyter notebook.

Okay, so let's go, starting with the model. What you're seeing here happens to be AtScale's Design Center, and what's in front of you is a data model. The green boxes are dimensions; the fact table is the blue one here. Each of these, the green and the blue, are themselves models, created and joined together using what we call relationships. What you see on the right happens to be an internet sales model, and it contains different metrics, like you saw in my preview, as well as different dimensions.
You can see I have a customer dimension, with different hierarchies for where those customers live, for example. The key to making a semantic layer usable is that you're using these business terms, and you're doing it dimensionally, meaning you have hierarchies and rollups, so you can do drill-downs and quick aggregations in your tools without writing any code.

One other aspect: I mentioned that these are themselves models. Take this customer dimension: if I double-click it, I drill down into the customer dimension model, and what you see is that it's itself made up of two other models. There's a dimension table for customers, with the customer name, some attributes about where they live, and last and first names. And there's a geography dimension, which is in turn made up of different tables and models. So I can easily create some pretty sophisticated models here without reinventing the wheel.

Now I'm going to show you how, on my main canvas, I create a new model, and I'm going to recreate this model in front of you on live Snowflake data. Let's do that now. I'll create a new model and call it Dataversity, and now I have a new model called Dataversity, and I'll enter its canvas. So what we're doing here is creating the semantic model. I'm playing the role of the subject matter expert right now, and as the subject matter expert, I'm going to create a semantic model on my web store data. I'll start with my data sources. I happen to have a couple of data sources connected here, Snowflake as well as BigQuery; of course, I could have a number of different data assets here and blend them together. You can see my different Snowflake databases. I'll go to my sample data database, where I have all the different schemas I can work with, and I'll be working with my Adventure data; that's my web store. You can see a bunch of different tables in my Snowflake warehouse. There's nothing you had to do here: all we did was provide credentials for AtScale Design Center to connect to Snowflake, and all these assets automatically appear. No extra work to be done.

If I like, I can take a quick look at what that sales log table contains, and I can see my data here; I think I'm going to want to use that. So I'll start with my sales log data, which has one row for every transaction in my web store. Let's double-click it and get into what we call our data wrangling view. Here we have the raw data. You can see I have sales reasons, and some nested information: product info stored as key-value pairs, which could just as well be JSON or XML. I have some dates stored as strings and some as numbers. This is how data comes in; it's not all nice and pretty. Typically, it would require a bunch of manual data engineering and ETL code to clean this up, but I can do it all virtually, because I'm creating a virtual semantic layer.
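Before the demo continues, here is what "virtual" means mechanically, using the null cleanup Dave is about to demonstrate. The transformation is stored as an expression and spliced into generated SQL at query time; the underlying table is never rewritten. A toy sketch with hypothetical names:

```python
# Toy sketch of a "virtual column": the cleanup is an expression the
# semantic layer compiles into each query, so no copy of the data is made.
# Column and table names are hypothetical.
virtual_columns = {
    "sales_reason_clean": "COALESCE(sales_reason, 'Unknown')",
    "order_date_clean": "TO_DATE(order_date_str, 'YYYY-MM-DD')",
}

def compile_select(columns: list[str], table: str) -> str:
    """Expand any virtual columns into their defining expressions."""
    exprs = [f"{virtual_columns.get(c, c)} AS {c}" for c in columns]
    return f"SELECT {', '.join(exprs)} FROM {table}"

print(compile_select(["sales_reason_clean", "order_quantity"], "sales_log"))
# SELECT COALESCE(sales_reason, 'Unknown') AS sales_reason_clean,
#   order_quantity AS order_quantity FROM sales_log
```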
So for example, if I want to clean up my sales reasons field and replace all the null values with "unknown," I can create another, clean version of the sales reasons field and use that in my semantic model. I can also create all-new calculations by adding new columns. I'm going to create a new column called sales tax, and I can write its formula in Snowflake SQL, which is passed straight through to Snowflake. That means anything Snowflake can handle, its whole feature set, its user-defined functions, and all the new capabilities arriving with Snowpark, you can take advantage of on your own data platform. And what you see here is that sales tax is now a new column in my data set called sales log; there it is, sales tax. Since it's a virtual column, I can always change the formula to change the definition, and anybody using this model will get the new value of sales tax.

Now I want to make it visible to my users, so I'll drag it into my measures pane, because I want to make it a metric. You can see that AtScale cleans it up: it automatically chooses how to aggregate it (it's going to sum it), and I have the option of putting it in a folder. So what do I see here? My first measure, called sales tax, built on a virtual column. Let's keep going. I know I'm going to want to look at how many things people ordered, so I'll use order quantity. And I'll want to see the total basket, my sales order amount, so I'll add that too. Now I have my three metrics. This is what users are going to see: a Dataversity model with these three metrics.

So how do we really get this going? Obviously, we need some dimensions. I can come back to my library, filter by my dimension models, and pull them onto my canvas to build relationships: the customer dimension you saw before, my product dimension, and of course my dates; everything is about time. In my dates, you can see I have a date-month hierarchy, a date-week hierarchy, and a retail 4-4-5 hierarchy, which is all about reporting quarters; it's what retailers use to normalize their calendars across different holiday periods.

Now, how do these relate to the sales log data I just previewed? All I need to do is wire them up, by which I mean create a relationship from the sales log data to my customer dimension model. I just did that; now watch what happened. Everything came for free. Everything in my dimension model, where I defined first name and last name, where I had the model for gender, the postal code, and the hierarchies you saw in my geography model: it all comes in for free. That means anybody using a customer entity, even the subject matter experts, won't have to reinvent the wheel. They'll all be using the same customer hierarchy, which means the same definitions and the same way of aggregating it.
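Stepping back to the sales tax column for a moment: because the expression is written in Snowflake's own SQL dialect, it ships to Snowflake unmodified. The actual formula isn't shown on screen in the session, so the rate, names, and credentials below are invented placeholders for illustration.

```python
# Hedged sketch of a passthrough calculated column. The tax rate, column
# names, and connection details are hypothetical; only the mechanism
# (plain Snowflake SQL evaluated by Snowflake itself) is the point.
import snowflake.connector  # pip install snowflake-connector-python

SALES_TAX_EXPR = "sales_amount * 0.0725"  # hypothetical flat rate

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",   # placeholders
    warehouse="ANALYTICS_WH", database="ADVENTURE", schema="WEBSTORE",
)
cur = conn.cursor()
# Anything Snowflake supports (UDFs, Snowpark functions) could appear in
# the expression, since it is passed through rather than reinterpreted.
cur.execute(f"SELECT SUM({SALES_TAX_EXPR}) AS sales_tax FROM sales_log")
print(cur.fetchone()[0])
conn.close()
```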
If I want my product hierarchy, I connect it up to my product dimension, and just like that, there's a new folder and I have a product hierarchy. Now, what about time? There are a lot of time fields here. In this particular data set I have an order date and a ship date, and those are of course two different dates, but all I need to do to use them is connect them to my date model. I'll take the order key first. We have what we call role-playing; if any of you are dimensional geeks, you'll know what a role-playing dimension is, but let me just show you what it does. I entered an "order" prefix, and now look at the date hierarchy: where there was year, quarter, month, and day, it now says order year, order quarter, order month, and order day. Very intuitive, with no work to be done, and it includes all my different attributes about time in addition to the hierarchies I may want to use. I can do the same thing with ship date, just using a "ship" prefix instead of "order," and voila, there are my ship dates, automatically. So it's very easy to build a nice model without reinventing the wheel.

If you remember, we also had nested data. In today's world, data is not just rows and columns; it's nested fields, it's JSON, and this data set is no different. Notice I have color, size, weight, and style all stored in a single field. That's nice and compact, and a good way to store it, but for any kind of analytics you're typically going to have to break it out. All I need to do here is take that product info field, which AtScale recognizes as a delimited field, and put in the keys of that map column. I'll use color and style. AtScale adds them to the sales log data set (you can see there are now colors and styles), and to dimensionalize them, all I do is pop them into my dimension panel. Just like that, I have color and style in my preview.

So I have a pretty decent model at this point, and I think I'm ready to make it available for people to start doing analytics on. To do that, all I need to do is publish the model, so let's publish it. And what does publishing mean? Well (I'll close out of my Tableau here), what you saw me doing was creating that model visually, but behind the scenes we were creating a document, just an XML document. I could have written code for this semantic layer instead of using the visual builder; the visual builder was much easier, of course, but behind the scenes all I did was build an XML document. So this semantic layer can be driven by code in addition to our UI.

Now I have my Dataversity model, and I'm going to connect to it, first of all, using Tableau. We're going to connect live from Tableau to that Dataversity model we just created. And remember, this is a brand-new model; it did not exist before this session. What I'm also going to show you is how the semantic layer learns and optimizes queries as we go.
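One quick aside before the Tableau walkthrough: the map-column step from a moment ago, in miniature. Assuming a toy key=value;key=value encoding (the demo doesn't show the actual delimiters, and the real field could just as well be JSON or XML), extracting the requested keys looks like this:

```python
# Toy sketch of extracting keys from a delimited key-value field, like the
# product info column in the demo. The "key=value;" encoding is assumed.
def extract_keys(product_info: str, keys: list[str]) -> dict:
    """Parse 'color=Blue;style=Mens' into just the requested keys."""
    pairs = dict(
        item.split("=", 1) for item in product_info.split(";") if "=" in item
    )
    return {k: pairs.get(k) for k in keys}

row = "color=Blue;size=L;weight=12;style=Mens"
print(extract_keys(row, ["color", "style"]))
# {'color': 'Blue', 'style': 'Mens'}
```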
So first things first: here we go, there's my Dataversity model, and you can see all the folders I just added, with the different attributes you saw me create in the model. What you see is what you get; I'm not dealing with any kind of modeling here. If I go to the data source tab, and you're familiar with Tableau, there is no modeling to be done; it's already done. There are no joins, so there's no way for people to get the wrong answer by creating the wrong joins, and no need to even connect to Snowflake. I'm not connected to Snowflake here; I'm connected to the semantic layer, and I can log in with my Active Directory credentials rather than logging straight into Snowflake.

So let's run some queries. I want to look at my order quantity. And remember our product hierarchy? There it is: product rolls up into name, category, and line. I'll go to the leaf level and look at order quantity by product name. It looks like I'm dealing with a bike store here. I can also look at color; remember, that came from the key-value pairs. There's my distribution of products by color. And remember our date attributes, the order dates and the ship dates? I can start with my orders by year. There we go. I'll zero in on 2008, and look: because I have a hierarchy, I can just drill down from year to quarter, and from quarter to month. It's that easy. It is truly multi-dimensional, and I'm connected live here.

Now let's see what the semantic layer looks like in a Windows environment, and then I'll show you what's happening behind the scenes. Let me copy my connection string, and I'll start with Excel. In Excel, we connect to the AtScale semantic layer using the built-in Analysis Services drivers that come with Excel, so there are no client-side installs. That's really important, because it means anybody with Excel on their desktop needs nothing more than Excel and a URL to get to the AtScale semantic layer. What you see here is running in Docker, and you'll notice I'm logging in using Windows authentication. Here's what it looks like in Excel: there's my Dataversity model, which I'll select to create a pivot report. And there you go, there's my Dataversity semantic layer, with my sales metrics, my order quantity, and my color. I'm not dealing with a data import here; these are live queries running against live data, no pre-aggregation necessary.

So let's see what happened behind the scenes. I'll come back to Design Center, which is also an admin utility, and look at my query log. You can see I just ran a handful of queries against the model named Dataversity. Notice the column called optimization: at first you see NAs, and then all of a sudden you see aggregate hits and cache hits, and our query time is dropping as we go deeper and deeper.
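One aside before unpacking that query log: the year-to-quarter-to-month drill-down from a moment ago is simple on the SQL side, since each level of the hierarchy just adds another grouping column to the generated query. Roughly (illustrative column names, not the literal SQL AtScale emits):

```python
# Rough sketch of what a hierarchy drill-down means to the engine: each
# level contributes one more GROUP BY column. Names are illustrative.
DATE_HIERARCHY = ["order_year", "order_quarter", "order_month", "order_day"]

def drilldown_sql(level: int) -> str:
    cols = ", ".join(DATE_HIERARCHY[: level + 1])
    return (
        f"SELECT {cols}, SUM(order_quantity) AS order_quantity "
        f"FROM sales_log GROUP BY {cols}"
    )

print(drilldown_sql(0))  # grouped by year
print(drilldown_sql(2))  # drilled down to month
```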
So what happened in that query log? Let's look at our first Dataversity query. It was running against order quantity, and you can see it there. What the AtScale platform did is rewrite that query into Snowflake SQL and run it against Snowflake, and in this case, notice that it referenced the sales log table you saw in my model. But watch what happened as it got smarter. That last query we ran was from Excel, and you can see the inbound query was MDX, not SQL: Tableau generated SQL, and Excel generated MDX. What you can see here is that AtScale rewrote the query, but it didn't use sales log; it used an aggregate table. By examining the query patterns while I was clicking around, the semantic layer automatically created an aggregate table during the session and used it instead of my base sales log table. So if my sales log table is 100 billion rows, we're not going to query 100 billion rows; we're going to query eight rows in an aggregate table. That's how we leave the data where it lies, in Snowflake in this example, and still deliver instantaneous queries.

Let's go back to Excel. You can see this is a pivot table. Say I wanted to model some data on top of this, in Excel. I'm going to convert these to formulas; this is just an Excel function. So I've gotten rid of my pivot table, and you can see these are now formulas, and they're actually coordinates. I can take, for example, my blue coordinate and my silver coordinate, my order quantity for blue and for silver, and add them together. I don't know why I'd want to do this, but I am adding them: blue and silver are 11,000 orders. Now watch what happens if I just refresh my data. That issues a query back to the semantic layer, which instantaneously refreshes this whole workbook: every cell that points back to that semantic model is automatically refreshed. And let's go back to our query log to see what just happened behind the scenes: there you can see it, that last query took 164 milliseconds to take that Excel query and turn it into a Snowflake query. 164 milliseconds. That is truly speed of thought.

Now, what does it look like in a tool like Power BI? Power BI works the same way: as I load it up here, I'm going to get a live connection to my data. Again, there's no reason to create a data extract or to use Power BI Premium and import data into Power BI. I'm going to use Power BI with a live connection, straight to my data.

While Power BI loads, let me also show you what it looks like for a data scientist. Here's what a data scientist would see: a Jupyter notebook. They would add the AtScale Python library and connect to the model, just as you see here, with the credentials for the particular model. Once they have a model object, they can get all their categorical features and numerical features, and they can run queries. They can also write back to the semantic model. That means they can generate predictions and write them back to the semantic layer automatically, which makes those new fields available for everybody to consume.
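Here is the shape of that notebook workflow as a hypothetical sketch. The class and method names below are invented for illustration; the real AtScale Python library's API will differ, so treat this as the flow, not as code against the actual product.

```python
# Hypothetical sketch of the data-science flow described above: query the
# model for training data, fit a model, write predictions back. All names
# here are invented; consult the actual AtScale library for its real API.
import pandas as pd

class SemanticModelClient:
    """Stand-in for a semantic-layer client (hypothetical)."""

    def __init__(self, url: str, model: str, user: str):
        self.url, self.model, self.user = url, model, user

    def query(self, measures: list[str], dimensions: list[str]) -> pd.DataFrame:
        # A real client would run a live query; return an empty frame here.
        return pd.DataFrame(columns=dimensions + measures)

    def write_back(self, df: pd.DataFrame, measure_name: str) -> None:
        # A real write-back would (1) land df as a table in the warehouse
        # and (2) register measure_name in the published semantic model.
        print(f"would write {len(df)} rows back as measure '{measure_name}'")

client = SemanticModelClient("https://atscale.example.com", "dataversity", "dsci")
history = client.query(["sales_amount"], ["order_month", "product_category"])
# ...fit a regression on `history` and predict future months into `preds`...
preds = history.copy()
client.write_back(preds, "predicted_sales_amount")
```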
So let's go back to Power BI, now that it's loaded, log in, and see what that experience looks like. Again, I'm going to use the Analysis Services connector, so again, no new drivers to install. Anybody with Power BI, just like anybody with Excel, can use this semantic layer to do their analysis, with no data modeling required. I'll use that semantic layer URL and connect live; again, no need to import. I can see my projects, and there's the Dataversity model you just saw me create. I'll connect live to it, and now anybody can work with this analysis-ready data without any extra work. There you see my Dataversity model, the metrics you saw, and the order quantity you saw. Remember the query by color? If I go back to my Tableau here, I think it was this tab: you can see my order quantity by color. And guess what? You get the exact same view in Power BI that you get in Tableau. Same answer, same metrics, without any modeling or anything for your users to do. And speaking of the model: if I go to the modeling view in Power BI, I can see everything the subject matter expert did in Design Center, because Power BI inherits the AtScale model. There's no extra modeling for your users to do. That means literally anyone, anybody with a spreadsheet on their desktop, can get access to data and be a data-driven decision maker, without having to know where the data is, without having to make it fast, and without worrying about modeling it. That's all done by the subject matter expert.

So that takes us home. Let's come back, summarize, and then open it up for any questions we might have. What does the semantic layer deliver from a value perspective? For the consumers, we've really simplified access to data. They don't need to know where the data lives; they don't need to be data engineers, or write SQL, or, for God's sakes, write MDX or DAX. That was all done by the semantic layer. And they were able to do true self-service, with governance: that same semantic model makes sure people only see the data they're supposed to see, with the controls applied at the semantic layer, at the query level, to give you the most control over who gets access to what.

You could also see that by making queries fast, we actually reduced the amount of work Snowflake had to do in this case. For a lot of our customers, AtScale pays for itself in the first year, because queries are faster, because they access less data, because they're rewritten to hit those aggregates. And those are just aggregate tables, by the way, tables in Snowflake in this example, which means you're going to spend a lot less money on your cloud compute costs. And you saw that we provided access to both data science teams and business teams, so they're all using and speaking the same language, with a consistent interface regardless of the tool you used.

So to bring it home: if you want to see just how a semantic layer with AtScale can improve query performance, we have some benchmark reports. These are benchmarks we ran on TPC-DS at 10 terabytes.
These are standard industry benchmarks with very big data; we're talking about 60 billion rows for the 10-terabyte benchmark. And you can see just how much faster we can deliver queries across these different data platforms, as well as how much money we can save you on data access. So that's it for today's presentation. I want to thank you all for listening to me drone on, and Shannon, I'll hand it back to you for any questions the crew might have.

Dave, thank you so much for this great presentation. Questions were pouring in before you even started, which is fantastic. I love it. And just to answer the most commonly asked question: a reminder that I will send a follow-up email by end of day Thursday with links to the slides, links to the recording, and anything else requested throughout. If you have questions for Dave, feel free to put them in the Q&A portion of your screen. There are a lot of questions here, Dave, about which products, which databases, and which BI tools you connect to. Is there a single resource for AtScale to point people to?

Yeah, you can go to atscale.com, under Resources, and see the different platforms we support. We support all the major cloud data warehouses, and all the data lakes through tools like Spark and Presto. On the inbound side, we support anything that can speak ODBC or JDBC, or, as you saw, XMLA over SOAP; that's where the MDX and DAX came from, from those Microsoft-stack products. We also support REST for you application developers. So really, we can talk to anything with JDBC, and you can talk to us with anything using ODBC, JDBC, XMLA over SOAP, REST, or Python. That gives you a flavor, and you can see the details on our website.

I love it. Great question, great information. So, where does the AI data get written to, physically, when it gets written back to the semantic layer?

That's a great question, and that's a whole other demo for another time; if you'd really like to see how this works live, we'll give you a live demo. But the way it works is that there are two calls in the Jupyter notebook, and the data gets written back to your cloud data warehouse. In my case, say I was predicting future sales: I had sales amount, which was historical sales. If I were predicting future sales, I would use a tool like H2O or DataRobot and predict those sales using a regression model. That predicted sales amount would then get written back to two places. First, the metadata gets written back to the model you saw, automatically, through that AtScale Python API, and the relationship gets wired into the model so that predicted sales shows up right next to historical sales. Then the data itself gets written back to Snowflake as a table. We just knit it all together, so the data scientist doesn't have to worry about any of that: they make the call in their Jupyter notebook, and we handle the write-back to the data warehouse as well as the write-back to the model. And it's instantly available to anybody who has access to that published model.

Awesome. So, what security models are supported, and how does credential pass-through to the semantic layer work to support RBAC and ABAC?

Great question. First of all, we integrate with your directory services: Active Directory, Okta, and all the popular ones.
And the way it works, as you can see on this chart, is that I logged in through Excel with Active Directory, as David, me. So I'm logged into the AtScale server as David. From the AtScale server, you then have the choice of user delegation or a proxy connection to your actual data platform. Most of the time, customers want to run those queries as the users themselves, because they may have additional security applied down at that level; in that case, you run the query as David against the data platform. Otherwise, you can run it through a proxy account. Most customers use what we call true delegation, delegating those queries as the proper user, so you have an audit trail of who asked what. But it's up to you how you configure your AtScale enterprise platform for authentication. We definitely want to authenticate with your directory service, though, so you don't have to give users credentials to your data warehouses: your Snowflakes, your Synapses, your Teradatas, your Oracles, or S3.

So, Dave, does the semantic layer integrate with enterprise data architecture tools like erwin or ER/Studio, where people maintain business metadata and enterprise data models? For example, customer, prospect, marketing lead, et cetera. Would there be a different business definition in the enterprise data model?

That's a great question. Today, we integrate our semantic layer with data catalogs; the popular ones are Collibra and Alation. We have an open API that we use for those integrations, so you can integrate with whatever tool you have; if you have a data catalog from Informatica, for example, you can use that. We also have the ability to import models. Because those models are just XML documents, if I go to my home screen here, you can see the quick starts for creating these models. I created one from scratch; that's what you saw me do today. But I can also import models from an XML definition, or from different tools like Tableau. Because guess what: whenever your users create a workbook in Tableau against a data platform like Snowflake, they're creating a model. So we can inherit and import those models, as well as importing cube models from SQL Server Analysis Services. There are a number of different inputs you can use to create the model we built today, other than doing it in Design Center.

I love it. And by the way, I do love the fact that you created a Dataversity model. I appreciate it. So, are you supporting Immuta and Privacera for integration, for securing metadata in semantic layers, as well as extracting such from AtScale for integration with governance tools?

Yeah, that's a great question. If you look over here, this is our definition of what we think a semantic layer should have. And governance: Immuta and Privacera are separate governance tools. We actually think governance is part of the semantic layer and should be integrated with the semantic layer.
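To make those rules concrete before the details that follow: a toy illustration of row filters and column masks keyed off the logged-in user's group. The policy table and all names here are invented for the sketch.

```python
# Toy sketch of semantic-layer governance: row filters become WHERE
# clauses, masked columns are rewritten, hidden columns are simply not
# exposed. Policies and names are invented for illustration.
POLICIES = {
    "east_sales": {"row_filter": "region = 'East'",
                   "masked": {"ssn"}, "hidden": {"sales_amount"}},
    "finance":    {"row_filter": None, "masked": {"ssn"}, "hidden": set()},
}

def govern_query(group: str, columns: list[str], table: str) -> str:
    p = POLICIES[group]
    select = []
    for c in columns:
        if c in p["hidden"]:
            continue  # object security: the column isn't visible at all
        if c in p["masked"]:
            select.append(f"CONCAT('***-**-', RIGHT({c}, 4)) AS {c}")
        else:
            select.append(c)
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if p["row_filter"]:
        sql += f" WHERE {p['row_filter']}"  # row-level security
    return sql

print(govern_query("east_sales", ["customer", "ssn", "sales_amount"], "sales_log"))
```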
We supply those same kinds of capabilities ourselves: data filtering, data masking, and hiding and showing data, all based on users and groups and RBAC, integrated with your directory service. That's part of the platform you get. So while theoretically you could integrate with a Privacera or an Immuta, most likely you'd want a single semantic layer doing the governance in addition to the semantic modeling.

Awesome. So many integration questions here; I love it. So, what compute engine does AtScale use if pushdown doesn't cover the entire surface area of the use case?

Ooh, good question. In the case of a model with a single data source, in this case Snowflake, it's always pushed down; we never bring any data back to the AtScale servers. Now, in the case of a blended model, say one blending data from Snowflake and from S3, or Oracle, where we have to do a federated query, we use Spark underneath the surface. And that's our Spark, not your Spark; it's part of our platform. We use Spark to run the federated query, creating aggregates in the different remote locations to make it fast, so we're not moving a bunch of data. Eventually, AtScale creates an aggregate in a single location holding the composite data from those sources, so federated queries get fast, then faster, and then basically native to the core platform where you're storing your aggregates.

There was also a question about whether AtScale caches the data from queries. There is a memory cache, and I'll come back to this view here: you see where it says "cache"? Especially with queries like Excel's, some of those aggregates can be cached on the AtScale servers. That cache, of course, is refreshed whenever your data changes, and we know the data has changed when you call an API, or drop a trigger file, or schedule those aggregates to be updated; then the caches are refreshed. So yes, if we can cache an aggregate locally, we'll do that, so you won't have to query the data platform at all. But memory alone is not enough. The way it goes is that we first try to satisfy a query from the in-memory cache; if that doesn't exist, we use an aggregate table and send the query against the data platform; and if that doesn't exist, we go back to the source atomic data and access it that way. That's the hierarchy of how we deal with performance. We're always making it better, improving and learning over time, and you saw it happen right here in our Dataversity model.
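That lookup order, in-memory cache first, then a covering aggregate table, then the atomic data, reads naturally as a sketch. The structures below are toys; the real engine's bookkeeping is far richer.

```python
# Illustrative sketch of the three-step answer path just described.
# Everything here is a toy stand-in for the real engine.
memory_cache: dict = {}
aggregate_tables = {("color",): "agg_qty_by_color"}  # grain -> table name

def answer_query(grouping: tuple, run_sql) -> list:
    if grouping in memory_cache:                  # 1. cache hit: no query
        return memory_cache[grouping]             #    to the platform at all
    agg = aggregate_tables.get(grouping)          # 2. covering aggregate:
    if agg:                                       #    query 8 rows, not 100B
        rows = run_sql(f"SELECT * FROM {agg}")
    else:                                         # 3. fall back to atomic
        rows = run_sql("SELECT ... FROM sales_log")  # data; the engine may
    memory_cache[grouping] = rows                 # build a new aggregate
    return rows                                   # for next time

# Usage: answer_query(("color",), run_sql=my_warehouse_executor)
```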
Awesome. Well, we definitely have time for a couple more questions. It sounds like the things you mentioned can be done in virtualization tools like Denodo, correct? Or what are the differences?

Great question. So yes, part of the calculation engine and part of what we do in the model is virtualization; virtualization is how we're able to not move data. But here are the big differences from standard virtualization tools like Denodo. First, you saw me use no client-side drivers; there are no AtScale drivers. That's really, really important, because if you go to your IT shop and say you want to install drivers or plugins to Excel for every platform, they're probably not going to be very happy with you. So that's one big difference: no drivers and no client-side installs. The other difference, on the back end, is that we are dialect-aware. We're not taking a least-common-denominator approach: if we can take advantage of partitioning, or of optimizations specific to a particular data warehouse or database dialect, we're going to do it. And it's all pushed down; no data comes back to the AtScale servers except in the case of a federated query.

The other part is the model. The model is multi-dimensional; it's not just creating database views, and that's another way we differ from standard data virtualization. You saw how easy it was for me to drill down from year to quarter to month: that's a hierarchy, that's multi-dimensional. You saw the hierarchy for products: that's multi-dimensional. You saw time intelligence: that's multi-dimensional. It's a dimensional model, not a flat, view-based, tabular model. The other major difference is performance optimization. We don't run a separate cluster for managing aggregates; we don't put them in a Spark cluster or an Arrow cluster, so there's no extra infrastructure for you to manage. We create aggregates on the core platform and rewrite our queries to access those aggregates. That is unique, and it's very different from standard data virtualization. So to summarize: no client-side or AtScale-branded drivers; a multi-dimensional model, not a tabular one; performance optimization done in the data platform, not outside it; and dialect-aware pushdown for queries.

Does AtScale run in the Azure cloud, and does it run within the customer's tenant? Yes. You can choose how you want to install AtScale. There's an RPM install, and we'll also be announcing a SaaS offering very shortly. With the RPM install, you install it in your own VPC, wherever you want. If you want to install it on-prem, because you have on-prem tools like Oracle and Teradata and SQL Server, you can do that, or you can install it in your cloud infrastructure; it's all up to you. It runs on all the different clouds and all the different cloud data warehouses: if you have Azure Synapse, we run on Azure Synapse; you've got Databricks, we love that; we love Redshift, we love Google BigQuery, we love Snowflake. We'll run on all those platforms without any modification, without any gotchas.

I love it. Again, that's great. And do you specifically integrate with Azure Purview? Same story as the data catalog work you saw here: we have open APIs, so yes, we could integrate with Purview. It would be up to you, the customer, to do that integration through our APIs; we don't have a packaged offering there yet, though we hope to have one soon. But you could do it today with our APIs.

Dave, I'm going to slip in another question here. Can you import Power BI tabular models? Oh, good question. Not today. But we can work with you: the importer code is something we share with customers, so you can write your own importers if you're coming from platforms we don't yet support.
We don't have an out-of-the-box importer for Power BI tabular, though.

Fantastic. One quick follow-up to a previous question: I gather the shape of a result set does not change based on ABAC restrictions?

The shape does not change. Let me make sure I understand the question. With row-level security, say we have a model that spans multiple tables, and some of those tables have different security access rules, where you can see some columns and not others. We have patented technology that understands those relationships, so if you don't have access to the full set of data, we return an error telling you that you don't have access to that data. It's still all done dimensionally, in the model. Row-level security happens as a WHERE clause; column-level security happens as what we call a view. You decide who gets to see what based on roles. And if you're doing security across multiple tables, with different rules on those tables, we run what we call a query ahead of time to check whether that composite view is accessible; if it's not, we don't allow the user to access that data. We call it a canary query.

And the questioner clarified: "your columns are not, quote-unquote, cut out" is what was meant by ABAC. Yes, the columns would not be cut out, because you would never see them. If you as a user can't see the revenue columns in your semantic layer, then in whatever tool you're using, you won't see those revenue columns; you wouldn't even know to ask for them, so you couldn't ask for them. The semantic layer appears differently to different user groups based on their roles and security access, but it's still the same model; it just appears differently to them. We call that perspectives.

I love it. So many great questions, and this has been such a great presentation, but I'm afraid that's all the time we have for today, Dave. Thank you so much for this, and thanks to AtScale for sponsoring today's webinar. Thanks to our community for being so engaged in everything we do. Again, a reminder: I will send a follow-up email by end of day Thursday with links to the slides and links to the recording, and I'll grab that resource link with all the integrations and get it out to you all, so you can see the full list. Thanks, everybody; hope you all have a great day. Thanks, Dave. Thanks, Shannon. Thanks, everybody.