And here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of DataVersity. We'd like to thank you for joining this DataVersity webinar, How to Use a Semantic Layer to Deliver Actionable Insights at Scale, sponsored today by AtScale. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A, or if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag DataVersity. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just note that Zoom defaults the chat to send only to the panelists, but you may absolutely change that to network with everyone. To find the Q&A or chat panels, you can click the icons found in the bottom middle of your screen. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me introduce our speaker for today, Dave Mariani. Dave is the founder of AtScale and its Chief Strategy Officer. Prior to AtScale, he was VP of Engineering at Klout and at Yahoo, where he built the world's largest multi-dimensional cube for BI on Hadoop. Mariani is a big data visionary and serial entrepreneur. And with that, I'll give the floor to Dave to get today's webinar started. Hello and welcome. Thanks, Shannon. Thanks for having me. And thanks to everybody for attending the session. Today, we're going to have a little bit of fun, everybody. We're not just going to do slides. I'm going to take you through a live demo so you can see what a semantic layer looks like in action. And we'll also see what life would look like without a semantic layer. So it should be fun. But before we get to the demo, I thought we'd just define the terms first.
If you do a Google search for semantic layer, you're going to find this pretty good Wikipedia definition. And I really like it. I think it does as good a job as any of describing what a semantic layer really means. The emphasis, by the way, is Wikipedia's or Google's, not mine, but they do a good job there, too. So what is a semantic layer? Well, it's a business representation of data. That's key: it's a business representation, not a technical representation. It's business friendly. The other thing that's key here is that it allows end users, meaning anyone, not just SQL developers or data engineers, to access data autonomously, and that's another important word. It means they can self-serve. And they can self-serve because they're dealing with that business representation, because they're dealing with common business terms that they understand. Business terms like customer and product and revenue and gross margin, all things people understand. Now, semantic layers have been around forever. I'm not here to tell you that there's any big new invention here. But the really new thing that we're seeing today is that the semantic layer, rather than being tightly coupled to the consumption tool or even the data platform in the form of stored procedures or reporting views, now lives separate and distinct from the consumption layer and from the data platforms. Now, why is that advantageous? Well, because before, you had competing semantic layers: if you had more than one consumption style, which you automatically do if you have users who are using Excel, those tools are going to be speaking different languages out of the box. And it also makes it really tough to incorporate new data platforms.
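To make that "business representation" idea concrete, here is a minimal sketch in code. This is not any vendor's actual implementation, and every table and column name below is hypothetical; the point is simply that business terms map to one canonical physical definition, so every consumer who asks for "Gross Margin" resolves it to the same SQL expression.

```python
# A toy semantic model: business terms -> one canonical physical SQL
# definition (hypothetical star-schema names, for illustration only).
SEMANTIC_MODEL = {
    "Revenue": "SUM(f.sales_amount)",
    "Gross Margin": "SUM(f.sales_amount - f.product_cost)",
    "Customer": "c.customer_name",
}

def to_physical_sql(measure, dimension):
    """Translate a logical (business-term) request into physical SQL."""
    alias = measure.replace(" ", "_").lower()
    return (
        f"SELECT {SEMANTIC_MODEL[dimension]} AS {dimension.lower()}, "
        f"{SEMANTIC_MODEL[measure]} AS {alias} "
        f"FROM fact_sales f JOIN dim_customer c ON f.customer_key = c.customer_key "
        f"GROUP BY {SEMANTIC_MODEL[dimension]}"
    )

# Every tool that asks for "Gross Margin" by "Customer" gets the same query:
print(to_physical_sql("Gross Margin", "Customer"))
```

Because the definition lives in one place, changing it changes it for every consumer at once, which is the consistency argument made above.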
So for those of you who are looking to migrate from on-prem to the cloud, you know how hard that was, or is, because your consumption tools are tightly coupled to your data platforms. So what does a semantic layer do? It creates that layer of abstraction, and by doing so, it gives you consistency. Gross margin is gross margin: whether you're a data scientist building models or a business analyst using Power BI, gross margin is always going to mean the same thing and always have the same definition and the same value. It also gives you a single point of control. Now, there are separate tools out there for doing governance, but my position is that governance is a core feature of the semantic layer. It should all be tied up in there: you should be able to apply rules against that business view, that logical view of your data, in the semantic layer, because every single query is going to cross the threshold of that semantic layer. So it does give you that consistency and control. And the agility comes from the fact that you're now able to create models. A semantic layer is powered by a data model, and that model gives more people the ability to actually go and get access to the data, because they don't need to understand where the data is or how it got there. So this is where a semantic layer fits in the analytic stack, and I like to think of this as sort of a representation of what the new analytic stack looks like. What you've got here at the bottom, of course, is that in the modern analytic stack, data lands in the cloud. It may also be stored away and locked away in SaaS applications, like Salesforce or ServiceNow. In any case, your data is typically in the cloud in a modern data stack.
You might have some data transformation tools, like dbt or Matillion or something like that, that may be used to load data, clean it up, and put it into a cloud data warehouse, like a Snowflake or a BigQuery or a Redshift, for example. You may choose a lakehouse architecture, in which case you're using Spark or a platform like Databricks to access the data lake directly. Or you may, of course, want to access data in those SaaS applications, but you want to mash them up. So you want to combine data from Salesforce with the first-party data you're collecting on your customers, for example. Well, that semantic layer is the place where that integration can happen. Because now, using a logical data model, you can stitch that data together and present a common view of that data to your consumers up on top. So whether they're business analysts using tools like Tableau or Power BI or Looker or even Excel, which is the most popular BI tool on the planet, they have live access to that logical layer, and they don't have to understand where the data is stored or how to write queries against that raw data. That's all taken care of for them in the semantic layer. Your data scientists get to take advantage of the same ease of use as well, so they can spend less time on data wrangling and more time on creating those machine learning models. But they can also use the semantic layer to write back their predictions. So now the semantic layer becomes the unifying thread that joins your data science team with your business analyst team. And of course, you have your application developers, who are building analytics into their applications, who also get to take advantage of that semantic layer. Now, the data catalog you see over here on the right runs along the side, and it's really important that the semantic layer integrate well with that data catalog so that these semantic models are discoverable in the enterprise.
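That stitching of sources can be pictured with a toy example. Assuming, purely hypothetically, a Salesforce accounts extract and a first-party orders table, a logical view plays the role of the semantic model: consumers query the view and never see where each column physically lives. SQLite stands in for the warehouse here, and all names are made up.

```python
import sqlite3

# Two "sources": a Salesforce-style extract and first-party order data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sf_accounts (account_id TEXT, account_name TEXT);
CREATE TABLE orders (account_id TEXT, amount REAL);
INSERT INTO sf_accounts VALUES ('A1', 'Acme'), ('A2', 'Globex');
INSERT INTO orders VALUES ('A1', 100.0), ('A1', 50.0), ('A2', 25.0);
-- The "semantic layer": one logical view stitching both sources together.
CREATE VIEW account_revenue AS
SELECT a.account_name, SUM(o.amount) AS revenue
FROM sf_accounts a JOIN orders o ON a.account_id = o.account_id
GROUP BY a.account_name;
""")
rows = conn.execute(
    "SELECT * FROM account_revenue ORDER BY account_name").fetchall()
print(rows)  # consumers only ever see account_name and revenue
```

The consumer's query mentions only business-level names; the join logic and the physical location of each source are encapsulated in the view.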
So now we have a feel for where the semantic layer fits. Let's see what it looks like in real life. What you're seeing here, and I'm going to give you a demo, so I'm not going to spend too much time on this slide, is that, like I mentioned, the semantic layer is powered by a semantic model. And what you see here is AtScale's visual designer; we call it Design Center. This is what the data steward uses to create the model. So if you think about implementing a semantic layer, there are really two core roles in your organization. There's the data steward, who is authoring these models, and then there's the consumer, who's going to consume the models. And the reason we have those two roles is that the data steward understands how the data is stored and probably understands the schema of that data. They also understand the business logic and the definitions, whereas the consumers don't have to worry about any of that; they can just consume the logical layer that the data steward has prepared for them. The other thing that's important about this model is data stewardship; you're going to hear that a lot, especially when people are talking about data mesh or hub-and-spoke analytics styles. A semantic layer is really ideal for a data mesh architecture, because now your data stewards can create and share these models, and they can plug these models together to create logical representations of a business process. So for example, the finance team can create their finance model and then join it with the marketing team's marketing model. And now you can see what the cost is of that new customer you acquired, because now you have the cost of acquiring those customers and the channels through which those customers were acquired. And you can do that without the finance people having to understand marketing, or the marketing people having to understand finance.
So it really is a great way of creating a data mesh or distributed analytics style of organization for creating data products. Once you have that model, you can see over here, this is AtScale's representation, what the data steward has defined. And you can see a data model is made up of dimensions and measures. There are all kinds of organizational properties, like folders, for example, and then you'll see later on that there are also hierarchies, because you can't have a semantic layer without dimensions and hierarchies: you need to know how to roll up your products, or you need to know how to roll up your time and your calendars. And you can see over here how that semantic layer might look in different tools. These are all BI tools, but you can also see the semantic layer in Python, in your Jupyter notebook, for example, or in your AutoML platform, like DataRobot. So what you see is what you get. Everybody is dealing with the exact same attributes and the same definitions, and they're not having to model that data on their own. And that's where you get that agility, that fast time to insight. So then, how can a semantic layer be used, in terms of use cases? Well, a lot of people, like I mentioned at the top, want to move from on-premise into the cloud. A semantic layer can be really useful here, because as you move to the cloud, you're going to find that you incur a lot of costs when it comes to consumption. The great thing about the cloud data platforms is that now everybody can get access to the data. The downside is that everybody has access to the data. So you're going to drive a lot of consumption and a lot of repetitive queries.
And the semantic layer can really optimize those cloud resources, save you a ton of money, and also make things really fast, without having to build physical cubes or take data out of the data platform in your cloud and put it into some analytic store. You might also hear about metric stores. Well, a semantic layer is the fundamental underpinning for a metric store, and that's where you can use it for governed enterprise metrics for your analysts and your data scientists. You can use it to bridge the gap between your AI teams and your BI teams, because your data scientists can now create new attributes and write them back to the semantic layer, and they can share those with the rest of the analytics community. For example, when you look at the analytics maturity scale, we start with descriptive analytics, looking at historical analysis; revenue last month would be an example. Then we may do some diagnostic analysis: why did revenue change from this month versus last month? But when we get into predictive and prescriptive analysis, that typically is the realm of data scientists. With the semantic layer, the data scientists can now predict future revenue, and so the data scientists and the business analysts can see historical revenue alongside predicted revenue, all within one platform: within Excel, their BI tools of choice, or their Jupyter notebooks. So it really is a great way to stitch together those two teams. And then finally, there was a reason why OLAP was so popular: it's fast and it's very easy to use. A semantic layer can serve the role of an OLAP tool, or OLAP engine, without the baggage of having to physically build cubes. So in the demo I'm going to give you right now, I'm going to show you how to create a new semantic layer from scratch. And I'm going to play two roles in this demo today.
I'm going to play the role of a data steward by creating the model, and then I'm going to play the role of a business analyst and show you how the model looks and behaves in a bunch of different tools. I'm going to show you Power BI and Excel and Tableau; maybe I'll even show you what it looks like to an application developer. I'll also show you what it looks like to not have a semantic layer. What you can see in this picture is that the semantic layer, in this case the AtScale semantic layer, needs to be able to speak all these different protocols, because these tools need to talk live. We're not talking about the semantic layer as an import mechanism; that would defeat the purpose. The only way a semantic layer can work is if it has live access to the data. That means that when the data steward changes a definition, it's instantly reflected in every business tool and every report, regardless of how it was created. You can also see that we're talking directly, live, to the data warehouse platform, and that could also just be a data lakehouse. And so we have our data tables, and then you can see this thing over here called aggregates. A semantic layer is useless if it's not as fast as or faster than the native data platform's queries. Otherwise, you're just going to go ahead and import data into Power BI with import mode, or you're going to create Hyper extracts using a tool like Tableau. And that again defeats the whole purpose of a semantic layer, because now you've created a snapshot of your data, and it's going to diverge and be different across users and across different tools. What the AtScale semantic layer does is automatically create aggregate tables and rewrite queries to use those aggregate tables in place of the raw data.
And that's how the semantic layer can deliver sub-second performance, even against a data warehouse that could potentially be hundreds of billions or even trillions of rows. Okay, you're probably saying, you're crazy, I don't believe you. So let's go and look at it. This is what you saw in that picture before. This is Design Center. And what you see over here on the right is that preview panel I showed you in the slide. You can see that I have my different hierarchies and different attributes about my customers. I have the information about my orders, with order lines rolling up into orders. And I've got my dates; it looks like I have order dates, and I also have shipping dates in here, and lots of different ways of cutting it. And then I have my metrics over here, so I have a bunch of different ways I can cut my data. This happens to be a data model that's based on a web store. And what you're seeing over here are these objects; well, this is the Snowflake data. So I'm doing a demo today on live Snowflake data. And then you can see these orange lines. These are what we call relationships. This is how we connect data together across different models. Remember, I talked about being a data steward and being able to connect models together; I'm going to show you that in the form of these conformed dimensions, which you see here in green. So let's just start from scratch. I'm going to build this model for you live, right now, from scratch, and then I'm going to publish it and make it available. So let's see. I'm going to call it DataVersity Web Store. That's what I'm going to call my model. And I'm going to go ahead and enter my canvas. This is what we call the canvas. Right now you can see I don't have any measures and dimensions just yet, because I still have to map my data into this data model. You can see I have different connections here. I'm dealing with Snowflake today.
So let's see. Notice I opened up the Snowflake connection. You can see I have different data here. I'm going to use my sample data over here, and I'm going to use this AS Adventure schema. You can see I have a bunch of tables here. And I'm going to use my sales log, because that is a table that has one row per item purchased. You can notice that I just dragged that object onto my canvas. That is a Snowflake table I just dragged onto the canvas. I didn't have to describe it; this is all done live through JDBC. So as soon as the table is in Snowflake, it's available for the data steward, me in this case, to actually wrangle. Now, if I double-click on that sales log, you can see I have a preview of my data. You'll also notice there are different ellipses here, and these ellipses are just different functions that allow me to do cleanup, do some data wrangling. So for example, if I want to get rid of my nulls and replace them with unknowns, I can do that. And you can see now I have a new column called Sales Reason. I can call it Sales Reason Clean, for example, and now I have a clean version of that column. This is all done virtually; I'm not physically doing anything here. It's all virtual. I can also create new calculations, like a sales tax calculation. And as I create it, you can see I'm just writing a formula in Snowflake here. This is just a snippet of SQL, in this case Snowflake SQL, and we're just going to pass that straight through to Snowflake. And that's the beauty of it all. So there you go. There's my sales tax. You can see there's the preview and the calculation. And if I come back to my main canvas here and scroll down, there's sales tax as a calculated column. Now this object, which is called a data set, is shareable. So I can share it with other data stewards so they can take advantage of that sales tax definition. In this case, I want to use it in my DataVersity Web Store model.
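A virtual calculated column like that sales tax works the same way in any SQL engine: the expression lives in the model and is pushed down at query time, with nothing materialized. Here is a rough sketch of the idea, using SQLite as a stand-in for Snowflake and an assumed 8% tax rate (the rate, table, and column names are all hypothetical, not from the demo).

```python
import sqlite3

# A small fact table standing in for the sales log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_log (order_id INTEGER, sales_amount REAL)")
conn.executemany("INSERT INTO sales_log VALUES (?, ?)",
                 [(1, 100.0), (2, 200.0)])
# The "calculated column" is just an expression in a view: nothing is
# materialized, and the SQL is evaluated by the engine at query time.
conn.execute("""
CREATE VIEW sales_log_enriched AS
SELECT order_id, sales_amount,
       ROUND(sales_amount * 0.08, 2) AS sales_tax
FROM sales_log
""")
taxes = conn.execute(
    "SELECT sales_tax FROM sales_log_enriched ORDER BY order_id").fetchall()
print(taxes)
```

Because the column is virtual, updating the formula in one place updates every consumer's result on their next query, with no reload step.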
So I'm just going to go ahead and drop it in here. And you can see if I do that, I'm going to create a measure. What's a measure? Well, it's just a way of aggregating that column. And you can see I have a lot of different selections for how I aggregate it. AtScale in this case is pretty smart in terms of how it's aggregating. I could even decide whether it's a semi-additive measure and do special handling for that. Because in AtScale's case, and this is different from other semantic layers, it's fully multidimensional, so you can do all kinds of really rich analytics server-side without having to build a physical OLAP cube. So now there's my first metric. I picked it and I put it into a folder. I'll show you that a little bit slower here. Let's do sales amount; I want to know how much I purchased. You can see it's cleaned up, and you can see I can also put it into a folder. And now I have that one. Let's do order quantity. Let's also put it in a folder. And now I have my first three metrics in my preview pane. The preview pane is what the consumer, the data consumer, is going to see. So as a data steward, I'm creating this model to make it easier for those users to access this data without having to understand all the schemas in Snowflake. Okay, so remember I mentioned that as a data steward, I can share models and connect them up. I have some new data here in sales log, but I know that I want to be able to connect it up to my calendar and my product hierarchy and my customer hierarchy. Well, I can just come here to the library, and what I have shared with me are other models, around customers, for example. So I have a customer conformed dimension; that's why it's green. I also have a product conformed dimension. And I have, of course, time, because time is so important: a date dimension. You see, I have different hierarchies, for month and for week, and I have a 4-4-5 hierarchy, all kinds of great stuff.
I can do a custom PP 4-4-5, whatever that means. So the data steward created this hierarchy. Now, everybody is going to be talking about time the same way, regardless of what data model they're working with: whether I'm in marketing or whether I'm in finance, I'm going to be speaking the same language. So how does that work? Well, we just wire them up. So watch what happens. I've got my customer key here, and I've got it in my raw data. Now watch what happens. You see how it highlights in green? That means this is a compatible relationship. So I'm going to connect my sales log customer key to my customer hierarchy dimension here. And then watch what happened in my preview. I got a bunch of attributes on my customers, including some PII. And you can notice I've got two different ways of rolling up my customers' location: at the zip code level, or at the city, state, and country level. So how did that all happen? How did this one little connection here bring all that metadata with it for me? Well, let's just double-click on this customer dimension and see what it's made up of. You can see now I'm on a new canvas, and this new canvas has another model embedded in it. So there's a dim customer Snowflake table that's also related to two other embedded dimensions, including a geography dimension, which itself is made up of four different tables. And you can see over here there are different hierarchies defined from those tables. That's really encapsulation, right? Because I was able to encapsulate that customer dimension, the data steward doesn't have to worry about reinventing the wheel. All those attributes are already done, and any security that was defined on this customer attributes model is going to be carried through to this model that I'm creating here. That's the beauty of it. Okay, so product works the same way. I'm going to hook it up, and watch what happens.
I now have product, and you can see the product hierarchy. It all got inherited, including that secondary attribute. Now watch what happens with time. Well, I have a couple of time fields here: I have a ship date and I have an order date. So how am I going to get them both? Well, we have what we call role playing. Watch what happens here. I'm going to connect up order date. You can see I have the option of defining a prefix, and if I do that, now I have order dates, and you can see I have order year, quarter, month, and day. You can see it just says year, quarter, month, and day here; it actually inherited all the structure as well as all those different attributes. So if I take ship date, for example, the same thing is going to happen if I put ship in here. And now I have my ship dates. Simple as that. So it's not hard to define these models. It's very quick and easy, because a lot of that work has been done by other data stewards in my ecosystem. You know, sometimes data is not just columns. Sometimes it's nested. It could come in JSON, it could come in XML, it could come as key-value pairs. I happen to know that the product info field is key-value pairs. AtScale recognizes that and allows me to break out those key-value pairs into different virtual columns. So I'm going to pick out the color and style keys, and now I'm going to go ahead and drop those into my dimension panel. And just like that, I now have dimensions in my model for color and for style. I don't have to create a star schema. I don't have to create dimension tables. All that work is done virtually in this data model. That's the beauty. Now, what about that PII? Remember I had some PII in here? See, I have my customer first names and last names. Well, we have the ability to create what we call a perspective. So if I want, I can go ahead and create a new perspective, and I can call it DataVersity Web Store, No PII. So I can come to that customer dimension and I can say, you know what?
I don't want anybody using this perspective to see my customer names. And I don't want them to see the full name either. So what happens now is that becomes a new virtual model, and I can see over here, I now have two. So if I go to the DataVersity No PII model and open that up, you can see there are no longer those customer names. And if I go back to the original, you can see the PII. So now I can assign directory groups and users to those different models, so they can only see what they're supposed to see, given their role. In my case, that's Active Directory, which is what I'm using today. Okay, enough said. I'm done as a data steward. Well, not quite done: I'm going to go and publish this model. Publishing this model means that I'm going to push what is essentially an XML definition. It's just an XML file that we built, which means I could build it programmatically if I liked; I did it visually. And now you can see I have three different models. My original Internet Sales model was already published, and then you can see those two models I just created for you, including that No PII model. So now that is queryable by my consumers. Now I'm going to give up my role as a data steward, open up Tableau, connect straight to the semantic layer I just built, and show you what the experience looks like as a consumer. So I'm going to log in. I'm going to use my Active Directory credentials and log into the semantic layer. This is great, because now users don't have to connect to, in this case, Snowflake directly. They're just connecting, using their Active Directory credentials, straight to the semantic layer. So there you go. There's my DataVersity Web Store, and here are all those attributes and dimensions that you saw me just create. So let's go and do some live queries now. I just double-clicked on order quantity. Remember, that was the first metric that we added.
Well, actually, sales tax was the first one. And then remember we had our product hierarchy: name, category, and line. Let's do it at the name level. You can see now I have my products at the name level. You can see I have order quantity by product, and it looks like I'm dealing with a bike store. Remember I had my time, my date attributes. Remember I had all those different dates, order and ship dates, and I have different hierarchies. Well, now I can go ahead and drill down on these dates. So there's my order year, and notice this is calendar year. I'm using that, and I can just drill down very naturally within Tableau. And I didn't have to model any of this data. It's very user friendly. I understand what it means, and I'm just picking and choosing. Now, what's happening behind the scenes is that AtScale is intercepting these logical queries, rewriting them and translating them into Snowflake queries, and then running those live Snowflake queries before your eyes. Remember, the DataVersity Web Store didn't exist until five minutes ago. Remember we had our color? Color was a nested key-value pair. Here are all my products by color, and you can see a bunch of my products don't have a color, which is why the null value shows up. So this is what it looks like when you're using a semantic layer. Very easy to use. And you can notice, if I click on my data source tab in Tableau, I'm not doing an extract. That's cheating. Once you create an extract, you create a separate copy of your data, and that data is now out of sync with your warehouse. I'm connecting live, which is how I get to see this great data, up to date, in the warehouse. Now let's go and look at a different experience. Let's go to Power BI. In Power BI, I'm going to connect directly to that Snowflake data warehouse you saw. This is what life would look like without a semantic layer. And you can do this with any warehouse.
Again, I'm using Snowflake, but any warehouse really will allow you to connect to it live. Let's just go ahead and do that. I'm going to look for Snowflake, and now I'm connecting straight to Snowflake instead of connecting to a semantic layer. I'm going to go ahead and connect. My first problem is that I need to know the server. I cheated; I happen to know it. I know that's the server. I also have to choose a warehouse. How are you going to know what a warehouse is? In Snowflake, you have to have a warehouse allocated, and you don't want just any user attaching to a warehouse, because it's going to start that warehouse up. I could have a monster warehouse that I'm connecting to right now. I'm connecting to the Mariani warehouse, which is not a monster warehouse, but you can see where things can go off the rails pretty quickly. Now, as I click there, my credentials were actually stored. If they weren't stored, I would have to log in to Snowflake directly. Here you see the raw version of what you saw in Design Center as that data steward; you saw those different hierarchies. Here's that sample data. I'll open that up. There are my schemas. Now, what could I do? There's all that raw data. What if I just wanted to look at my sales by time? That's a tough one here. I think I know I want sales log. Let's assume that's there. At least I can see a preview. Then what do I do for time? I'll choose this date custom table, because maybe that's what I want. Then I'm going to have to click the load button here and wait for Power BI to do its thing. Watch what it does. It wants me to import the data, because DirectQuery, even though it's here and I could choose it, is way too slow for dealing with even medium-sized data sets. In this case, I'm just going to go ahead and try connecting direct. Then you can see it's importing those two tables. Now what I need to do is model those tables, because as you can see here, now I have my data, but then here are my dates.
I don't even know how to deal with this, or how to create a query that's going to join properly, because I'm going to have to actually go and create a relationship somehow between these two different tables to be able to get a true roll-up by date. What I'm doing here is forcing my Power BI users to understand the physical schema of Snowflake. Problem number one: they're going to have to model the data themselves, which is going to end up producing different results than my neighbor next to me. Problem number two: I have to have a live direct connection to the Snowflake data warehouse. Problem number three: I now have to worry about refreshing the data, especially if I'm importing rather than using a live connection. That's not great, is it? So let's go ahead and close out of this, and let me show you a different experience altogether. I'm going to come back here to my Windows environment, and instead I'm going to load up, hold on one sec, I'm getting my Power BI again. Here we go. Let's look at what life would look like with a semantic layer in Power BI. You already saw how nice it was in Tableau, so there's not going to be a surprise. It's the same sort of thing in Power BI. Instead of just doing Get Data like I did before, I'm going to use the built-in Analysis Services connector. There are no client-side connectors required for AtScale, and that's really important, because we emulate SSAS, or Power BI Premium. So we're speaking DAX. You'll also notice we're connecting live; Import is not the default button here. Then you can see now we're connected to our AtScale server, and you can see there's our Sales Insights Snowflake; that was the project. There's that DataVersity Web Store. I'm going to go ahead and just select it. Now here I'm connecting live, and you can see it's a live connection. And here's my model that you just saw in Tableau, and that I just created in AtScale Design Center. Now I'm running live queries.
In this case, these queries are in DAX. So: no modeling of data, no connecting to Snowflake, no worrying about warehouses, no worrying about security. Look, I got the same answer in Power BI that I got in Tableau, because I'm playing by the same rules. And if you look at that data modeling tab, that model has been inherited from AtScale Design Center. That's because AtScale is speaking DAX, which is the native language of Power BI, and that's the beauty of it: we're not forcing SQL down Power BI's throat when Power BI wants to speak DAX, not SQL. Okay, so that was interesting. Let's go look and see what the world looks like to an Excel user. Well, Excel works pretty much the same way. I'm going to connect with the Analysis Services connector; again, we're emulating Analysis Services here. You can see that I'm logging in with Active Directory, so I'm logging in as David, and then you can see here I have my different projects, and they now appear as different cubes within Excel. So I'm not importing data here; I'm actually doing a live pivot table. You can see there's my DataVersity Web Store, and there's my semantic layer, and this is how the semantic layer looks in Excel. And there you go: there's my order quantity by color. Now, this is a pivot table, but it doesn't have to be a pivot table. If I like, I can convert it to formulas, and now in Excel I can make these cells point to a cell in the AtScale model. So you can see this is now a cell value, so I don't have to use pivot tables. I can just use these as cells pointing directly to Snowflake, and build my own financial models in Excel knowing that these cells are all being kept up to date. So if I want to refresh, I can go ahead and do a Refresh All. It's going to fetch that data from Snowflake, and voila, my data is up to date.
So now I have a live spreadsheet, not a static spreadsheet, not a dumb spreadsheet, and it's all being governed the same way as it was governed in Power BI, as well as in Tableau. Okay. So what was happening behind the scenes? I'm going to pull back the curtain so you can see what was going on here, and I'm going to do that by coming back to Design Center and going to my query screen. This is a log of queries. So this is the life of the DataVersity Web Store virtual model; you can see here when its life began. And you can see in this column that each one of these represents a query, and under optimization you can see these tags that say "agg" or "cache". What does that mean? Let's look at the very first query we ran. This was the inbound query; see, it says inbound. The inbound query is the query we intercepted from Tableau. Tableau sent us this query, which is a logical SQL query, and then AtScale turned around and ran this query against Snowflake to fetch the data. Now, the second time we asked a question, you can see that it already says "agg". Why is that? Well, here again is a Tableau query, and this one is by time, and you can see that in this case AtScale substituted this aggregate table for the sales log. Why? AtScale got busy, created this aggregate table automatically, and rewrote queries to use it. And by doing that, you can see up here in our last queries, when we were running Excel queries (that's MDX inbound), AtScale wrote queries against Snowflake using those aggregates. If I look at the query time, that was 133 milliseconds. You can see how fast these queries are; you can't get that from Snowflake alone. That's because the semantic layer is being smart: it's creating aggregate tables in Snowflake (data does not leave Snowflake), rewriting queries to use those aggregate tables, and keeping them up to date with your data.
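The aggregate-substitution behavior just described can be sketched in a few lines. This is a toy illustration, not AtScale's actual rewriter: SQLite stands in for Snowflake, the rewrite is a naive string substitution, and all table names are made up.

```python
import sqlite3

# SQLite stands in for Snowflake here; table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_log (date_key INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales_log VALUES (?, ?)",
                 [(20230101, 100.0), (20230101, 50.0), (20230215, 75.0)])

# Step 1: the engine notices a recurring roll-up by date and materializes
# an aggregate table inside the warehouse itself.
conn.execute("""CREATE TABLE agg_sales_by_date AS
                SELECT date_key, SUM(amount) AS amount
                FROM sales_log GROUP BY date_key""")

# Step 2: subsequent inbound queries against the fact table are rewritten
# to scan the much smaller aggregate instead.
inbound = "SELECT date_key, SUM(amount) FROM sales_log GROUP BY date_key"
rewritten = inbound.replace("sales_log", "agg_sales_by_date")

# Both forms return identical results; the rewritten one reads fewer rows.
assert dict(conn.execute(inbound).fetchall()) == \
       dict(conn.execute(rewritten).fetchall())
```

The point of the sketch is only the shape of the technique: the aggregate lives in the same database as the fact data, so the data never leaves the platform, and the consumer's query is answered from the smaller table transparently.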
You can also see that the user is David, because that is my Active Directory username, so I now know exactly who's doing what and who's running these queries. So that's all great, but what if I want to make a change now? Let's go back to my project screen, back to the project I just had there, and back to my web store. And let's say that, you know what, I don't want just color and style; I know that I also want to do a report on my sizes, to see what sizes I have in inventory. So I'm going to go ahead and do that: I'm going to grab size and drop it in there. Now, to make that new attribute available, I just need to republish that model. By republishing it, that new attribute, size, is now going to be part of the semantic layer. So let me show you what it looks like. Okay, there are my colors. You can see I have my attributes here; I've got color and style, right? Well, let's just force a refresh here on my web store. I'm going to force Tableau to relook at this semantic model, and there you go: there's size. So just like that, I'll get rid of all my nulls, and I now have a visualization that shows me color and size together, all in one visualization, all instantly. Now, back at Yahoo, I built an Analysis Services cube that was 24 terabytes in size. For me to get that new attribute, size, in there, it would take me about three days to modify and test the model, and it would take me seven days to actually build the cube. So this is obviously quicker than going through the process of a traditional OLAP cube. And the semantic layer is freed from the confines of a physical cube and works on modern data platforms like you see here: whether it be Snowflake or Databricks or Google BigQuery or Redshift or Azure Synapse or SQL Server, you name it, we've got you covered. So I hope that gives you a good feel for how this all works.
And just to prove it to you, because you're probably saying, "Oh boy, I'm sure you're working with small data": I encourage you to come to atscale.com, go to the resources, and look for our TPC-DS benchmarks. We run benchmarks with and without AtScale, and we do it on 10 terabytes of data. Now, the 10-terabyte TPC-DS dataset is almost 60 billion rows of fact data, so it's truly big data. And the largest dimension table, customer, is 65 million rows. So we're talking about joining 65 million rows to almost 60 billion rows. These are the different speed-ups and cost savings of running AtScale versus the data platform alone. You can see that we can really improve your cost: Snowflake is up to 16 times cheaper, so you can pay for the semantic layer in under a year in just the savings on your cloud costs. And you can see that under full user concurrency, running one, five, 25, and 50 concurrent users for those TPC-DS queries, AtScale can deliver up to 31 times faster query performance. So that, I hope, gives you a really good flyby of what a semantic layer can do, what it looks like, and how it works in practice. With that, I'm going to hand it back to you, Shannon, to see if the team has any questions for me. Shannon, back to you.

Oh, apologies, my mic was muted. Thank you, Dave, for this great presentation, as always. If you have questions for Dave, feel free to submit them in the Q&A portion of the screen. And to answer the most commonly asked question, just a reminder: I will send a follow-up email to all registrants by end of day Thursday with links to the slides and the recording of this session. So diving in here, Dave: are semantic layers, quote unquote, "above" data mesh, or also a way of connecting to data mesh? Asked differently, how do semantic layers relate to data mesh?
Yeah, you know, if you think about a data mesh, it's not a technology; data mesh is an organizational style. What you're saying is that you want to distribute data product creation down to the business, so finance and marketing, for example, would be creating their own data products. So how are you going to make sure that finance, marketing, shipping, all the different departments, are using the same language? Well, you can do that with a semantic layer and a data model. Where they fit is that I think a semantic layer is critical to making a data mesh architecture work, because if you don't have a common language, you've got a free-for-all. The finance team is going to decide to roll up data on their own calendar, for example, or their own product hierarchy; you could have your merchandising team doing something different; you could have your marketing team measuring revenue in a different way than finance. What a semantic layer does is allow the data stewards in those different data domains to create and author models, so they can take their logic and their knowledge of those business processes, embed it in a data model, and then share it with other data domain teams, so that now you can model a business process that spans more than just one domain. That was the example I gave of combining a finance model and a marketing model to calculate the return on investment for your Google search terms. That's an example of needing data from finance and data from marketing and being able to merge them together. You can use the semantic layer to do that in a data mesh style.

Perfect. So, Dave, in your view, what is the difference between a semantic model and data virtualization? Great question. There are a lot of similarities.
The main similarity is that you can use data virtualization as a semantic layer, but it's a tabular semantic layer, and what I mean by that is that you're dealing with tables and views. The difference, with the AtScale semantic layer at least, is that it's multidimensional. Coming back here to Power BI, what you can see is that I actually have hierarchies: my geography dimension is made up of a roll-up with city, state, and country. You don't get that with a data virtualization tool, because it's not multidimensional in nature. The other thing to be watchful for is how optimization is done in a data virtualization tool versus a semantic layer. You saw that with AtScale, we actually created aggregate tables in the data warehouse, in Snowflake in this case. Other data virtualization tools will typically create caches outside of the data platform, in something like a Spark cluster or an Apache Arrow cluster, and that makes the customer provision separate hardware to manage performance, rather than relying on the data platform for that performance. But they're very similar, is the short answer. I think the AtScale semantic layer goes well beyond data virtualization because it's a multidimensional engine at its core.

Makes sense. This next question, Dave, we could probably do a whole webinar on. If stewards are creating models, what is to prevent the wild west: stewards duplicating models, or creating models that are only slightly different, resulting in an explosion of models that makes it difficult to find anything? Yeah, that's a great question. What I tell customers is that, rather than data mesh in its pure form, I like to think of hub and spoke. I think you do need to have somebody running the ranch.
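The hierarchy idea is easy to see in a small sketch: once the model knows that city rolls up to state and state rolls up to country, "drilling up" is just re-aggregating at a coarser level, with no hand-written joins or views. The data and level names here are illustrative, not AtScale's actual model.

```python
# A tiny geography dimension with city -> state -> country levels and a
# sales measure. All values are made up for the illustration.
rows = [
    {"city": "San Jose", "state": "CA", "country": "USA", "sales": 10},
    {"city": "Oakland",  "state": "CA", "country": "USA", "sales": 5},
    {"city": "Austin",   "state": "TX", "country": "USA", "sales": 7},
]

def roll_up(rows, level):
    """Aggregate the sales measure at the requested hierarchy level."""
    totals = {}
    for r in rows:
        totals[r[level]] = totals.get(r[level], 0) + r["sales"]
    return totals

# Drilling up the hierarchy just means re-aggregating at a coarser level.
assert roll_up(rows, "state") == {"CA": 15, "TX": 7}
assert roll_up(rows, "country") == {"USA": 22}
```

A tables-and-views tool can produce the same numbers, but the roll-up relationship between levels lives in each query's head; in a multidimensional model it is declared once and every consumer inherits it.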
And that could be a domain that's been designated as the chief steward, or it could be a central team that's the chief steward, but I think you do need somebody who owns the library and decides who can read and write the different objects, to make sure that people are speaking the same language in the data models. The good news is that when you have a tool like a semantic layer, you now have one place in which to apply that normalization across different data teams and different data domains. If you don't have a semantic layer, it truly is a free-for-all, because people are going to create tables in their warehouses, create reporting views, and then embed the logic into the actual visualization tools by creating models in those tools. So you truly have no control if you don't have a semantic layer in that data mesh style of architecture.

Perfect. So, how does this work for data scientists, since data science tools often require flat files? Yeah, great question. I should have given you a demo using a Jupyter Notebook, but I'm going to quickly show you how it would look to a data scientist or a developer. So I'm in here in a different tool; this is just an open source tool called DbVisualizer, and I'm connected with JDBC in this case. This is what the AtScale semantic layer would look like to a developer or a data scientist: they just connect with JDBC. We also support a Python interface, so I can support Python as well. You can see there's Sales Insights Snowflake, which was my AtScale project, and there it is: there's the DataVersity Web Store. And you can see here that these just appear as columns in this warehouse, so I can connect straight to it using SQL, and I can run SQL.
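What that SQL experience boils down to for a data scientist can be sketched like this. SQLite stands in here for the JDBC or Python connection to the semantic layer, and the `web_store` table and its columns are invented for the illustration, not the real model.

```python
import sqlite3
import pandas as pd

# SQLite stands in for a JDBC/Python connection to the semantic layer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_store (color TEXT, order_quantity INTEGER)")
conn.executemany("INSERT INTO web_store VALUES (?, ?)",
                 [("Red", 3), ("Blue", 5), ("Red", 2)])

# A data scientist just writes SQL against the published model and pulls
# the result straight into a DataFrame: no cube extracts, no flat files.
df = pd.read_sql_query(
    "SELECT color, SUM(order_quantity) AS qty FROM web_store GROUP BY color",
    conn,
)
print(df)
```

The design point is that the governed model is reached through an interface the tools already speak (SQL via a database driver), so notebooks and applications get the same numbers as the BI tools without exporting files.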
So if I come back here and pull out a query that I had at the very beginning, so I don't have to handwrite it, I'll copy that query, paste it into my JDBC tool, and run it. And there's the same data that you saw; I'm just running it as a query over JDBC. So the data scientist and the application developer get access to that semantic layer without having to reinvent the wheel, and of course it's in a language, SQL or Python, that's familiar to them.

And of course, I'm a big fan of the name of the Web Store there, Dave; very much appreciated. I think we have time for a couple more questions here. So where, if anywhere, does traditional data modeling, like Erwin, fit into a semantic model? Yeah, that's a great question. I can tell you that when it comes to the semantic model, there are a lot of similarities with traditional data modeling like you do in a tool like Erwin. We have the ability to ingest different data models from the different BI tools. So if I come back here: I created this project from scratch, but we have the ability to import XML from different tools into a model. We also have the ability to import Analysis Services models themselves to create an AtScale model, as well as importing from Tableau and the like. We don't support Erwin out of the box, but you could definitely do it through this XML. In the future, I'd really like to see better integration with tools like Erwin, so that we can take a data model that's already been constructed and start from something that's already been modeled for me. I think that would be a really good addition to the AtScale engine and platform, but today you can already import those models using these different importers and transformers.

Perfect. So, is the semantic layer implemented using a graph database?
It is, actually; the query optimizer is graph-based, and that's a great question. If you look at all these relationships, this is a fairly simple model, but it can get very, very complex, especially with embedding, like you saw here when I embedded different models. So we definitely use a graph engine under the surface to find the best paths when we're creating those SQL queries on the fly. We also use that graph engine to create those aggregates. You saw that I created some aggregates here today, and AtScale just did that for me on the fly. If I look in my Snowflake warehouse, here are the aggregates created just today; AtScale did this automatically. It created these aggregates and it used them, and you can see the utilization: I used this one aggregate nine times just since I started this demo. And you can see the amount of time saved by using that aggregate. So there's some really great stuff here.

All right. We have just under three minutes here for one more question. Should there be a business glossary before the semantic layer is implemented? Yeah, you know what, I think that's where the data catalog is really important. It's really great to have that data catalog and to have a place to define that glossary, and a semantic layer really benefits from it, so that you can make sure those data stewards are speaking the same language when they author the models. So I do think that's a good idea. Now, we have integrations with data catalogs, so if you do choose to create your semantics in the semantic layer, those definitions are automatically integrated into data catalogs like Alation and Collibra, for example, so that we're speaking the same language across the enterprise. Perfect.
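The graph-based query planning described a moment ago can be illustrated with a toy join-path search: treat tables as nodes and join relationships as edges, and a breadth-first search finds the chain of joins a query needs. The schema below is invented for the sketch; it is not AtScale's actual planner.

```python
from collections import deque

# Illustrative schema graph: nodes are tables, edges are join relationships.
schema = {
    "sales_log":   ["date_custom", "product"],
    "date_custom": ["sales_log"],
    "product":     ["sales_log", "category"],
    "category":    ["product"],
}

def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins between tables."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in schema[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# To roll sales up by category, the engine must join through product.
assert join_path("sales_log", "category") == ["sales_log", "product", "category"]
```

The same graph walk also tells the engine which intermediate tables an aggregate would need to cover, which is one reason a graph representation is a natural fit for both query planning and aggregate creation.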
And there are so many great questions here; I think I'm just going to sneak one more in, if you don't mind, since we've got just a couple of minutes. So what's the path from the development environment to the production environment? Is there integration with versioning tools? Yeah, that's great. The semantic layer is just code, so you integrate it directly into your CI/CD; you can see it's just code here, and that gets integrated with GitHub and the like for doing your version control. We also have additional capabilities: we have the ability to create snapshots. I created a baseline snapshot, because I do this demo a lot, so I can then restore that snapshot. So there are a lot of different tools to support making sure these semantic models are consistent and controlled. There are also all kinds of security and access rights for who gets access to the different models and what you can do with those models, and those all integrate with your enterprise directory services, so you're not having to deal with a separate set of users in AtScale or in the semantic layer. You can see there's David Mariani; that's me, that's my Active Directory name. So there's no reason to deal with different users in your semantic layer; it needs to be integrated with your enterprise directory, which it is.

Well, that is perfect, Dave. Thank you so much for this great presentation and demonstration, and thanks to our attendees for being so engaged in everything we do. There are so many additional great questions; I'll get those over to AtScale to review. Again, just a reminder: I will send a follow-up email to all registrants by end of day Thursday with links to the slides and links to the recording as well. Dave, thank you so much. Thank you, Shannon. Thank you so much, and thank you, everybody, for listening. Thanks, all. Have a great day.