Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager of DataVersity. We'd like to thank you for joining today's DM Radio webinar, all about analytics: the revolution is underway, sponsored today by Looker. It is a deep dive continuing the conversation from a live DM Radio broadcast a few weeks ago, which, if you missed it, you can listen to on demand at dmradio.biz under Podcasts. Just a couple of points to get us started. There's a large number of people attending these sessions, so you will be muted during the webinar. For questions, we will be collecting them via the Q&A section in the bottom right-hand corner of your screen. Or if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DMRadio. Now let me turn the webinar over to Eric Kavanagh, the host of DM Radio, to introduce today's webinar and today's speaker. Eric? Hello and welcome, all right, ladies and gentlemen, to the DM Radio deep dive. Yes, indeed. My name is Eric Kavanagh. We're going to talk all about analytics today: the revolution is underway. Here are our speakers: a good buddy of mine, Daniel Mintz, Chief Data Evangelist at Looker. This guy really knows his stuff. And there's a slide about yours truly, but enough about us. Look at this. Oh, my goodness gracious. This is the MarTech 5000. About 10 years ago it was the MarTech 300 or so; last year it was the MarTech 4000. These are just marketing technologies used to automate marketing activities. 5,000 companies. Now, of course, we talk all about analytics on DM Radio and at DataVersity. We're always trying to figure out how companies can do a better job of analyzing their data: understanding what the customer wants, what the customer is doing, how sales are, how our marketing is performing. Of course, this is just marketing companies, right? In the real world we're also trying to do financial assessments, trying to do projecting and forecasting, trying to understand the big picture in overall markets. There's a lot of data from a lot of systems that needs to be analyzed to do that. Well, if you just think about this for a second: 5,000 different companies doing different kinds of marketing automation, whether it's tracking impressions, whether it's tracking social media, whatever the case may be, that's a lot of moving parts. So in the old world of data warehousing, and we'll talk about this in a minute, there were some pretty effective ways of marshaling data into a data warehouse and then analyzing that data to get some crystal-clear view, or at least a relatively clear view. But the fact is that's getting harder and harder by the day. So I think what we're seeing now is a transformational period in which data warehousing is yielding to some new methods for reaching across multiple systems, finding different ways to synthesize that information, and giving a fairly clear view to the business about what is actually happening. Well, think about all the different data models that are in play here. I do a lot of marketing myself, and just between iContact and Constant Contact you've got wildly different data models. You have different mechanisms of action. You have different mechanisms for tracking clicks versus opens, and so on and so forth. And let's face it, there are a lot of smart people out there who are always trying to prevent tracking or trying to come up with new ways to track, and there are lots of different ways to do that.
So how can we solve this problem? We're going to learn about that today. And very quickly, just to let you know, the deep dive is all about the technology: really trying to understand what this stuff is, what it does, how it works, why it does things a certain way. Because you as the end users, you're the ones who have to figure this stuff out. You're the ones who have to answer to the business to explain what's really going on. And so you need to know what these technologies do and how they work. So this is just the marketing world. And then I'm going to show you a slide that I've actually pilfered from my friend Rick Sherman: accidental architectures. It's kind of hard to govern a situation like this. And this is an actual diagram from a customer that has been grayed out here a little bit, so we've hidden the names to protect the innocent, so to speak. You cannot govern this kind of environment, typically. And what we've done in the history of data management is, again, we pulled data, usually via ETL, into a data warehouse and then analyzed that data. Well, there have been ways to achieve governance in the history of our business. You could do it at the source level, where the database is, where the data exists. You could do it at the access level. That's pretty hard to do, quite frankly. But that's changing, and there are some new ways of solving some of these problems. That's what we're going to find out today from our good friend Daniel Mintz of Looker. So let's think about this. Can data be governed? Well, how can it be governed? Data warehousing involves a lot of data movement. We talked about that. ETL tends to be kind of opaque and fairly brittle, right? The extract-transform-load of days gone by. Data federation has been around for a long time, but only recently has it become a viable mechanism for getting access, for doing federated queries, for example. That's part and parcel of what we'll discuss today as well. So one of the trends that we're starting to see is: just leave data where it is. You know, on DM Radio over the years, and this goes back 10 years now, when I was fairly new to this industry, I would ask companies, when are we going to stop the madness of moving data around and around and around? Because, again, it's hard to tell what's going on when you do that much data movement. And I think a lot of the seasoned professionals out there on this call know that once an ETL process is in place, stopping it is kind of a difficult challenge. First of all, people move through organizations. They leave. Some of these environments are incredibly complex, where you'll have literally thousands of ETL scripts running. You have these batch windows that you have to hit. There are all kinds of technologies for expediting the ETL of that data. But again, you're just going around and around and around. When is the music going to stop? Someone's going to lose a chair, and we're all going to figure out what's really happening, right? So that's the musical chairs of data movement, I would say. So federated queries are coming. Modern analytics really needs to span lots of different data types, to the point of that earlier slide. Of those 5,000 vendors, companies are not using anywhere near that many, and I doubt anyone's using them all, but some big companies do use quite a few. You get 100, 200 different systems that have data that's important to understand and to analyze. So what's the key? You have to be able to reconcile the data from these different systems.
And that's what we're going to learn about now from our good friend Daniel Mintz from Looker. And Daniel, welcome to the show. Thanks for doing the deep dive, and show us what you've got. Thanks, Eric. My pleasure. And really, there's frankly nothing that makes me happier than doing deep dives. I love talking about this stuff in detail. And so, you know, I'm really happy to be here. And let me share my screen so you all can see what I see. So, in the spirit of doing a deep dive, can you guys see my screen now? Are we good? I can now; it just showed up. Okay, great. So in the spirit of the deep dive, I'm going to keep the slides quite brief so I can show you the technology that drives Looker, that makes Looker work. So, you know, Eric, that was a great setup. I think the reality is that just about everybody's analytic tools are outdated, because the underlying needs are moving so fast. And I think the reality of most BI tools, most business intelligence tools, most analytic tools, is that they were architected for a very different world. They couldn't possibly have been architected for this world, because no one really saw this complexity coming. And I think the choices that a lot of those tools made were great choices given the constraints of the day they were built, but that architecture doesn't necessarily fit what's going on today. So that core change, the thing that you most need to be aware of, is that the driving constraint behind the way everything from BusinessObjects to Tableau to Qlik to MicroStrategy was built was that back in the day, up until maybe, I don't know, 7, 8, 10 years ago, databases were incredibly expensive and they were quite slow. You needed to shrink your data because you couldn't possibly deal with the data in all its complexity and all its size. And that was in a world of much smaller data than the data today. And of course everything that you did was about shrinking that data, because you didn't want to buy another server to put in your server farm; it would cost you a million dollars, you know, and it'd be trucked in on an 18-wheeler. And so that was the world that these things were architected for, but that is not the world of today. Databases today are incredibly fast. They're almost unthinkably fast. And they cost pennies; sometimes they cost less than pennies. Most of the big cloud vendors will give you free queries each month. So that world that drove the development of those technologies is not today's world. And so that means we need new technologies that can fully grapple with and take advantage of these new underlying data engines. You know, I think that starts with the Hadoop revolution 10-plus years ago and continues through with Amazon Redshift, Google BigQuery, Snowflake, Vertica, and now Spark and Impala and Hive. There are a ton of these data engines out there, databases and data warehouses, but the reality is all of them share one key characteristic, which is that they're cheap and fast. And so what does it look like to architect a system on top of one of these databases that fully leverages all of that power? And I think when you look across the spectrum, what you see is two main responses to these databases.
When you look at most of the solutions that were architected in these last 10 years, one response is to say, well, I guess we can just write SQL against this database. That'll work, that's great, no problem. They all more or less speak SQL, and that gives us direct access to all that power. Or we can hook a black box up to it, and it'll magically transform your data and take it from that messy MarTech slide that Eric showed to a beautiful clean dashboard. But I'm here to argue that neither of those is actually an adequate response. Because when you go with the SQL response, what happens is only the analysts who speak SQL can access the data. And so everybody else is stuck waiting in line for access to that data. That doesn't make the analysts very happy, because they spend all day writing and rewriting versions of the same query, and it certainly doesn't make the business folks very happy, because they have to wait in line and wait in line and wait in line, and eventually they just stop asking, because they know they're not going to get the data in a timely enough manner to make a better decision. So why bother? On the other hand, if you've got that black box, it seems fine. You've got access to the data, probably directly, without having to go through the analyst. But because it's a black box, you can't really understand precisely how the data is being transformed, precisely how it's being shaped. And that is all well and good until it's not, right? Until you get into a meeting and you say, oh, sales are up, and somebody else says, no, they're not, they're down, and a third person says, no, they're flat. And because it's a black box, you can't really interrogate the data. You can't understand why you're getting different answers. It might be because you're pointed at different data sources. It might be because you're using different business logic. But whatever it is, you can't figure it out, because it's a black box. And so I don't think either of those responses, both of which are trying to leverage this new technology, is adequate. And on the side of SQL, as somebody who knows and loves SQL, I can say that SQL is pretty amazing. It's everywhere. It's truly ubiquitous. It's quite functional. It's proven. It's been around for 40 years. It really runs the world in a lot of ways. And despite Hadoop and Java's attempt to displace it, I think SQL is very much making a comeback, as we see with all of the SQL-on-Hadoop implementations and the fact that all of the newest MPP databases are using some version of SQL. But as somebody who really loves SQL, I'm also comfortable saying that it is quite terrible in a lot of ways. It's kind of a write-only language. I know that when I write SQL, if I come back to it a week or two later and try to make sense of my own SQL, I often can't, so I start from scratch. It's not easy to learn. And it's very easy to write bad SQL, even if you're really good at SQL. I always think back to my first job writing SQL, where there was an orders table, and there was a field in the orders table called status. And if you wanted to access the right orders, you needed to say status equals completed. And if you forgot to do that, you'd get all the chargebacks and the failed credit card transactions. And so it was just really easy to forget that little where clause.
And what that meant was, you might run a query, forget that where clause, and think, my God, we had an amazing day yesterday, only to realize, of course, that you were counting all the failed credit card transactions, and be very sad when you fixed it. But the point is simply that it's easy to write bad SQL, even if you're good at it. And it's certainly not business friendly. As anybody who's tried to say, well, everybody at our company will just learn SQL, knows, that hits a brick wall pretty quickly. And so the perspective that Looker comes at this from is that we want all of that power, that flexibility, and that ability to interrogate the data and the analytics and the business logic that SQL gives us, but we want it without the black box. And so the key insight that Looker came up with is this thing that we call LookML. It's really the core secret sauce, or not-so-secret sauce, I guess, at the heart of Looker. Lloyd, our founder, has been in the data world for a really long time; he was a languages and database architect at Borland and was the chief architect on Netscape Navigator Gold. So he's been around for a long time. What he realized is that we want to put data in the hands of everyday business users, but we want to do it in a way that leverages the analyst's knowledge about what the data means. And we want to take the analyst out of that role of having to rewrite every little SQL query. So what he realized is that we can actually reduce any query to four key components. One is what table are we looking at, and what other tables join to it, and how. The second is what fields we need to select. The third is how we are going to filter the data. And the fourth is how we are going to sort the data. And if you have those four things, you really can construct any query, right? And that base view just means, you know, if you've got an orders table, it's always going to join to the users table using user ID. It's always going to join to the order details table using order ID. And so we know that we should be able to write that once and then leverage it. And we can go from there to construct any query. And once we realize we can do that, we don't have to have a person writing that SQL every single time. They can put that knowledge about how the data is structured into the software once and then let the software take it from there. And what that gives us is the power, the flexibility, and the provenness of SQL, but with little things like version control and collaboration and extensibility and modularity. So that idea, LookML, is really at the heart of Looker.
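To make that "write the join once" idea concrete, here is a minimal sketch of what it looks like in a LookML model file. This is illustrative rather than taken from the webinar; the connection name, table names, and key columns (orders, users, order_details, user_id, order_id) are all assumptions.

```lookml
# Hypothetical model file: declare each join once, and every query reuses it.
connection: "warehouse"    # assumed connection name

explore: orders {
  # Orders always join to users on user_id.
  join: users {
    type: left_outer
    sql_on: ${orders.user_id} = ${users.id} ;;
    relationship: many_to_one
  }

  # Orders always join to order_details on order_id.
  join: order_details {
    type: left_outer
    sql_on: ${order_details.order_id} = ${orders.id} ;;
    relationship: one_to_many
  }
}
```

With the joins declared once, a query only needs the other three components (which fields to select, how to filter, how to sort), and the software assembles the SQL from there.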
And don't worry, I won't just tell you about it. I will actually show you how it works, because I know, as an analyst myself, I'm very skeptical of people just telling me about magical things that'll make my life easier. So I'm certainly not going to do that. But before I do that, let me just show you one last thing, which is how Looker uses LookML and how it's architected. So, as Eric said, people have data coming in from a million places. Most of Looker's customers have SaaS apps that they're collecting data from, a transactional database, an ERP, web analytics. And as Eric said, it's a lot simpler to federate that data, or dump it all into one place, than it ever was before. You don't have to do that heavyweight ETL. And so Looker takes advantage of that: we have lots of partners who help folks dump that data into one database, any SQL database, or a few databases. And then Looker doesn't suck the data out; it just sits on top of that database. And it uses that agile modeling layer that's built on LookML to write the correct SQL, send it to the database, and get the results back. And because Looker's a data platform, it also does some key things: it version-controls that modeling layer, it manages all your connections, and it manages your users and security. And once you've got that model of what your data looks like in Looker, you can access it in a variety of ways. Most of our customers will use our web interface for self-service exploration and visualization. We use D3 libraries to drive our visualizations. They can export the data if they want to. They can schedule delivery, so they can get it by email or a webhook, or push it to an S3 bucket or an FTP server. They can embed that data in Salesforce or an iframe or via JavaScript. And we have a full RESTful API, so anything you could do manually you can do programmatically through the API. And the key idea here is that because all of these operations are happening through this platform, and accessing the database not raw but through this modeling layer, you're always going to get the same answer to your business question no matter how you ask it. And that idea of trust is really critical, and we think too often missing from the world of data. So enough talking and telling you about Looker. Let me actually show you how this works. Looker runs entirely in your browser. I don't normally get to demo the innards of Looker, so I'm very excited to get to do that. So let me show you how LookML actually works. To make this simple and straightforward, I'm going to use a really simple data source. It is a table of names that the Social Security Administration maintains, which shows how many people of each name were born in different states from 1910 to 2013. And so the first thing that you would do in Looker is connect it to your database. In this case, I'm going to use a BigQuery database. And Looker will then read the table schema and construct this very basic model on top of it. This is LookML that we're looking at right here. And here we have our five fields: name, gender, state, year, and population. And it'll also build a real simple measure called count. And this is not terribly interesting, I'll admit. But I think the key idea here is that we can explore this. So what we're looking at here is very much what an analyst or a developer would look at. And what that allows Looker to present to business users is a much friendlier interface that looks like this. So I'm picking dimensions and measures, and Looker is then constructing SQL to send off to the database, and I can run that SQL, get my results, and display them here. So here: there were 103 Arlenes born who are female. Great; not very interesting, I will readily admit. And the key thing to know here is that the SQL being produced is dynamic. So if I add state, Looker's going to add state. If I add count, Looker's going to add a count. And all of this is very readable. It is not scary machine-generated SQL that's impossible to parse. This is just pretty straightforward, but, I will admit, not terribly interesting.
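The auto-generated starting point he is describing would look roughly like the sketch below. The view and column names are illustrative, assuming a BigQuery table of the Social Security names data; this is not the exact code from the demo.

```lookml
# Roughly what Looker generates after reading the table schema:
# one dimension per column, plus a simple count measure.
view: names_step_0 {
  sql_table_name: usa_names.usa_1910_2013 ;;   # assumed table name

  dimension: name {
    type: string
    sql: ${TABLE}.name ;;
  }

  dimension: gender {
    type: string
    sql: ${TABLE}.gender ;;
  }

  dimension: state {
    type: string
    sql: ${TABLE}.state ;;
  }

  dimension: year {
    type: number
    sql: ${TABLE}.year ;;
  }

  dimension: population {
    type: number
    sql: ${TABLE}.population ;;   # assumed column name
  }

  measure: count {
    type: count
  }
}
```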
So let's go back to the code here. And we're going to go to a separate file. One of the key concepts of Looker and LookML is reusability and extensibility. So I had defined those six things in my last file, and I don't want to have to rewrite all that. So I'm going to just say extend name step zero, and I'm going to pick up those six definitions without rewriting them; I just get them for free. And then I'm going to continue. And because I want to be able to make changes to these files, rather than just looking at them, I'm going to switch into developer mode. So this is that key idea of version control, which engineers are quite familiar with. Analysts, unfortunately, have mostly not benefited from that knowledge that, hey, it's a good idea to keep track of everything we've done. So I am now in developer mode. Looker uses git to version-control everything that you do in LookML. So I'm on my personal branch. Everything that I do in here now will affect me and no one else, which is really nice, because it means I can mess around, I can break stuff, and not have to worry about breaking anybody else's stuff. So I've got those six basic fields. I'm going to add this new measure called name count, and it's of type count distinct. So I'm going to count the name field, count how many distinct names. And I'm also going to create a total population measure, which is a sum of population. So it's still not super interesting, but let's explore this. And we'll see now that I've got these new fields. Here's my name count and my total population. So let's look at how many names were in use per year. And again, Looker here is writing the SQL, very straightforward SQL: take years and count how many distinct names were in use. We can sort this and we can visualize it. So let's see; oh, that's interesting. There are a lot more names in use today, or in 2013 at least, than there were in, say, 1910. Well, we could do more here. Let's say I wonder whether it's mostly girls or boys who are driving this change. So we can add gender to this table, and Looker will add that gender field. But it would be nice to pivot gender, so that we don't have it like that. So I'm going to do that. And normally you would not want to write pivots in SQL, because it's kind of nasty to write pivots in SQL. But luckily you're not writing the SQL; Looker is writing the SQL. So Looker will take care of writing that pivot. I admit it's a little bit gnarlier, but you're probably glad now that you're not writing it. And now I'm going to run this query, and now we'll be able to see whether it's mostly girls or boys who are driving the rise in the number of distinct names in use. We can visualize this. And what do you know, it's girls, mostly, who are driving it. Boys are certainly up as well, but girls have more distinct names in use, right? So I wouldn't say these are groundbreaking discoveries, but we're certainly starting to learn things from a very simple table by writing some nice, simple LookML. So let's jump back to the code and go further. Let's jump to name step two. Again, I don't want to have to rewrite everything I've already written, so I'm just going to inherit that by extending the stuff I've already done.
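A sketch of the "extend and add" step he just walked through: inherit the base view's definitions and layer two new measures on top. The view and measure names are illustrative, not the demo's actual files.

```lookml
view: names_step_1 {
  extends: [names_step_0]    # picks up all six definitions from the base view

  # How many distinct names were in use.
  measure: name_count {
    type: count_distinct
    sql: ${name} ;;
  }

  # Total number of people, summing the population column.
  measure: total_population {
    type: sum
    sql: ${population} ;;
  }
}
```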
And, you know, maybe I want to take that year field and get a decade from it. Well, that's pretty straightforward to do in SQL, right? I can take the year, divide it by 10, take the floor, and then multiply it by 10. That's a pretty simple SQL operation. And because LookML is built right on top of SQL, not trying to replace it, I can actually just write that directly in the LookML, and that gives me decade. I can also do things like case statements. I might want to take the states that I have in the data and make regions. So I can say, well, if the state is any of these, let's call that West, and if it's any of these, let's call it Southwest, and so on and so forth. And so I can now explore this, and maybe we'll use Eric's name here. So we can filter to Eric and look at it by region and by year. We'll pivot region. How many Erics were born in each region of the U.S.? And now we can look at the SQL that Looker is writing. And, okay, this is definitely not SQL that you would want to write by hand anymore, right? And so now we can visualize this with a line graph, and we can see, oh, okay. Eric was not really a name in widespread usage in the earliest part of the 20th century. But come the 1970s and 1980s (Eric, I don't want to make a guess about what year you were born, I don't want to date you), we can see when it rose, and that it's been tapering off ever since, and that it kind of got up to its highest in the Midwest. So maybe Eric was born in 1970 in Kansas. I don't know. But we're starting to understand a little bit about this data, and we're doing it all via this simple LookML, which is really nice. And the thing about being in developer mode here is that I can make changes that, as I said, only affect me. So let's say I decide, you know, Utah really isn't a Western state; I want Utah to be a Southwest state. I can add Utah there. I can save this, and now I can go back to here. And when I rerun this query (and I'll show you before I run it: right now Utah is showing up as a Western state), let me just refresh this window. And if I look at the SQL, Utah is gone from the West, and there it is as a Southwestern state. So this ability to interrogate what's going on with the data, to change it on the fly and maintain agility, I think is really critical to being able to learn what's going on with your data for real. And that's very much what LookML allows you to do. And we can keep going. Let's say we want to create a male population measure. So we can say, well, I want a measure called male population that sums the population field but uses this filter where the field gender has a value of M, right? And so that gives us a male population field. And then we don't have to rewrite that logic to make use of it. So let's say we also want to have a percentage male field. We can define that using fields that have already been defined. So this male population here is just referring to this measure that I defined here. And if I make any changes to this definition, it will flow through to this one. And that idea of not repeating yourself, of reusability, is something that's core to modern software development but is not really core to SQL. But it is core to LookML.
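Pulling the pieces from this step together, a hedged sketch (again with illustrative names and an abbreviated state list) might look like this: a decade bucket and a region CASE written as plain SQL inside the LookML, a filtered measure, and a ratio that reuses measures that are already defined.

```lookml
view: names_step_2 {
  extends: [names_step_1]

  # Decade is just SQL: floor(year / 10) * 10.
  dimension: decade {
    type: number
    sql: FLOOR(${year} / 10) * 10 ;;
  }

  # Group states into regions with a CASE statement (state lists abbreviated).
  dimension: region {
    type: string
    sql: CASE
           WHEN ${state} IN ('CA', 'OR', 'WA') THEN 'West'
           WHEN ${state} IN ('AZ', 'NM', 'TX', 'UT') THEN 'Southwest'
           ELSE 'Other'
         END ;;
  }

  # Sum of population, restricted to rows where gender = 'M'.
  measure: male_population {
    type: sum
    sql: ${population} ;;
    filters: [gender: "M"]
  }

  # Reuses the two measures defined above; changes to them flow through here.
  measure: percentage_male {
    type: number
    sql: 1.0 * ${male_population} / NULLIF(${total_population}, 0) ;;
    value_format_name: percent_1
  }
}
```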
And so I can then do the same for a Northeast population, or a before-1940 population, and then leverage all of these on the front end. And Looker makes it really easy to write LookML because it's got this IDE that I'm in right here. So if I wanted to define a new measure, let's say a measure called female population. Sorry, WebEx is making my computer a wee bit slow, as you may see. And down the left-hand side there, are those versions? Because you talked about version control. So those are different views; that's a good question. Those are different views that are showing because of the way that I'm doing this, stepping through. Often all of that logic would be in one file or a couple of files, but here we've broken it out. But I'll show you what you can do with views, because it's not just about referencing a table that exists as a physical table in your database. You can go further than that. But let's say I've got this, you know, this male population. I type "type", and then the IDE will tell me, oh, yes, I can make a sum or a sum distinct. And I want my SQL to reference that population field that I created way back when. And so Looker is giving me all my options, sort of auto-completing, which is very nice and makes it a lot easier to write LookML. This is a proper IDE. It's a proper language, and so it should have a development environment that makes this easy. So when I've defined my filters... oops, it's very hard to type and talk at the same time. And those are basically guardrails, too, right? Absolutely. They keep you on track and they show you what's possible with that particular type at that time, right? And so if I do something stupid, like reference a field that doesn't exist, Looker is immediately telling me, hey, nope, that doesn't exist. Don't do that. And so that error checking is great. It makes it much easier not to make those mistakes. So now I can say, you know, gender, and value female. And now, just by doing that, I have this... oops, let's put it inside the quotes rather than outside the quotes. Now I have this new measure called female population. And if I save this... or "female poolation," let's fix that typo. There we go. And so if I save that and go back to my Explore, immediately there's a female population available to me. And Looker knows that the way to define that is as a sum where gender equals F. And it's even doing the UPPER to make sure there are no problems with, oh, it's a lowercase f. So Looker creates those guardrails, which makes it much easier. I don't have to worry whether the SQL is written correctly, because I know that Looker is doing it. And as a business user, I don't have to worry about this at all. And for lots of business users, that SQL tab isn't even available, right? And so that's really nice, because it means I don't have to worry about it. But if I'm an analyst and I'm saying, you know, that female population data doesn't look quite right, interrogating that field and understanding how it's being calculated is as easy as a click. I click that button, it takes me directly to the LookML that defines it, and I can figure out exactly how it's calculated. Oh, wow. And so we can see that there's our baby boom in the 40s and 50s, right? And then it's sort of leveled off.
But this idea: I think when you look at general-purpose computer programming languages, there's been enormous growth, right? We're not still mostly writing C. Sometimes you do need to write C, but most of the time you can write Java or Python or Ruby or whatever you want. And those are higher-level languages that take care of a lot of the low-level stuff for you, so you can focus on writing a good program, something that does what you need it to do, and not worry about garbage collection like you would have to in C. But for some reason in the data world, we've gotten stuck, and we're still mostly writing SQL. And SQL is great. It's powerful, but it's pretty low-level. And so what that means is we're mostly still worrying about, oh, did I remember to include the status field, or did I forget that? When in fact that's something that you should be able to outsource to a computer and say, hey, by default, I want to say status equals completed, and I should be able to turn that off. But I don't want to have to worry about remembering it every single time. Yeah, that's a really good point you just made there about SQL being a very low-level language. If you think about Python, for example, these are higher-level languages, meta-languages basically, that are then generating code that goes down to the C level or whatever the case may be. And I think this is an interesting step in the right direction, creating a meta-language for being able to optimize. And that's one of the things my good buddy Mark Madsen said on a webcast a couple of years ago: if you're not using one of these higher-level languages, you can't optimize very well, right? That's right, yeah. And so, as an analyst, if I'm dealing with something slightly more complicated than a five-field table, there are lots of those optimizations that I might want to actually live right in the model, the data model: things that I know about the way joins should be done, stuff like that, that I don't want to have to transmit person to person to my whole organization, or to everybody who's writing SQL, because that's really inefficient. If I can put it in software, that scales that knowledge very quickly. And it keeps it agile, makes it easy to change if I need to, but it doesn't require me to sit down with every single analyst and say, by the way, if you do this join, make sure to add this where clause, because otherwise your query is going to kill the database. And to go back to your question about what's going on down the left side here: in LookML, there are really two main kinds of files. (There are also dashboard files, a third kind, where you can actually use code to define a dashboard, but let's not go into that yet.) What I was showing you mostly are called view files. A view file either takes a table exactly as it exists in the database, which is what this one does with the popular names table, and defines dimensions and measures based on it.
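Circling back to the "status equals completed by default" example for a moment, one way to push that rule into the model instead of everyone's memory is sketched below. The explore and field names are hypothetical: always_filter applies a default filter that the person running the query can still change, while the commented sql_always_where variant bakes the rule in so it cannot be removed.

```lookml
explore: orders {
  # Applied to every query by default; the value can be changed in the UI.
  always_filter: {
    filters: [orders.status: "completed"]
  }

  # Stricter alternative: enforced on every query, with no way to remove it.
  # sql_always_where: ${orders.status} = 'completed' ;;
}
```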
Or you can actually create what we call a derived table. So rather than referencing a table directly, I can say, you know, I want a cohort, a concept of a cohort. And so I'm going to create a derived table, the definition of which is: take each year, state, and gender, and tell me how many people were born in that year, of that gender, in that state. And then I can build dimensions and measures directly off of that. And that becomes very powerful, because you can start doing transformations. It's not only that I have a users table; I might want to have a user fact table, which is not data that should necessarily be represented directly in the database, but which I can derive from that data. And not only can you define these derived tables, you can also persist them. Looker will persist them back into a scratch schema in the database. So again, that core concept of leveraging the power of the underlying database, rather than pulling the data into our own proprietary SQL engine or data engine. You know, I don't really want to compete with Google and Amazon over who can build the fastest data warehouse. I'm really happy to let them have that fight, because they are both amazing at it and only drive each other to be better. I want to leverage what they're doing. And so that's how Looker is architected: to leave the data in these databases, because they're so powerful. So that's the view file, right? It defines individual fields. And the other main type of file is a model. And that's that idea of, well, how do the tables relate to each other? So I can define joins here. I can say, all right, that cohort view that I just created: I want to join it back onto the main table, and I want to do it where state equals state and year equals year and gender equals gender, and I can tell it it's a many-to-one relationship. And what that lets me do then is explore this data. And from the user's perspective, there's no difference between the derived tables that I am calculating on the fly and the tables that are actually in the database. So I can say, let's see, how many of the boys born in each year were Daniels, right? I'm just picking and choosing on the fly. This is not a query I think I've ever run before, but that's okay. Looker knows how to construct the SQL. It writes it. I can sort it. And I can see, you know, here are our peaks in the percentage of boys born in any given year who were Daniels, right? And so again, it just gives you enormous flexibility to explore freely.
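A sketch of the cohort idea described above: a derived table defined by a query rather than a physical table, then joined back onto the names view in the model file. The names and the exact SQL are assumptions, not the demo's code.

```lookml
# A view backed by a query instead of a physical table.
view: cohort {
  derived_table: {
    sql:
      SELECT year, state, gender, SUM(population) AS cohort_population
      FROM usa_names.usa_1910_2013
      GROUP BY 1, 2, 3 ;;
    # To persist the result into a scratch schema in the database, a
    # persistence setting (for example, datagroup_trigger) would be added here.
  }

  dimension: year {
    type: number
    sql: ${TABLE}.year ;;
  }

  dimension: state {
    type: string
    sql: ${TABLE}.state ;;
  }

  dimension: gender {
    type: string
    sql: ${TABLE}.gender ;;
  }

  dimension: cohort_population {
    type: number
    sql: ${TABLE}.cohort_population ;;
  }
}

# In the model file: join the derived table back onto the base view.
explore: names_step_2 {
  join: cohort {
    type: left_outer
    sql_on: ${names_step_2.state} = ${cohort.state}
        AND ${names_step_2.year} = ${cohort.year}
        AND ${names_step_2.gender} = ${cohort.gender} ;;
    relationship: many_to_one
  }
}
```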
And just to show you a little bit more: I'm intentionally using a very simple data set, because it lets me really show off the power of LookML, but let me jump over to one of the larger demo data sets that we use regularly. I can take all of this and build up, say, a business pulse, a full-fledged dashboard leveraging all this data. But again, each of these tiles is running a query against our database. And so what that means is I can always drill down to row-level detail. So if I want to drill into this data set, I can show every single transaction that added up to $544,000 of revenue this month, because I'm directly connected to the database. And I can then even say, oh, this user, I want to email a promotion to her, and that leverages some of the more platform-y pieces of Looker. Or I want to look this user up, and I can do all of this because I'm sitting directly on top of the database. I can see what her lifetime spend is. I can see what she's ordered. And let's say that lifetime spend seems weird. Maybe I don't trust it. I want to know how it's calculated. Well, it's a sum of sale price. So I can, again, jump right into the LookML, interrogate it, and understand exactly what's being used to calculate it. Oh, okay. It's a measure of type sum, formatted in U.S. dollars, and it's using the sale price. Well, okay, I want to see how sale price is defined. All right, let's jump to sale price. Oops, let's go the other way. So there's sale price. Oh, okay, sale price is a field directly in the table. So that's where sale price is coming from. And this idea of reusability: here's my dimension sale price. I want to calculate a gross margin; well, gross margin is sale price minus cost. Great, I define that once, and I don't have to redefine each of those pieces every time. Or more complex measures: let's say I want to look at shipping time. Well, I can take the shipping date and the delivery date and get the number of days between them. Or I want to look at days to process, and I know that depending on what status the order is in, it's going to be a different calculation. If it's still processing, I want to know the difference between now and when it was created. If it's been shipped or completed or returned, I want to know the difference between when it was shipped and when it was created. If it was canceled, then I don't care about it. And so this idea of reusability, this idea of taking SQL, which is often these giant blocks of code that you really can't interrogate or understand, and turning it into these little bite-size definitions that are reusable: that's really the core of LookML.
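A sketch of the kinds of bite-size, reusable definitions he is describing, using hypothetical field names from an order items view (sale_price, cost, status, and created, shipped, and delivered dates) and BigQuery-style date functions:

```lookml
view: order_items {
  sql_table_name: ecommerce.order_items ;;   # assumed table

  # Base fields, assumed to map directly to columns.
  dimension: sale_price     { type: number sql: ${TABLE}.sale_price ;; }
  dimension: cost           { type: number sql: ${TABLE}.cost ;; }
  dimension: status         { type: string sql: ${TABLE}.status ;; }
  dimension: created_date   { type: date   sql: ${TABLE}.created_at ;; }
  dimension: shipped_date   { type: date   sql: ${TABLE}.shipped_at ;; }
  dimension: delivered_date { type: date   sql: ${TABLE}.delivered_at ;; }

  # Gross margin reuses two fields that are already defined.
  dimension: gross_margin {
    type: number
    value_format_name: usd
    sql: ${sale_price} - ${cost} ;;
  }

  # Shipping time in days.
  dimension: shipping_days {
    type: number
    sql: DATE_DIFF(${delivered_date}, ${shipped_date}, DAY) ;;
  }

  # Days to process depends on the order's status.
  dimension: days_to_process {
    type: number
    sql: CASE
           WHEN ${status} = 'Processing'
             THEN DATE_DIFF(CURRENT_DATE(), ${created_date}, DAY)
           WHEN ${status} IN ('Shipped', 'Complete', 'Returned')
             THEN DATE_DIFF(${shipped_date}, ${created_date}, DAY)
           ELSE NULL   -- canceled orders are ignored
         END ;;
  }

  # The kind of measure a dashboard drill-down would interrogate.
  measure: total_sale_price {
    type: sum
    value_format_name: usd
    sql: ${sale_price} ;;
  }
}
```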
You're also kind of speaking to the data governance side of the equation here, which we talked about at the top of the hour. Being able to understand where information comes from, how these fields are calculated, having the ability to drill down from a dashboard all the way through to the code for how these calculations are being done: that's how you do governance, right? Well, that's at least how you understand lineage and how you can solve problems. And because you've got this marshaling area, you can solve a problem once and have it solved, instead of having to try to fix something multiple times with, say, a data quality tool. That's absolutely right. Right. And I think people get into trouble because they wrap up the business logic and the data into one inextricable package. And that is fine for now, but when you later need to take those two things apart because the business logic changed, now you're in trouble, right? And so by separating the actual business logic from the data and saying, we will leave the data raw, but we will put a very lightweight, agile layer of business logic on top of it that stays separate, I keep that agility. I can make changes to the business logic on the fly and have them reflected, without messing up that raw data. Because it's important that that raw data stay raw, right? It means I can understand exactly what's going on with it. Yeah, this gets into another one of the issues that we wanted to talk about on the show today, which is the old way of doing data warehousing. We have to remember there was a whole set of different constraints: mainly, the pipes were small, the processors were slow, and storage was expensive. So we made decisions based upon those constraints, to strip out context and get just the meat, if you will, the tiny little thread of data, from a source system into the warehouse, normalize it, and then be able to analyze it. And what you're talking about now is not stripping out that context, leaving it where it is, such that someone, like an auditor, for example, or even an employee who comes in kind of in the middle of things, can go through the system and see, all right, well, what was the actual raw data? That enables transparency. That enables, I think, an element of robustness that we have not really seen in the data warehousing era. What do you think? Yeah, absolutely. And, to reiterate, this isn't because, oh, well, Looker came up with this whole new way that no one ever thought of before. It's that the previous solutions were built in a world where this simply wasn't possible. I don't think Looker works 10 years ago, because the databases weren't up to the challenge. A lot of the DBA's job was protecting the data warehouse from the employees. It was about, how do we cube this data? How do we pre-summarize it so that people aren't hitting the raw data? Because if they do, they'll take down the warehouse. And that's simply not true anymore. I think when you watch Google run a query on a petabyte of data and it returns in seconds, that's a very different world than the world of, you know, Hyperion and Oracle and Teradata. Right. I mean, it's just insane. But it does mean that you need to rethink some of the decisions that you made, because the constraints that drove those decisions are not there anymore. That's right. So, I mean, I can keep poking around. I don't know if there are questions from the audience. I love talking about this stuff. I feel very privileged to get to talk to technical folks and show off the behind-the-scenes stuff. I'm happy to talk about it all day. But, you know, I don't know. Yeah, maybe if you want to go back to the diagram that kind of shows the big picture. Yeah, sure, just a second. Now that you've sort of seen what it looks like: there are certainly data lakes and data swamps out there. But I think what happened with that idea of a data lake was that it was a good idea, maybe, but in some respects I think it was also IT kicking the can down the road and saying, oh yeah, just dump it in the data lake and we'll deal with it later. We'll give it meaning later. But that rarely happened. And so I think the promise of having all your data available in one place often didn't deliver. But I think these days it's a lot easier, right? You can, in fact, get meaning out of that data. It doesn't have to be transformed ahead of time. It does let you clean up the data swamp, because the databases are incredibly powerful, incredibly cheap, and can also deal with all kinds of data that in the past were not okay, right? You look at Snowflake or BigQuery or Spark or Redshift, and they have the ability to deal with semi-structured data. They have the ability to deal with very flat, wide tables.
They have the ability to deal with all these things that weren't okay in the past because the database couldn't handle them. And so I think that means that things that used to sit in that data lake, in that data swamp, and never get used, now can be used, right? I mean, I think I talked about the Google Analytics data and how it comes into BigQuery: Google pipes it in raw, and it's deeply nested, right? Well, who ever heard of nested data in a relational database? Well, yeah, BigQuery has nesting. That's part of BigQuery's architecture, so you can nest data. And so you can have a session event that has a bunch of page views and click events nested inside of it. And that's fine, because BigQuery can handle it and because Looker knows how to handle it. And so I can, for example, go to our Blocks directory and show you the pre-built analytics solutions that Looker uses to deal with that. So I can go in; I think it's an analytic block, let's see; so if we look for Google, it's a source block. Here's the GA Google Analytics Premium by Google block. What this is: we saw all these customers using this data source, we knew that it was not easy to make sense of, and so we wrote the code that makes sense of it, built these dashboards, and now we can make this available for free, right? This is just code that lives in GitHub. And so I can say, oh, I want to see the LookML code that un-nests and deals with all of this. You don't have to write this. We've already dealt with it, right? And so when you get Looker and you have this GA data, you don't have to start from zero. You start from 90% or 95% of the way there. But again, because it's just code, you can then customize it and do what you need to with it. But we've already calculated all this stuff. We've already built these dashboards, so you're starting from nearly done, right? And that's one of the really nice things about LookML as well: because it's reusable, it's not just that you can reuse your own code, you can use other people's code, too. That's a really good point.
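For nested BigQuery data like the GA export, the general pattern (a sketch in the spirit of what he is describing, not the actual code from Looker's GA block) is to un-nest a repeated field by joining it in the explore with raw SQL:

```lookml
# The sessions view has a repeated "hits" record nested inside each row.
view: ga_sessions {
  sql_table_name: google_analytics.ga_sessions ;;   # assumed export table

  dimension: hits {
    hidden: yes
    sql: ${TABLE}.hits ;;   # the nested, repeated record
  }
}

# UNNEST flattens the repeated field so its children become queryable.
explore: ga_sessions {
  join: hits {
    sql: LEFT JOIN UNNEST(${ga_sessions.hits}) AS hits ;;
    relationship: one_to_many
  }
}

# A view over the un-nested hits, exposing a couple of nested fields.
view: hits {
  dimension: hit_type {
    type: string
    sql: ${TABLE}.type ;;
  }

  dimension: page_path {
    type: string
    sql: ${TABLE}.page.pagePath ;;
  }
}
```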
Yeah, and we have a good question from the audience. An attendee is asking, can you explain how to join different data sources, like MS Dynamics, SQL Server, et cetera, for business users to analyze? Yeah, absolutely. So we handle that in two different ways. One way: let me jump over to the admin panel. Looker supports, I believe it's now 35 different dialects of SQL. So if I wanted to create a new connection, I'd open up the new connection dialogue, and here are the 35 different dialects of SQL that Looker supports, everything from Apache Spark and Redshift to SQL Server 2005 and DB2. And so you can create those different connections and build different models on top of them, and then build dashboards where one tile is pointing at SQL Server data and the one right next to it is pointing at BigQuery data, and that's no problem. But in the most recent release, Looker 5, we've gone further, and now you can actually join that data on the front end. So you could have data that lives in two different places, two different databases, and run those queries against two different things. I'm trying to think if I have a good example here. I'm not sure that I do. Let me see if I can do it in the demo. Where did the demo go? Here we go. And so let's say I'm exploring data here. I've got this here, and I can then, as a business user, without having to write any code... oh, this may not be on Looker 5. Bummer. Okay, this is not on Looker 5, but I think this is, let's see. I can, if I switch out of development mode... oh, no, I can't do it. Sorry, I lied. That's okay. But the latest release, which neither of these instances is on, unfortunately, allows you to merge those datasets. So what that means is I can run a query on, let's say, Google Analytics data in my BigQuery database and a query on sales in my SQL Server database: I want to see total sales by day, and I want to see website visits by day. Those live in two totally different databases. As a business user, I can then merge those two datasets and build visualizations and dashboards and calculations based on those two measures, and Looker will infer the join and say, well, both of these are grouped by day, so obviously you want to join them by day, and it'll take care of that for you. So it's a great question. And I think part of our approach is that there are some things that need to be done in the model and should be done in the model, but some things like that, an ad hoc join across databases, probably shouldn't be modeled. It's something that a business user should be able to do on their own without having to go into the code. Yeah, that was one of my other questions, really: separating out the business view from the developer view, because these are very different worlds. And what you're talking about is the ability to allow business users to point to all kinds of different data sources, and that's why I had that MarTech slide at the top of the hour: to show that even if you have 20 of those different systems, they're all going to have a slightly different data model, they're all going to have a different API these days, and you need some way, someone, to basically handle some of that pain for you, so that you're not trying to manage APIs across multiple different systems. You've got this platform that sits on top of all these different environments and allows you to pull it in. And to me, that's the transformation that's taking place, and it's a very different world. If you think about the mindset of what the IT people would do, of what the data warehousing team would do, it's a different set of challenges now, a different set of processes, and you kind of have to let go of the old constraints and the old ways of thinking about things in order to take advantage of this stuff, right? Yeah, absolutely. I mean, as a business user, I frankly shouldn't have to think about whether a piece of data came from MailChimp or NetSuite or our transactional database. I've got better things to worry about, and that should not be one of the things that I'm thinking about. So, to your point about federation and virtualization and a single platform, a single place where you go: you shouldn't have to worry about that, because the data should just be there, right? You should just be able to search and say, oh, I want this field, or I want this report, I want this dashboard, and not have to worry about the fact that this tile came from one source and this other tile came from another source. And I think that's what a real data platform like Looker gives you.
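On the model side, the "one tile per database" setup described above boils down to each model file declaring its own connection; the cross-database merge itself then happens in the front end rather than in LookML. A sketch, with hypothetical connection and explore names:

```lookml
# sales.model.lkml: points at a SQL Server connection.
connection: "sqlserver_sales"
explore: orders {}

# web_traffic.model.lkml: points at a BigQuery connection.
connection: "bigquery_analytics"
explore: ga_sessions {}
```

A dashboard can then mix tiles from either explore, and merged results can stitch query results from both together on a shared dimension such as date.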
Yeah, that's a good point. And, you know, API management is the one that I'm very curious about. But we have another good question from the audience. Let's see: can I use the view created in Looker as a source for any other BI tool, like Tableau or QlikView? How does that work? Yeah, so, as shown on this diagram, we have a full set of RESTful APIs. So really anything that you can do through the web interface of Looker, you can also access through the REST API. And we have plenty of folks who are either piping that into another BI tool or maybe even piping it into a full embedded experience. So let me see if I can pull this up. We have a full line of embedded analytics called Powered by Looker that all runs on this same platform. Let's see if this wants to load. Yeah, and that's actually a good subtopic to talk about here very quickly: embedded analytics, right? Because again, at the end of the day, end users have a certain number of projects and a certain number of tasks that they can accomplish in any given day, and a lot of people don't like bouncing around from system to system. You want to be able to work within the usual environment that you have. And so I think what I really like about that is you're able to facilitate that kind of thing. So can you talk a little bit about that? And I'll go ahead and throw up this. Yeah, I will just switch apps. I'm going to share Chrome, which hopefully will load, since Safari seems very unhappy. So let me load this up. So this is all powered by Looker. This is Looker, too, but obviously it looks very different than Looker. And this is using a mix of the API and iframes to embed Looker in an entirely different experience. So here's an iframe, but I can also create a totally custom visualization using D3, a totally custom filter set. And I can say, oh, I want to add the West Coast. I do that, and the visualization updates. And so this is all accessing Looker, but doing it through the API, where I've built some custom filters. It sends that West region to Looker through an API call, gets the data back, and visualizes it here in a custom visualization. So I can take men out of the equation, and all of this is happening through that same platform. And I think that approach of a single platform, through which you can do really anything you need to, whether it's accessing it through our BI tool, through a different BI tool, or through a whole different application, is really core to the way that we come at this problem. You know, you've also kind of touched on one of the key components, I think, of future efficacy in data management and analytics at large, which is separation of concerns, right? By leaving the data where it is, by having a marshaling area in that modeling layer that you talked about, by having version control, et cetera, you're enabling end users to slice and dice and play around and do all kinds of different things without breaking anything and without jeopardizing what someone else has or what someone else wants. You basically enable the agility and the latitude that business users want, right? Absolutely.
I mean, we have customers, and I've done projects, actually, where we will write scripts, Python scripts or Ruby scripts, to generate LookML, and then a whole dev workflow where you have a dev staging server, you prep stuff, you run an impact analysis to see which Looks and dashboards will be updated, and that's all seamless. It's all very much a modern development workflow that's enabled by the fact that it's code, that it's git, all these things that developers have come to rely on and that analysts, I think, can really benefit from. Yeah, I like that you mentioned GitHub, too, and the work that you guys have done there, right? Because the last thing you want, and you talked about this earlier, is to reinvent the wheel. And so when you start thinking about how that applies to the world of analytics, especially in this wildly heterogeneous world that we have, being able to find the chunk of code that gets you 90% of the way there is a tremendous enabling component, and it also, I think, probably keeps developers pretty happy, right? Because no one wants to reinvent the wheel, and it takes a lot of time to get that 90% done. That's the speed of business these days; that's the speed that analytics needs to be at if you're going to stay alive, right? Absolutely. And I think 90% is great, but I think where developers have historically gotten tripped up, understandably, is when you say, I can get you 90% of the way there, but it's a black box, and you'll never get 100% of the way there because you can't tweak it. So we said, no, no, that's not okay. We appreciate the developers' concerns. We're going to get you 90% of the way there, but you're also going to have the code, so you can go in and finish that last 10%. Yeah, that's a really good point. That's really good stuff. And so I'm just kind of curious to know: what do you find to be the most surprising when you go out there and start working with new companies? What shocks them the most, or surprises them the most, about what they're able to do? I mean, I love watching technical folks get it, because they do, right, whether developers or analysts. I love people who love SQL or love building tools seeing that there's a different way. And I love seeing business users. I have to say, we think of our users as being in mainly three camps: the analysts, who speak SQL, and they're happy because Looker is doing that for them for the most part, so they don't have to spend their days writing the same SQL queries over and over; explorers, people who live in Excel and historically have waited in line for data pulls from the analysts; and consumers, people who want a dashboard or a report scheduled to them. And it's that middle group, those explorers, who really just are gaga about Looker, because all of a sudden they're not stuck in Excel. They're in a place where they can ask a question, get the data back instantly, ask the follow-up question, get that data back, and ask the follow-up question. And they're the ones who have that gut-level understanding of what the data means, and I just see over and over how the discoveries they're making are driving their businesses forward. I love it. Well, Shannon, I'm going to bring you back in. We just burned through a whole hour with a great demo and some exploration of the new way of doing analytics. I have to say, very good stuff, Daniel. Thank you for your time. My pleasure. Thanks for having me. Yes, thank you, Daniel, so much for this great presentation.
And Eric, thank you so much. We're so excited to be producing our very first DM Radio deep dive webinar. It's very exciting. And thanks to Looker for sponsoring today's webinar, and thanks to all of our attendees for being so engaged in everything we do. We really appreciate your time and energy. So just a reminder, I will send a follow-up email by end of day Friday with links to the slides and links to the recording of this session as well. I hope everyone has a great day. Thank you so much for joining us, and we hope you'll tune in to listen to DM Radio at 3 p.m. on Thursdays. You can check it out at dmradio.biz. Thanks, all.