So, Meltano is a lifecycle tool for data science, and what it allows you to do is get data out of a source, take it through the whole data lifecycle, and end up with dashboards that show you all of that data. The way we say it is that Meltano is data source to dashboard. Meltano actually stands for model, extract, load, transform, analyze, notebook, and orchestrate, which is a lot of different things, but the most important point is this: say, for example, you have some data in Salesforce. Salesforce has a ton of data, and you can analyze it with the tools Salesforce provides, but not in the way data scientists really want to use it. So we built this tool around data scientists and exactly what they want to do. We have data scientists on our team, and we've been working with them to make sure this is exactly the way they want it done. The first step is to extract data out of a data source. Then you load that data into what's called a data warehouse, which is just a database, and you transform that data so that it's useful to the user. You model it, and then you analyze it in a tool like this, Meltano Analyze, which shows you lots of charts and graphs, and in the end you can produce some really nice dashboards, maybe for a CEO to keep on the big-screen TV in his office, or for the sales team to review. Currently you would use something like Looker, or Tableau, or anything like that.
These are really expensive tools, and Meltano is open source. And it takes a lot of different tools to do the extract, load, transform: just for the extract and load you might use Singer taps through Stitch, which costs money; for the transform you might use dbt; and so on, with a different tool for each stage. Taken as a whole, that stack isn't version-controllable and much of it isn't open source, so what we're trying to do is create one tool that covers the complete path from data source to dashboard, and make it open source and version-controllable. We're bringing the concepts of software development to the data science world, where a lot of data scientists, from the calls I've had with them, are still emailing Excel spreadsheets around. They could use a tool like GitLab for all their version control, use the Web IDE to edit their modeling files, eventually get into things like machine learning and AI directly in Meltano, and have it all version controlled, with one data team, because you have data analysts, you have scientists, you have engineers, all working together on one project.

So I'll just give you a quick demo here. If you can see my screen: if anybody's ever used something like Node, or Rails, or any framework you use for software development, that's what we modeled this around. Say you're in your projects directory. You can create a Meltano project by calling meltano init, let's call it sales-data, just for fun, and what that will do is create all the necessary files, which will be version controlled through GitLab. We're going to build a really nice integration with GitLab, but you're able to use anything that you want.
So in that case you would use GitLab and everything would be really nice, but Meltano can work with anything that's version controlled, so you're not locked into GitLab. And when you run all these different plugins, they run in their own individual sandboxes. Right now it's creating a sandbox for the transform layer, dbt, which is an open source tool built just for transformations. (It's running a little extra slow because I'm sharing my screen on Zoom as well.) This also creates the meltano.yml file, which I'll show you in a second, and which manages all your dependencies, because you're going to have a bunch of different extractors. Maybe you'll have an extractor for Greenhouse because you want to track your hiring process, and maybe you'll have an extractor for Salesforce. The GitLab data project, say, will have a lot of different extractors, and maybe they'll load everything to Snowflake and sometimes to Redshift, so they'll have different loaders too. So now we have a sales-data project. We can go into it and open it in an editor, and you can see it created a nice little package here. And if we run git init and then git status, you'll notice that a lot of this is not included in Git, because it doesn't need to be. For example, the .meltano folder, which has dbt installed in its own virtual environment, is not included. The way that works is that when someone gets this project, they can run meltano install, and it installs all the dependencies needed for them to run it, so you don't have to pass those dependencies along. And then there's your meltano.yml file, which lists all the dependencies, which we currently don't have, but we will in a second. So we can run meltano discover, actually let me say meltano discover all, so that it discovers all of the extractors and loaders that are available. So we have tap-gitlab, which is an extractor.
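As a rough illustration of the dependency file described above, a project's meltano.yml might look something like this. This is a hypothetical sketch only, not the exact schema; the plugin names match the demo, but the fields are illustrative:

```yaml
# Hypothetical sketch of a meltano.yml -- not the exact schema.
# Plugins are listed here so "meltano install" can recreate the
# sandboxed environments that live (un-versioned) under .meltano/.
extractors:
  - name: tap-gitlab
loaders:
  - name: target-postgres
```

The point of the file is that the project stays small in Git: the virtual environments themselves are rebuilt from this manifest rather than committed.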
"Tap" is terminology from Singer, which is a specification for extracting and loading; it calls them taps and targets instead of extractors and loaders. So now we can run meltano add extractor tap-gitlab. And hopefully this works, because, oh, extract, we need extractor, I was just developing some stuff with Ben right before Joyce sent me a message, so hopefully I didn't break anything. What it's doing here, you can see, is in the .meltano directory: it's created an extractor directory and it's installing tap-gitlab, which has all of its crazy dependencies, in its own sandboxed environment, so that it will always be able to run tap-gitlab without any issues whatsoever. Now this makes the project a little bigger, but it means you can have some crazy tap that somebody else wrote mixed with another crazy tap that somebody else wrote, and they won't conflict, because it's always hard to mix two taps written by two different people. So now that worked. And if you look in our meltano.yml, we now have a dependency on an extractor called tap-gitlab, and it needs a GitLab API token. So what we'll do is go into our .env file, and this is where we would set our GitLab API token; Meltano inserted the variable for us automatically, because it knows we need it. And since it ships with a default database, we will also need a Postgres password, username, address, and all that sort of stuff. So you would insert your GitLab API token there. And we also need a loader; we're going to load into Postgres. So we'll run meltano discover all again to see what sort of loaders are available, and we can see that we have a couple: CSV, Snowflake, and Postgres. We're going to use the Postgres one, so we'll add that: meltano add loader target-postgres. And that will add that specific plugin.
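To make the Singer side of this concrete, here's a minimal sketch of what a tap actually produces. Singer taps write JSON messages, one per line, to stdout: a SCHEMA message describing the stream, RECORD messages carrying the rows, and STATE messages bookmarking progress. The stream and field names below are just illustrative:

```python
import json

def emit(message):
    """A Singer tap writes one JSON message per line to stdout."""
    return json.dumps(message)

# SCHEMA describes the stream; RECORD carries a row; STATE lets the
# next run resume where this one left off.
messages = [
    emit({"type": "SCHEMA", "stream": "runners",
          "schema": {"properties": {"id": {"type": "integer"},
                                    "active": {"type": "boolean"}}},
          "key_properties": ["id"]}),
    emit({"type": "RECORD", "stream": "runners",
          "record": {"id": 1, "active": True}}),
    emit({"type": "STATE", "value": {"runners": {"last_id": 1}}}),
]

for line in messages:
    print(line)
```

Because every tap speaks this same line-oriented protocol, any tap can be piped into any target, which is what makes them plug and play.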
And all of these taps and targets sit in their own individual Git repos on GitLab. So any time someone wants to create a tap, all they do is create a Git repo and update a discovery file, and that makes it installable through, and available to, Meltano. So that was added. And if we go back to our meltano.yml, you can see it updated the file with a loader called target-postgres, and it's saying that it's going to need certain environment variables to be available; in our .env file we've set all of those. And then what we can do is actually extract, load, and transform that data by running meltano elt; ELT stands for extract, load, and transform. What this does is run the entire extract and load based on the extractor, tap-gitlab, the loader, target-postgres, and the environment variables. Because I gave it my GitLab API token, it's going to download just the GitLab information that's specific to me. So in about five minutes, it would be a couple of minutes without my talking, we've been able to get the entire process going: extracting, loading, and transforming, where previously you would write all these crazy scripts to make this happen. I'm not sure why it's asking me for a password; I was working right on this just before, but luckily I have already extracted this data, and you can see it right here. I'm logged into Postgres right now, and if we look at all the GitLab runners, let's see, there we go. You can see that the GitLab tap extracted all the data from GitLab specific to me, which was a small amount of data; I told it to fetch a small amount. And you can see it brought in GitLab runners, which are the CI runners, the things that run all the CI jobs.
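The load half of that `meltano elt` run is the mirror image of the tap: a Singer target reads the tap's messages from stdin and writes the RECORDs into the warehouse. Here's a minimal sketch of that consuming side, with a Python list standing in for the Postgres table (names are illustrative, not target-postgres internals):

```python
import json

def load_records(lines):
    """Minimal sketch of a Singer target: read messages line by line
    and write RECORD rows to the warehouse (here, just a list)."""
    rows = []
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            rows.append(message["record"])
        # A real target would create tables from SCHEMA messages and
        # echo STATE messages back so the runner can checkpoint.
    return rows

stream = [
    '{"type": "SCHEMA", "stream": "runners", "schema": {}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "runners", "record": {"id": 1, "active": true}}',
    '{"type": "RECORD", "stream": "runners", "record": {"id": 2, "active": false}}',
]
print(load_records(stream))
```

Conceptually, `meltano elt` is wiring the tap's stdout to the target's stdin, with the .env variables supplying credentials to both ends.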
So we can select everything from the GitLab runners table, and you'll see that there are actual GitLab runners in there. Now, previously this would take weeks, because you'd have to write an entire script and it would be really complicated; but because we have taps and targets that are plug and play, based on a specification that already exists, we can put all this together, and our data team is already using this. Now the really cool thing is that once you've extracted all that data, you can start up a web server by typing meltano www, and that allows you to analyze that data. This is a huge step already, because now we're into a second tool that you would otherwise be paying a bunch of money for. I already have this running, so I'm going to show it to you. I've added two modeling files that are based on a specification a lot of data analysts know, LookML, but we're going to be changing that, because it's a proprietary specification and we want everything to be open source. It looks a little bit like JSON data. We want to model those GitLab CI runners, so we're telling it that it's going to look for a connection called runners_db and explore through a view called runners. Here's that view: it's going to look for the active, description, ip_address, is_shared, and name columns, and give them labels like "GitLab Runner IP Address". And then, to measure things, it's going to have a count. Because we're writing very simple declarations here, when you type count, it knows the complicated SQL that needs to be generated in order to do a count on any database: on MySQL, on Postgres, on Redshift, any one of those. You're just saying count, and it generates the SQL for that. I'll show you how that works, but first of all, our model is going to need a connection to runners_db.
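To give a feel for the modeling file described above, here is a hypothetical sketch in a LookML-like style. The exact syntax of these files differs; the connection, view, dimension, and measure names are the ones from the demo:

```
# Hypothetical sketch of a model/view definition -- not exact syntax.
model runners {
  connection: runners_db   # which warehouse connection to query
  explore: runners         # which view is explorable in the UI
}

view runners {
  dimension active     { type: yesno  }
  dimension ip_address { type: string  label: "GitLab Runner IP Address" }
  dimension is_shared  { type: yesno  }
  measure   count      { type: count }  # expanded to real SQL per database
}
```

Only what's declared here is exposed in the analysis UI, which is why leaving a column (say, a password field) out of the file keeps it out of every chart.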
So if we go to our connections, you can see that we've set up a connection called runners_db pointing at a specific database. You can connect this to many different databases, and it will go out, fetch the data, and analyze it across all of them; in this case we only have one DB, which is running locally for me. So now you can see that we have a runners model and a runners view. We're going to explore that data, and if we open this up, we get all the columns that were available in the database. They're only available because of these files that we created, these LookML files, which are now .ma files. In these model-analysis files, we've told it what we want to analyze: if we were to remove the active dimension from the file, it would be removed from the UI here. So in the case of, say, user data, you don't want to analyze passwords, because that's a bad idea, so you wouldn't include that in the analysis. It probably wouldn't be loaded in the extraction anyway, but if it were, there are multiple layers where you can shape the data. So now what we can do is say: of all the CI runners we have, we want to find the active ones. You see that as soon as we click active, it creates a bunch of SQL to properly analyze this data. We want to count them, and you can see that it now runs a COALESCE count based on the ID; it does all this crazy stuff to know the right SQL to run, because, you know, you and I may know basic SQL, but the real SQL that you need to run might be much more complicated. When you click run, because it's grouping by active, we only get two results back, and you can see that it's actually counted. This count is not something that's in the database; this is what's called a measure, an aggregate.
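The idea that a one-word measure expands into real dialect SQL can be sketched very simply. This is not Meltano's actual generator, just an illustration of the COALESCE(COUNT(...), 0) pattern visible in the generated query; the function and its behavior are assumptions:

```python
def measure_sql(measure_type, column, table):
    """Sketch: expand a simple measure declaration into SQL.
    A real implementation would vary the output per database dialect."""
    if measure_type == "count":
        # COALESCE turns a NULL count (no matching rows) into 0.
        return f'COALESCE(COUNT("{table}"."{column}"), 0)'
    raise ValueError(f"unknown measure type: {measure_type}")

print(measure_sql("count", "id", "runners"))
# COALESCE(COUNT("runners"."id"), 0)
```

The analyst only ever writes `count` in the model file; the tool owns the translation into whatever the target warehouse expects.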
So there are four inactive runners and 16 active runners, and if we click the chart, you can see that in chart form: 16 active, and a bunch of different chart types. That's not very helpful data as a chart, though, so let's do something a little more informative. What if we want to know how the runners are grouped by IP address, and count them that way? If we look back at the SQL, we can see that it's selecting all the runners' IP addresses, counting by ID, and grouping by IP address so they're unique. When we run that, we now get 15 results, and this is actually querying whatever data warehouse we told it to query; it's going out beyond Meltano to a real, live database. And you can see that we have four runners with blank IP addresses, so that might be something that needs to be fixed. We can sort this data by clicking, and when you click sort, it's actually updating the SQL that needs to run: you can see it now has an ORDER BY clause in there, ordering by count descending. We could also order by IP address, although that's not very helpful, but you can see that it updates the SQL as well. But we'll order by count, and as soon as we do that, we've got the count for all those IP addresses. Now this data is fairly simple, and if we reverse the sort, you can see that the data gets... and every time you do this, it's running the actual SQL against the database. It's not doing it on the front end; it's running it against the warehouse itself, the data warehouse, with live data. Then you can also see a line chart and sort the data that way. The data we're actually analyzing on the data team is much, much more complicated, joining multiple tables, sales data and all of that stuff.
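The group-by-and-count query described above is easy to reproduce against any SQL database. Here's a runnable sketch using an in-memory SQLite database as a stand-in for the Postgres warehouse, with made-up runner rows (including a blank IP, as in the demo):

```python
import sqlite3

# In-memory stand-in for the warehouse: a runners table with
# a couple of shared IPs and one blank IP address.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runners (id INTEGER, ip_address TEXT)")
db.executemany("INSERT INTO runners VALUES (?, ?)",
               [(1, "10.0.0.1"), (2, "10.0.0.1"), (3, "10.0.0.2"), (4, "")])

# The kind of query the UI generates when you group by ip_address,
# count by id, and sort by the count descending.
rows = db.execute("""
    SELECT ip_address, COALESCE(COUNT(id), 0) AS runner_count
    FROM runners
    GROUP BY ip_address
    ORDER BY runner_count DESC
""").fetchall()
print(rows)
```

Every click in the UI (sort, reverse, regroup) just regenerates and re-runs a query like this against the live warehouse, rather than reshuffling cached results on the front end.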
But this is just a very, very simple example of the data that you might analyze. So that's what Meltano does, and this is the analysis part of it. What we're working on next is orchestration, so that you can schedule your extracts, loads, and transforms to run. Right now we've got them scheduled through GitLab CI, but we're going to run them through what are called DAGs, directed acyclic graphs, which will let us create a bunch of jobs that run in parallel and, if they're going to fail, fail appropriately in relation to the other jobs, and retry appropriately in relation to the other jobs. We're using an open source tool made by Airbnb, called Airflow, for that, and that's the next step for this. So what I just demoed would currently be about four different tools, and right now it's one tool. And once we get to notebook and orchestrate, that's another two tools; that's like six tools that we're doing in one fell swoop. So that's Meltano as it is today.
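The DAG idea mentioned above can be sketched with the standard library: given job dependencies, a topological sort tells the orchestrator which jobs are ready to run in parallel at each step, and a failed job only blocks its downstream dependents. The job names here are hypothetical, and this is a scheduling sketch, not how Airflow is implemented:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: two extracts can run in parallel, the load
# waits for both, and the transform waits for the load. A failure in
# one extract would only block the jobs downstream of it.
dag = {
    "load-postgres": {"extract-gitlab", "extract-salesforce"},
    "transform-dbt": {"load-postgres"},
}

ts = TopologicalSorter(dag)
ts.prepare()
order = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # all of these can run concurrently
    order.append(ready)
    for job in ready:
        ts.done(job)

print(order)
```

An orchestrator like Airflow adds scheduling, retries, and state tracking on top of exactly this kind of dependency resolution.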