The topic is Percona monitoring and management. If I stand, is it still visible? Yes, so your view is from that corner. Okay, good. If you move around in this area. Yeah, I will probably sit for a while when I demo, but otherwise I'll stand. Can we start? Right. Welcome back everyone after lunch, I hope everybody had a good lunch. So now we have Punit from Percona. He has come all the way from Indore, the city of, I would say, Nawabs and poetry. Probably food. Food, yeah. So take it away Punit, all yours.

Thank you. I have been associated with Percona for about five years now. I work with the PMM team, and today we are going to look at how you can use Percona Monitoring and Management to monitor the databases you are running, and at some of the features it offers out of the box. I'll go over what PMM is, take a quick review of its architecture, jump into the monitoring side, then look at the Query Analytics feature, talk about what advisors are in PMM, and then do a demo of backup and restore. Finally, if time permits, we are going to look at how you can use Database as a Service in PMM.

So let me start. What is PMM? It's an open source database tool that primarily focuses on letting you monitor your databases. It gives you the ability to analyze the kind of workload you are running on your database, the capability to troubleshoot performance issues, and on top of that we also provide features for managing your databases. Currently we have support for backup and restore on MySQL and MongoDB, and we are also planning to GA very soon our Database as a Service feature, which will allow you to run Percona operators and deploy DB clusters on your Kubernetes infrastructure.

Before going into the monitoring demo, I wanted to quickly cover an architectural overview of PMM, of how the PMM architecture looks. It's not an in-depth analysis, but I have referenced the documentation where you can check more details on how PMM actually works. We have a PMM server and a client. The PMM server consists of components that Percona builds, for example pmm-managed, which is basically the brain behind PMM. We use VictoriaMetrics as our time series database; all the metrics we collect from the clients are stored in VictoriaMetrics, and it is completely compatible with Prometheus. We have Query Analytics, which under the hood uses ClickHouse to store query metrics. And then we have Grafana, which is the UI layer in PMM that lets you look at the metrics and all the dashboards we ship with PMM. On the client side we have pmm-agent, the component built by Percona that communicates with pmm-managed. If you have multiple databases, you install the PMM client on each database host, and it sets up the different exporters that we ship with the PMM client. Then there is vmagent, the component responsible for pushing metrics from the client node to the PMM server instance. PMM supports two modes for gathering metrics from client instances: a push mode and a pull mode. The difference is that in pull mode the server reaches out and scrapes metrics from the client instances.
In push mode, on the other hand, the clients push the metrics to the PMM server.

Let's talk about what kind of monitoring options we have in PMM and which databases we cover. As you can see, PMM supports most of the majorly used databases: we have support for MySQL, PostgreSQL and MongoDB. I would say monitoring is the bread and butter of PMM, because we ship it with a lot of custom dashboards built primarily by our database experts. These dashboards come from internal experience dealing with customer issues, trying to capture how you troubleshoot a customer's performance problem. I will show you shortly, on a live PMM instance, the number of dashboards we ship for the different databases. We also support monitoring cloud databases: if you are using Amazon RDS or Amazon Aurora instances, GCP MySQL or PostgreSQL instances, or Microsoft Azure MySQL or PostgreSQL instances, you can monitor them with PMM. One of the most important and, I would say, great features in PMM is external exporters: if you have a service that PMM doesn't support by default, but there is an exporter available in the community, you can start monitoring that service using the external exporter feature.

So let's quickly look at what the monitoring looks like. This is the home dashboard of PMM; I should have changed the theme to white, but yeah. What you're looking at shows how many databases it is monitoring at the moment. I set this instance up for testing and demo purposes a few hours ago. What you see here is two MySQL databases, a set of MongoDB instances and two PostgreSQL databases that PMM is monitoring. On the home dashboard you get an overview of all the nodes and their health, CPU usage and memory usage. When you add an instance to PMM you also get the instance details; for example, let me bring up the MySQL instance summary. I added a MySQL instance for monitoring, and as you can see it ships with all the dashboards we build internally, all custom dashboards. You can filter by the service you want to look at: there are two databases we are monitoring, and you can see the metrics are hitting the PMM server.

I'll quickly jump to the setup part. How do you set up monitoring in PMM? It's pretty straightforward. You need to have the PMM server running, and you install the PMM client on the database host. Once the client is installed on the database host, you use the pmm-admin config command to connect the client and the server. Once the connection is set up, all you have to do is create a monitoring user according to the docs we have, and then you can use the pmm-admin add commands to add specific databases for monitoring. There are a lot of help instructions available: if you run pmm-admin add --help you will see the entire list of supported command line options and all the databases we support in PMM.

Let's talk about Query Analytics. Query Analytics is, I would say, one of the most important features in PMM; it allows you to monitor the kind of workload you are running on your database:
what kind of queries are being executed, and which queries have performance issues. We support Query Analytics for all three databases you see here, and there are different query sources you can use to monitor your query performance. For MySQL you can use the slow log or performance schema. For PostgreSQL we have pg_stat_statements and pg_stat_monitor. pg_stat_monitor is an extension that Percona has built; it's a custom extension you install on your PostgreSQL database and it integrates seamlessly with PMM. It's an in-house, open source tool, and it gives you far better insights compared to pg_stat_statements.

Let's quickly look at Query Analytics as well. On the test instance you are seeing right now I added two MySQL instances, and one of them was added using the slow log. How do I know what kind of instances I have added? You can see it either on their individual dashboards, or you can go to the PMM Inventory and look at the services you have added; here you can clearly see I have two MySQL services running, one is the MySQL check service and the other is the MySQL client slow-log service. If I want to see the query analytics for a specific service, I go to Query Analytics from the menu in PMM. I am going to filter by time range, because I ran a specific sysbench workload a few hours ago. What you are seeing here is all the queries across all the databases and all the data coming from them. What I wanted to show is the MySQL service I added with the slow log, and I can get to it in multiple ways: I can use the service name filter if I already know the service, or I can filter by technology if I only want MySQL databases. These are the sysbench (sbtest) queries that were executed; all of these queries are the load I ran on my database. When I click on a query I can clearly see metrics related to it: the query execution time, how many times the query was executed. You get to see examples and the explain plan of a query, and you can gather a lot of similar metrics for the PostgreSQL database as well as MongoDB. So this is what Query Analytics looks like in PMM.

There is also an option to add a different metric. Right now Query Analytics is showing the load, query count and query time metrics, but if I want a different metric I can use the add column feature, and there you get a bunch of metrics you can use in Query Analytics. For example, say I want to look at the bytes sent data: I can see that bytes sent information is now available in Query Analytics, I can make it my main metric, and it will sort all the queries based on bytes sent in QAN.

So we have looked at the monitoring part, where dashboards show you metrics, and at the Query Analytics part, where you have query details and query performance information. Next is advisors, which is basically intelligence built into PMM: preconfigured checks coming from our database experts.
Whenever you add a database to monitoring in PMM, it runs those checks against the database and tries to suggest what you can improve: whether there is a change needed in the configuration, or in the way you have set up your database. These all come from our experts. Using this feature in PMM is pretty simple. I am using a very basic instance without any custom connections, so I will just quickly show you. As you can see on the home dashboard, there are zero failed advisors right now, because I have not actually executed any advisor checks. In the PMM settings you can change the interval of an advisor; ideally they run once every four hours, and there are standard, frequent and rare intervals. For the demo I changed it to the bare minimum value once I enabled advisors.

Here you can see there are some configuration advisors that we ship with PMM, and then there are some security advisors. What are configuration and security advisors? For example, if you have a database that is outdated, you will get an advisor, a check running on your database telling you that you probably need to upgrade the version. If your security configuration is not done right, maybe a root user without a password, or if there is a CVE for the version you are running, it is going to show you those advisors. Depending on your account, we have Percona Platform: you can connect your PMM instance to Percona Platform and get access to more advisors, custom advisors, and if you are a paid customer you get even more advisors running custom checks for your deployments.

So let's see if I can get some advisors on the database instance I just enabled. It isn't really necessary to run them manually, they trigger automatically, but for the sake of time I am going to run them manually. As you see, I just triggered the advisors and I can start seeing them on the home dashboard. When I click on it I can see there is a check for a PostgreSQL CVE, because I intentionally set up a version that had a CVE, and I also installed an older version of PostgreSQL because I wanted to see whether there would be a warning for it or not. If we go back we will probably have more advisors showing up; you can see here the different checks that are being executed.

Let's quickly jump to the next part and look at how backup and restore works in PMM. As I mentioned earlier, we have support for MongoDB backup and restore, and we also have support for MySQL, which is a technical preview right now. For setting up backup and restore, all you need is the PMM client installed on the host that is running your database, and you probably need Percona XtraBackup (PXB) or Percona Backup for MongoDB (PBM), depending on the kind of database you are running. Let me show this on the instance I have. Before I prepared this instance for the session, I added a MongoDB replica set to my PMM instance for monitoring; you can see the replica set running right now, with metrics coming in for it. What I will do is go to Backups and create a backup location. You can configure it using an S3 location, or you can use the local storage option. For the demo I will just use local storage and configure a test location.
There is a way for you to configure scheduled backups for your database. I am not going to go into the scheduling part, but I will quickly show an on-demand backup that I am going to take right away for my MongoDB instance; I'll choose the replica set. We support both physical and logical backups. I can choose the location I just added, and in the advanced settings you can define whether to retry on failure and things like that. As you can see, it is taking a backup, and I can get access to the logs; if there is a failure and you want to see why the backup failed, you can find that here. Since it succeeded, I can even restore from any specific backup. I currently do not have any restores on this instance, but I will quickly show you how restores look. So that is how backup and restore work in PMM: you can access the logs and troubleshoot if there is any failure in the backups.

Finally, the last part is Database as a Service. You can use DBaaS in PMM by providing it a Kubernetes config file; it will install Percona operators on your Kubernetes infrastructure and allow you to deploy your DB clusters on it. I have a sample minikube cluster, and I will quickly show you how that looks. To run DBaaS in PMM you go to the DBaaS option, and there you see an option to register a new Kubernetes cluster. I have a minikube kubeconfig file; probably my minikube isn't right, that is why I do not like live demos. I had another minikube config file I was going to use. As soon as you register your Kubernetes cluster, it will set up the operators on it, and then you will have an option to create DB clusters. Let it register and install the operators while I quickly complete the presentation.

There is some reference material you can use, and you can reach out to us if you have any questions while trying out PMM. We have a Discord server with a PMM channel; you can visit us and ask any questions you have while you are trying out PMM. I am ready to take any questions if there are any.

On pg_stat_statements and pg_stat_monitor that I mentioned: basically, pg_stat_monitor gives you CPU and memory consumption, histograms, all of those things that you don't get in pg_stat_statements. Along with that, it's more efficient compared to enabling pg_stat_statements on your PostgreSQL instance. That's basically the difference I can give. I would like to know where these backups are getting stored: is it on the server, or can we integrate S3 or external storage? You can integrate S3 storage and you also have a local storage option; the one I showed was stored locally on the client node. I don't know why I am not able to register the Kubernetes cluster, but I will quickly run the demo recording I have for DBaaS so you folks get an idea of how this looks; I always keep a backup video. Any other questions meanwhile? Yeah, it's completely open source. You can see the code we have for pmm-managed or pmm-agent, and you can contribute back; it's completely open source. So, as you can see, this is how DBaaS looks.
Once it registers the Kubernetes cluster, it's going to show you an option to create DB clusters after provisioning, and from there you can set up different versions of the databases: for MySQL which specific version you want to deploy, for MongoDB which version you want to deploy. That's all the time we have today, but I'm sure you'll be around for the rest of the day; please forward all your questions to him. Thank you very much. Thank you.

Welcome back everyone. Today we have Ganesh from Grafana Labs. He's a senior software engineer at Grafana Labs based in India, and he is going to talk about the lifecycle of a sample in the Prometheus TSDB. Over to you.

Thanks for the introduction. In addition to that, I'm also a Prometheus team member and I maintain the Prometheus TSDB upstream; that's why I'm speaking about it here. Before we go into the TSDB, what is Prometheus? Prometheus is a metrics-based monitoring and alerting stack. You can instrument your application or run exporters which expose the metrics you want to monitor, and Prometheus collects those metrics for you, stores them, and lets you alert based on conditions. So it's a whole monitoring stack.

Talking about the time series itself: it has an identifier, which in Prometheus is a set of label names and label values, pairs of strings, you could say a slice of string pairs. The http_requests_total here is an example, where the metric is tracking the total number of HTTP requests done for this particular job, and the metric name itself is just the value of a special label name. Along with the identifier, a time series is just a stream of samples, where a sample is a tuple of a timestamp and a value; we use Unix timestamps, and the value is a float. That's the basis of the time series we are going to store in the TSDB.

Prometheus is huge and does a lot of things, but we are only going to focus on the time series database inside Prometheus, in the middle: how it stores the data in its raw form and makes it available for queries. We are not going to worry about anything else Prometheus does. The sample we are going to talk about has a timestamp that is a 64-bit integer; in Prometheus we store milliseconds as integers, because we found that's enough for most use cases, and the value is a float64. Prometheus now supports more than float64, with a custom data structure for storing high-resolution histograms, but for the sake of this talk we are only going to consider a sample with a 64-bit integer timestamp and a 64-bit float value.

Before diving into how it is stored inside the TSDB, let's look at the overview. There is a component called the head block: it keeps the index of the recent data in memory, the recent data itself in memory, and some things memory-mapped; we will see how that works in a moment. This is the component that first receives the data in the TSDB. After the head has stored data for some time, we create persistent, immutable blocks, which you can keep for the retention period and run queries on. We are going to go inside these components and see how they are created.
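Just to make that data model concrete, here is a minimal Python sketch of it, not Prometheus' actual Go structures; the names are purely illustrative. A series is identified by its label pairs and carries a stream of (timestamp in milliseconds, float value) samples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Series:
    labels: Dict[str, str]                      # identifier, e.g. {"__name__": "http_requests_total", "job": "nginx"}
    samples: List[Tuple[int, float]] = field(default_factory=list)   # (unix timestamp in ms, float64 value)

s = Series(labels={"__name__": "http_requests_total", "job": "nginx"})
s.samples.append((1_700_000_000_000, 1027.0))   # one sample appended to the stream
```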
So let's first look at writing a sample into the head block. Here is the head block; there is nothing inside right now. We get one sample, and we create an index entry for this particular set of labels, which we call a time series. A chunk is a compressed set of samples; in Prometheus we use something called Gorilla compression, from Facebook's Gorilla paper, and every time you get a sample you compress it in flight and store it there, instead of storing the raw samples. But before we put the sample into the chunk, we first create the series entry in the index and then write to something called the write-ahead log; we will see why it's required in a moment. Once we have logged to the write-ahead log that this write has come to the TSDB, we then write it into the chunk in a compressed fashion. Inside the write-ahead log, if it's the first sample for a series, we store the labels of the series, the series gets an ID, say ID 1, and we write another record which says, for example, T1, V1 for ID 1. The write-ahead log just logs every write that comes to the TSDB. We need the write-ahead log for durability: imagine you did a write request to the TSDB and the system crashed right after you told the upper layer the write was successful; all the data is in memory, so how would you get it back? You use the write-ahead log to replay the events that happened, in order, and recreate the in-memory data structures you had before the crash, or even during a normal restart. When we get another sample, we write it to the write-ahead log and then to the chunk, and in the write-ahead log you'll notice we don't write a series record again, because we have already written one and the log is replayed from start to end; we just say that for ID 1 we got another sample. That's the first step of adding samples in Prometheus.

Now let's say you added more samples and the chunk got full; you have to cut the chunk somewhere, you can't just keep growing it. Once a chunk is full, in our case at 120 samples (we are also trying to cap it at a certain size), we memory-map it into what we call head chunks. Head chunks are just these compressed chunks written to files, and in memory we keep only the reference to where each chunk sits in the file, so you are not storing the compressed chunk in memory, you are just holding an 8-byte reference. Whenever you want to fetch that series, we take the reference, fetch the chunk from disk and query it; this saves memory during the majority of the time when you don't need to query that data. The same process repeats: you get more data and more chunks get filled.
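As a rough illustration of that write path, here is a toy Python sketch, not the real implementation: every write is logged to the WAL first (with a series record only for the first sample of a series), then appended to the in-memory head chunk, which is cut and "memory-mapped" once it reaches 120 samples. All structures and names here are made up for the example.

```python
WAL = []                 # stands in for the write-ahead log segments on disk
head_chunks = {}         # series_id -> samples in the currently open head chunk
mmapped = {}             # series_id -> closed chunks (in reality only 8-byte references are kept in memory)
series_ids = {}          # label set -> numeric series id
SAMPLES_PER_CHUNK = 120

def append(labels, ts, value):
    key = tuple(sorted(labels.items()))
    if key not in series_ids:                           # first sample for this series:
        series_ids[key] = len(series_ids) + 1           # create the series (index entry)
        WAL.append(("series", series_ids[key], labels)) # and log a series record
    sid = series_ids[key]
    WAL.append(("sample", sid, ts, value))              # log the sample before storing it
    chunk = head_chunks.setdefault(sid, [])
    chunk.append((ts, value))                           # the real TSDB compresses in flight (Gorilla)
    if len(chunk) == SAMPLES_PER_CHUNK:                 # chunk full: close it and start a new one
        mmapped.setdefault(sid, []).append(chunk)
        head_chunks[sid] = []
```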
Now you have a lot of data in Prometheus; how do you take care of it? We create persistent blocks out of the data present in the head block. We need to do that to make queries and a lot of other things efficient. This process is called head compaction: we take the data from the memory-mapped chunks on disk, plus some data still in memory, for whatever time range Prometheus decides to compact, and create a block. The same process repeats and we create another block; if the earliest block is number n, the new block is n+1, a linear sequence of blocks coming out of the TSDB. But why do we need compaction? Time series are not always the same; after some time you may be ingesting data into a new set of time series, so you don't want to keep holding the old index entries in memory. Once you compact and flush the data onto disk, those index entries can be dropped, the chunks you were holding in memory also get cleaned up, and restarts become faster because you now have to replay fewer events to rebuild the in-memory state.

What about the bigger block? If you notice in the diagram, the n-2 block is a larger block, created from the smaller blocks. Imagine you have four blocks a, b, c and d: based on some logic we choose, say, three blocks at a time and merge them into a bigger block; we'll see soon why that's required, since every block has its own index, and we are going to look at the index soon. Block compaction is for efficient queries, and it also reduces disk space usage because the index is not repeated; we'll see how.

At some point you want to get rid of old data; you don't want to keep it in your system if you no longer need the old time series data. You can configure the TSDB to have retention based on disk space usage and also on how much time range your data covers. Here is an example of time-based retention: if we consider this as a number line, every bit of data beyond the red line has to be deleted, but blocks are immutable, so we can't trim them; we have to delete a whole block together. As we add more data and create more blocks, and a block falls entirely outside the retention range, it is simply deleted; there is no waiting, as soon as new blocks come in and old blocks move outside the retention range they are removed from the system.

So this is how the TSDB works at a high level; let's dive a little deeper into a block. A persistent block, created out of a head block, contains an index which maps all the data present in the block; a meta file with the important information about the block, which lets you decide how and when to query it; tombstones for deletions (we'll come to why we need tombstones and why we can't delete data from a block immediately); and the chunks themselves, all the compressed samples stored together in a bunch of files. Let's look at them one by one, starting with the meta file. Every block has a unique identifier (a ULID), and a block stores data from a particular min time to a particular max time, so by looking at the meta you know what time range the block covers. There is some health information, like how many series and how many chunks, and if the block was created from other blocks we also store the parent blocks for debugging purposes.

Now let's look at the chunk files themselves. This is pretty simple: there is a list of files, every file is capped at 512 MB, and in every file we just store the compressed chunks as-is, because all the references are mapped inside the index. The reference is stored in a very simple manner: we have an 8-byte reference for every chunk, the first 4 bytes store the file number in which the chunk exists, and the last 4 bytes store the byte offset within that file where the chunk starts. So if I give you an 8-byte reference, the first 4 bytes tell you which file you want, and with the last 4 bytes you just seek to that byte offset in the file, and there you have the chunk; the chunk has helpful meta information to know how much to read. That is a simple way of storing chunks on disk.
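A tiny sketch of that 8-byte chunk reference, assuming for illustration that the file number sits in the upper 4 bytes and the byte offset in the lower 4 bytes, which matches the description above; the exact on-disk encoding is an implementation detail I am not asserting here.

```python
def pack_chunk_ref(file_no: int, byte_offset: int) -> int:
    # upper 4 bytes: chunk file number, lower 4 bytes: byte offset inside that file
    return (file_no << 32) | byte_offset

def unpack_chunk_ref(ref: int) -> tuple[int, int]:
    return ref >> 32, ref & 0xFFFF_FFFF

ref = pack_chunk_ref(3, 181_422)             # "the chunk is in file 3, starting at byte 181422"
assert unpack_chunk_ref(ref) == (3, 181_422)
# To read it you would open chunk file 3, seek to offset 181422 and read the
# chunk; its small header says how many bytes follow.
```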
The most interesting part is the index. Prometheus stores the index in what is called an inverted index format, and we will see both how it is stored and how it is queried. At a high level the index has four components. First is the symbol table. The symbol table stores all the symbols, basically the strings seen in the time series: if you have http_requests_total, that's a symbol; if you have job=nginx, then job is a symbol and nginx is a symbol. All the symbols are stored together in a single, sorted table, because repeating them everywhere in the index would take a lot of space; you just number them, the first symbol is 1, the second symbol is 2, and use those numbers everywhere in the index. Whenever you need the actual string, you look it up in this table; that's the purpose of the symbol table.

Now the series themselves. A series entry stores the labels that belong to that particular series, but instead of storing the strings it uses the symbol table indexes, and then a slice of chunk references. We saw earlier that a chunk reference is just an 8-byte number, but with every chunk we also store the min time of the chunk, the max time of the chunk, and the encoding of the chunk, so that we know how to decode it. The series are stored in sorted order based on their labels: first by the first label name and value, then the second label name and value, and so on; so if you take s1 and s3, you know that after sorting, s3 is going to come after s2 and s1. I skipped one part here: every series has a reference, the s1, s2, s3 I keep mentioning, and those are again byte offsets into the index file. In Prometheus we align series entries to 16 bytes, so the reference is the actual byte offset divided by 16; given a reference, you multiply it by 16 to get the byte offset and go directly to where that series entry exists.

Now we come to the interesting part. In inverted index terms, the postings are nothing but the series: postings are series IDs, and we store posting lists, lists of those series IDs (why we store a list of posting lists will become clear in a second). Each posting list also has a reference, which is again a byte offset in the file. Then there is the posting offset table; this is the important part. You saw that every series has a set of label name and value pairs. In the posting offset table we store, for each label name and value pair, a reference to the posting list of the series that contain it: for example, the pair pr1=bar1 is present in the set of series represented by posting list 1, and pr1=bar2 is present in another set of series. This is how we store the inverted index: for every label name and label value pair, we store the set of series that correspond to it, as a pointer to a posting list, and from those references we can look up the actual set of series for that pair. So that is the structure of the index; now let's see how we use it to answer a query.
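Before the walkthrough, here is a toy Python version of that inverted index, with made-up label pairs and series IDs; the real on-disk layout is of course more involved, but the lookup-and-intersect idea is the same.

```python
postings = {                        # (label name, label value) -> sorted series IDs (a "posting list")
    ("job", "nginx"):  [1, 4, 6, 22],
    ("path", "/api"):  [4, 6, 9, 22, 41],
    ("method", "GET"): [2, 6, 22, 35],
}

def select(*matchers):
    """Take one equality matcher at a time and intersect their posting lists."""
    result = None
    for name, value in matchers:
        ids = set(postings.get((name, value), []))
        result = ids if result is None else result & ids
    return sorted(result or [])

print(select(("job", "nginx"), ("method", "GET")))   # -> [6, 22]
```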
Let's say I now want to run a query, in a primitive fashion, that should fetch the series that match two label pairs, say pr1=bar2 and pr2=bar3. We take one matcher at a time: we look up pr1=bar2 against the posting offset table, and then the second pair. Now we know which posting lists they match, and we have the references to where those posting lists exist; we take the references, look at the postings, and we get the sets of series references that actually match these label values. Now that we have two sets of series references, we just intersect them, and finally we know that the series identified by the references s6 and s22 match the query. This is a simple query you can do on the TSDB. Now that you have s6 and s22, you take those references again, look at the series table, and the series entries tell you which chunk references you have; then you can just fetch the chunks and run the query on them, and depending on the time range you queried, say from T1 to T2, you trim the chunks when giving the data back to the API caller. Going back to the diagram: in short, when querying we started at the posting offset table, with that information went to the postings, from there we went to the series, and we finally got the data.

This is about querying a single block; the index is confined to a single block. When you have to query multiple blocks, you do individual queries against each block; even the head block offers the same interface of giving the label names and values and getting the series and samples back. There is an implementation called the querier which queries each individual block and then merges the data together. This is where the bigger blocks help: if you have fewer, bigger blocks, you look up fewer blocks to get the same data, and because series sometimes stay around for a long time, the index entries get deduplicated; that's the benefit of having a bigger block. And it's not just the equality matcher that can be used: you can also use not-equals, match a regex, or ask for everything that does not match a regex.

Finally we come to the tombstones. Tombstones are there to record the deletions you make on a block. Because blocks are immutable, if you had to actually delete data and series you would have to recalculate the entire index, since all the offsets are based on byte positions, and that is very inefficient. So when you get a deletion request, you see which series it affects and what time range the deletion covers, and you just record that in a file called tombstones, which simply says that for this series reference this time range is deleted, and so on. This file is usually small, so we don't really optimize things here. Whenever you are querying and looking at the chunks, you also cross-check against the tombstones and only return the data that does not overlap with those deleted ranges.
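A minimal sketch of that read-time tombstone check, with invented structures: deletions are recorded per series reference as time ranges, and samples falling inside those ranges are filtered out while querying.

```python
tombstones = {                 # series reference -> list of deleted (mint, maxt) intervals
    22: [(1_000, 2_000)],
}

def read_samples(series_ref, samples):
    deleted = tombstones.get(series_ref, [])
    return [(t, v) for t, v in samples
            if not any(mint <= t <= maxt for mint, maxt in deleted)]

print(read_samples(22, [(900, 1.0), (1_500, 2.0), (2_500, 3.0)]))   # the sample at t=1500 is dropped
```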
We have about six minutes left, so I'll quickly cover a couple more things. We talked about the head chunks we maintain alongside the in-memory part of the database; we have to see how we replay them on startup, and there is another artifact, a snapshot taken on shutdown, which helps with this replay. When we are replaying the data, you first replay all the head chunks, basically the compressed chunks you already have on disk, and then the write-ahead log. When replaying the write-ahead log, decoding it takes roughly half of the time and actually ingesting the data back into the TSDB takes the other half. With the help of the memory-mapped head chunks we can discard samples that already exist in the compressed chunks, which saves quite a bit of time, but replaying the write-ahead log is still a little slow. That is where snapshots on shutdown come in: when you are shutting down gracefully, you take a snapshot of everything you would otherwise have to replay on startup, a snapshot of all the series that exist (instead of rebuilding them from write-ahead log records) and of the in-memory chunks that have not been flushed to disk. In that case you do not have to go to the write-ahead log at all; you simply skip it and just replay the head chunks and the snapshot, which speeds up the replay. There is also another component called the write-behind log, which is used for out-of-order ingestion of samples: in the process I showed, every sample has to have a timestamp greater than the previous sample of that time series for the compression to work well, so for out-of-order ingestion we have another artifact, but because of the time crunch we are not discussing that either. This was a very brief introduction; in 25 minutes you can only explain so much, but I have a seven-part blog post that explains everything in detail. If you want a link to the slides you can scan the first QR code, and if you want the link to the blog posts you can scan the second one. Thank you.

Do we have questions for him? Wow, that was very well received. I was quite intrigued; I had never actually seen the structure behind write-ahead logs, which is such a common idea. I would like to add one more thing: this TSDB block format is not used only by Prometheus. There are projects like Thanos, Cortex and Mimir, which are distributed time series databases built as an extension to Prometheus for long-term storage, and they use this same block format to store their data; they literally use the TSDB code, it's open source Go code. They create the same kind of blocks, and they have a kind of superpower in the sense that the blocks are distributed: in Prometheus it's all on a single machine, while in Mimir, Cortex and Thanos it's distributed, so when querying multiple blocks you can run each query on a separate machine and speed up the query. So this TSDB block format is used by multiple popular distributed databases as well. Any questions, going once, going twice, gone. Well, thank you very much Ganesh, it was a very nice topic, and thank you for sharing everything about it.

For the last leg, come on, let's push on; we have a very good topic ahead of us: it's about a distributed PostgreSQL database. We have Saranya Sriram from the Microsoft Azure Cosmos DB team, and she is actually from Singapore, so I don't have to welcome her, but take it away, Saranya.

I normally work on a NoSQL database, but today I am here to talk about one of the companies we acquired a couple of years ago, a startup called Citus Data, if you have heard of it. We continue to keep it open source, and most of our team is supporting the Citus core engine. It is platform agnostic, meaning you can run it on bare metal or in multiple clouds. Just to clarify the synergy of why Microsoft, or my team, acquired Citus: it was also to have the ability to provide distributed Postgres as a hyperscale, first-party service in Microsoft. But for today's session we will be talking 100% about the technology piece, which is essentially how interesting it is to distribute PostgreSQL using the Citus engine.
With that, I wanted to start off with a quick overview. How many of you have heard of Citus here? I have spoken with and worked with the co-founders of the Citus engine; it became part of the Microsoft family, but they are a very passionate group of founders who are committed to keeping this open source, and it will continue to be so. There are a couple of things worth calling out: one, what problems Citus was trying to solve, and two, why choose Postgres as the base for a distributed SQL engine aimed at extremely real-time analytics, fast query pushdowns, and at-scale scenarios.

Firstly, I wanted to call out that Citus is an extension over PostgreSQL. You will see that any time there is a new PostgreSQL release, Citus supports the latest version essentially the next day; that is a promise we make on the release pages. It is fully open source and gives you multiple self-managed environment setups for PostgreSQL. Also, if you are curious, the logo the Citus team came up with is meant to say: we start with Postgres, the elephant out there, but we add some magic to the elephant, like a unicorn; it is about bringing sharding across various kinds of tables.

I also wanted to call out where we are. All of these links will be provided, but let me quickly take you to two of them, your landing pages for this conversation: citusdata.com, and the page where the releases are published, where you will see the latest one is 11.2; every few months there is a new release in a steady stream. Similarly, the Citus Data getting-started page covers the main value proposition: distribution and scale, parallelized performance using Postgres as the base storage engine, open source and also available as a fully managed database service, deployable individually with simplified architectures. I just wanted to call that out before we proceed.
Setting some context: when we look at a modern database platform, I like to think of it as expansion along three pivots. One is the data structure itself. We started off decades ago with structured, relational databases; over the last 10 to 20 years NoSQL databases came up in a big way, providing geo-distribution, scale and sharding, but query performance is still best in SQL data stores. So how can I get more ACID compliance on a NoSQL-style base? That transition gave us what we call NewSQL data stores. Meanwhile the SQL world has also improved: relational databases now support types like JSONB and various other data structures, and they are enabling query pushdowns and sharding, which is what we call DSQL, or distributed SQL. So there is a merging of DSQL and NewSQL, and that is where the Citus engine provides the strong query language of a relational database, Postgres, while also being able to provide that kind of sharding. That is the variety of data. The second pivot is the volume of data: data is very cheap to store and it is the compute over the data that is more expensive, but the data itself, from gigabytes to terabytes to in fact petabytes, is something Citus customers run today, so the volume is really expanding. The third is the velocity at which workloads operate: from batch operations to transactional workloads to real-time analytical workloads, all of these need data to be accessed in specific ways.

By the way, I am speaking on behalf of the Citus founding team, the people who have been contributing to the source code; I personally joined the bandwagon late, so I would like to think of myself as more of a messenger. The key question is: how can I solve the problems of scaling out data while performing various kinds of queries on it, transactional queries, real-time analytical queries, batch queries? Systems have become what we call HTAP systems, hybrid transactional and analytical processing systems; how can today's workloads bet on that, given the structures in which you are storing the data?

Now, the core PostgreSQL benefits Citus is built on. Extensions are probably the single most important reason why PostgreSQL has been adopted so widely by the community: instead of forking and setting up a new data store, you can build on Postgres and add extensions to it. Data structures such as JSONB have been pivotal, both for querying and for distribution, along with geospatial data, rich indexing, and query pushdown; the extension ecosystem, including search indexes, has been the backbone of PostgreSQL. On top of that, when we look at scale-out, global reach, security components and HA, high availability and replicas are built in, and you can run queries that are more intelligently integrated. Those are the foundations.

When we look at Citus as an extension on top of these PostgreSQL benefits, we are looking at four main pieces: horizontal scale-out; simplified infrastructure, where you decide the number of nodes you want and the system sets it up; performance; and literally day-one catch-up with the latest PostgreSQL version support as well.
So what are the key differentiators when it comes to Citus? Today on Citus you can start with a single node, for example, and scale out in place to multiple nodes without downtime; that is an interesting thing. When we say a single node, there are really just two simple constructs in Citus. One is called the coordinator node; today at least, everything is piped through a coordinator, so it is a single point of failure if you have a single coordinator node. The coordinator holds the metadata store, and from it you have multiple worker nodes, across which you decide how your data needs to be sharded; the worker nodes actually contain the data. If you have a single node, then the coordinator, the worker and all the shards essentially sit on that one node, and from there you can scale out to multiple nodes with what we call shard splits, non-blocking shard splits, meaning incoming queries are still allowed and no active workload is blocked while the backend shards are split out across worker nodes. That is online scaling.

The other thing is the online rebalancer. These are issues we see a lot, for example in the NoSQL world that I am more familiar with: the laws of physics dictate how your data has to be stored, and partitioning in a geo-distributed space is a given, so your data is sharded. You shard based on a particular shard key, and the shard key could be a single property or value, a compound key, a nested key, or a hierarchical partition key, where you have one shard key and then further sub-partitioning. All of those options you probably know, or they exist in other services out there today. What is interesting is: in an active production workload, how do you avoid hot partitions, shards that are hot while others are very cold? You generally do not have, in Citus or in other databases, an explicit hot and cold storage tier where you can say run this as a hot query against in-memory storage and run that as a slow query against cheaper storage; that query pattern is not very common. The other option is, actively and in real time, assuming I have all the data I need, can I rebalance, remove data from one shard, which includes data movement, and move it to another shard? That is the shard rebalancer: it redistributes shards so that the old and new worker shards are balanced. And the best part is that it gives you granular control: as a DBA or a developer you can actually say, I want to move this particular shard to another node. One example of that is tenant isolation. There are commands where you can say, isolate a tenant to a new shard, and you provide the table name and the tenant ID. In this case a tenant in shard 4 has grown, and we want to separate it out onto an entirely new worker node and have that single tenant on it.
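For the sake of illustration, here is a hedged sketch of that tenant-isolation command, assuming psycopg2, a reachable Citus coordinator, and an existing distributed table named orders sharded by a tenant id column; the connection details and names are made up, and the exact signature of isolate_tenant_to_new_shard can vary between Citus versions, so check the docs for the release you run.

```python
import psycopg2

# Hypothetical connection string; point it at your Citus coordinator.
conn = psycopg2.connect("host=coordinator dbname=app user=app password=secret")
with conn, conn.cursor() as cur:
    # Move every row of tenant 4001 into a shard of its own; that shard can then
    # be rebalanced onto a dedicated worker node.
    cur.execute("SELECT isolate_tenant_to_new_shard('orders', %s);", (4001,))
```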
The difference is, and coming from my NoSQL hat, there are systems where this happens automatically. While doing it automatically could be an extremely cool engineering problem for us, say at Microsoft, to solve in a multi-tenanted system, we actually see a lot of customers say: we know our workload best, so we want to be able to say which large tenant needs to be isolated. Systems like Citus actually provide that granular control, and that is something that is very valuable. The other key differentiator, as I said, is that whatever open source PostgreSQL version is available will be available on Citus right away.

Next, let's talk about the high-level architecture of how the Citus extension works. Essentially a PostgreSQL client hits the coordinator node. The coordinator does not contain the actual data; it contains mostly metadata, and there are also things called local tables that you can choose to keep only on the coordinator. So it holds a fairly small amount of data, mostly the routing information that says which worker nodes have which shards, so it can route requests and aggregate the results to create parallelization. Then you have multiple worker nodes, again depending on the user's choice: I can start with two, I can run a four-core machine and scale it to six or eight cores in place, or I can fork out and say I want X number of worker nodes. The coordinator node, as I said, contains the metadata used for routing; in this sample you can see there are four worker nodes, and each worker node can hold some number of shards.

Now the interesting question is: how is the shard decided? In PostgreSQL, say a database has some 50 tables; all of those 50 tables need not sit in all the shards. There is a way we define it, and I will just go through a small example using a sales transaction. First a local table is created, with a sales transaction ID, the customer ID, the date and the amount, just as an example. Then we call create_distributed_table, giving it customer_id as the shard key and naming the sales transaction table: create shards on this customer ID. So all the rows for customer 111 will be in one shard, customer 112 in another, and so on; all the sales transactions for one particular customer will end up in a single shard.
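A minimal version of that sales-transactions example, again assuming psycopg2 and a reachable coordinator; the column names are illustrative, but create_distributed_table is the standard Citus call for turning a local table into a sharded one.

```python
import psycopg2

conn = psycopg2.connect("host=coordinator dbname=app user=app password=secret")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE sales_transactions (
            transaction_id bigserial,
            customer_id    bigint NOT NULL,
            sale_date      date   NOT NULL,
            amount         numeric(12, 2),
            PRIMARY KEY (customer_id, transaction_id)
        );
    """)
    # Shard the table on customer_id: all rows for one customer land in one shard.
    cur.execute("SELECT create_distributed_table('sales_transactions', 'customer_id');")
```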
Now, what is interesting is that there are three kinds of tables; let me talk about this so we understand the architecture. The table we just saw, the sales transactions table, is what we call a distributed table: it exists on every worker or a subset of the workers, it is sharded, and for a distributed table you need a shard key on which it is distributed. Typically, if you make a query based on the shard key, the query goes only to those worker nodes which hold that shard's data. If there are multiple tables and all of them have the shard key in their primary key, your queries are the most efficient, because they only go to those shards on a single worker node, or maybe a couple of worker nodes, to get the data; and all the tables sharded on the same key are co-located with each other for the respective key ranges. That is your distributed table. A reference table is a table where you feel you do not have an exact partition key or primary key on which you can map it to two or more other tables, so you want the table to be present on every single worker node; in situations where I know I need a full copy of the whole table stored everywhere, I store it as a reference table. And a local table is typically one stored on the coordinator node for specific kinds of joins: say you have a product catalog and you know these five items are always going to be there, or you want to merge it for a particular campaign, and once the campaign is over you can drop that local table. So that is the high level.

The other thing is how queries are routed, which is what I was trying to say at a high level. When you write any query on the Citus engine, ideally a query based on the shard key, the query goes to the coordinator node and is then routed, in this case, to the particular shard that has the data. If the query has joins or touches multiple tables, there are a few scenarios. One, the shard key is the same in those tables; then, because of how the system is sharded, the respective co-located key ranges sit together, which means fewer queries have to be made on the worker nodes. Two, you have the option to use a reference table or a local table. Three, otherwise the query has to fan out to the specific worker nodes that hold the data, and that is more of a worst-case scenario, the situation where performance is probably not ideal.
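To round out the three table types and the routing point, here is a small follow-up sketch under the same assumptions as before (psycopg2 and the sales_transactions table above); the product catalog table is made up. A reference table is copied in full to every worker, and a query filtered on the shard key can be routed to the single worker that owns that customer's shard.

```python
import psycopg2

conn = psycopg2.connect("host=coordinator dbname=app user=app password=secret")
with conn, conn.cursor() as cur:
    # Reference table: a full copy lives on every worker node, so it can be
    # joined locally with any distributed table.
    cur.execute("CREATE TABLE product_catalog (product_id bigint PRIMARY KEY, name text);")
    cur.execute("SELECT create_reference_table('product_catalog');")

    # Filtering on the shard key lets the coordinator route the whole query to
    # the one worker holding customer 4001's shard.
    cur.execute("""
        SELECT sale_date, amount
        FROM sales_transactions
        WHERE customer_id = %s
        ORDER BY sale_date DESC
        LIMIT 10;
    """, (4001,))
    print(cur.fetchall())
```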
There are a couple of other use cases I wanted to cover in this session; four main ones. Real-time analytics: real-time analytics means you want fast queries that include smaller aggregations, real-time data and strong query pushdowns, so it helps to have the Postgres engine for that, and we see Citus doing well there. Enabling HTAP, where the same engine can be used for both OLTP and OLAP workloads. IoT and telemetry systems, because of the real-time nature, and situations with a lot of concurrent needs, which tend to be better off on Citus because it parallelizes workloads well. And we see a huge customer base and adoption in multi-tenanted SaaS workloads. When we look at real-time or time-series operational analytics and reporting, the diagram here shows a pattern I have seen with a lot of customers, including a lot of Microsoft-based workloads: you will see Databricks hosted on Azure, it could be any kind of web app service, or a push notification service such as APNS, or in our case MPNS, the Microsoft push notification service, and so on.

Essentially there are two or three interesting things here. The Citus engine enables what we call spreading out, sharding out, and hence we use the word hyperscale; in the true sense, when I say hyperscale you probably want a cloud vendor to maintain that hyperscale for you, but there are people who take open source Citus and manage their own hyperscale clusters, and that is very much possible. Freshness of data, availability, durability, concurrency, all of those are obviously the most important, and choosing the shard key is probably the most important decision, just as it is in any NoSQL system. Then there is scaling multi-tenanted SaaS applications, as we spoke about, with some interesting pitfalls, and purely IoT workloads and highly transactional workloads each have their own considerations; those are the big things we wanted to talk about. Just to close off today's session in the interest of time: some of our large open source Citus customers have petabytes of data, and in situations where there are suddenly huge numbers of concurrent users, 60,000 to 100,000, Citus scales. There is a CitusCon coming up this week, actually April 18th and 19th; I wanted to close by saying it is purely based on our open source Citus engine and its contributors, and there are pretty interesting sessions available there, things around the efficiency of shard rebalancing, other extensions on PostgreSQL, high availability being baked in, and so on. Thank you.

Do we have questions for Saranya? So does the data model itself need to be changed for an existing Postgres database to be adopted into hyperscale, and do you see any intelligent solutions being built at Citus which help bridge this gap? Actually, it's a very valid question. A schema migration is not something that is needed 100 percent. When we say moving from relational to NoSQL, there is an entire data modeling exercise that's needed, whereas for existing PostgreSQL to Citus there are two kinds of activities: one is based on what your main query patterns are and whether you have the right shard key for those patterns, and two is how your data is expanding and what your elasticity needs are. Based on those two there may be some tweaks, but essentially if you have, for example, 20 tables on Postgres, you have the same tables; all extensions are equally supported, and similar foreign keys and primary keys can be kept. So that is there, but we don't have a ready-made tool, if that's your question; we would have to work on building something. Is the shard rebalancing part of Citus, or part of the cloud service? It's part of Citus; the latest 11.2 provides shard rebalancing. I have two questions; the first one is, you just said it has parallel processing, so does it mean that if I run a query across multiple shards it will do MPP on all the shards? Yes, it will parallelize: if you do a query across multiple shards it will spin off separate threads, run them against each of the shards, and the coordinator will aggregate the results back in the response. The upcoming version of Citus is going to parallelize the coordinator as well; that is not yet available, but it's coming up. Sorry, we are really out of time, we need to have our next speaker. Thank you so much. But I'm going to be around here.
Sorry, we are really out of time and we need to have our next speaker, thank you so much. Here we are, back on track; we are about three minutes behind but we should catch up. We have Chinni from Grab to share. It's a very long title, probably one of the longest I've seen, but she will be talking about data modeling, so let's get started. Chinni, over to you.

I'm Chinni, and today we're going to talk about how you model data that goes back in time, using dimensional modeling in a modern data stack; as you can see, it's about being change-aware when you're doing data modeling. About me: I currently serve as a senior data engineer at Grab, and I'm a technical speaker and an occasional writer on data engineering topics, so you may have seen me around on some blog posts or in a PyData talk. I am 90% self-taught, and the other 10% comes from my engineering degree, which actually comes in very useful here. Before we begin, some disclaimers: everything in this talk is based on my own technical expertise and has nothing to do with my team, and the focus is on structured, tabular data.

To motivate data versioning, imagine a similar scenario in your own data team: the finance department requests a regeneration of a year-end report. Client A was upgraded from a tier 2 customer to a tier 1 customer three months ago. At the end of the year client A was still a tier 2 customer, even though today they are really a tier 1 customer, so in the regenerated report they actually have to show up as tier 2. This brings me to the concept of time in a dimensional data model. Time is an important dimension in your data, because the data is usually not static: you can have state transitions during a business process, and attributes can also change over time, such as your age, your income, or your status, as in the previous case. So don't assume that your data is static, and that brings us to the importance of data versioning.

What is data versioning? Data versioning is the concept of capturing state changes while keeping track of successive versions of the data over time. What I mean is that for each data record you create, you need a unique reference for that collection of data, while you retain the previous versions. There are two general data versioning approaches: one is more for structured data, which is what is known as change data capture, and there is an extension of the change data capture approach applied to unstructured data, which is the concept of data version control. For those who are not very familiar with data versioning, this is roughly what the two look like: change data capture is the traditional approach of having timestamps and valid-from and valid-to columns, while data version control is more like version control for data, with examples like Iceberg, DVC and so on, which are essentially extensions of change data capture.

Why does data versioning matter? One of the main reasons is reproducibility for data governance and audit purposes: at the end of the year we may need to regenerate those reports, so we need to capture all the changes of our dimensions over time, because we need to be able to report to whatever authority requires it, whether your data governance team or your audit team, and you need an audit trail of the changes in your data.
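Circling back to the client-tier example above, here is a minimal sketch, not from the talk, of why versioned records matter for reproducibility: each version of the customer dimension keeps a validity window, and a regenerated report looks up the value that was valid on the report date rather than the latest value. The dates and field names are invented for illustration.

```python
# Point-in-time lookup over versioned dimension rows (hypothetical data).
from datetime import date

client_a_versions = [
    {"client": "A", "tier": 2, "valid_from": date(2022, 1, 1), "valid_to": date(2023, 1, 14)},
    {"client": "A", "tier": 1, "valid_from": date(2023, 1, 15), "valid_to": None},  # current row
]

def tier_as_of(versions, as_of):
    """Return the tier whose validity window contains `as_of`."""
    for row in versions:
        ends_after = row["valid_to"] is None or as_of <= row["valid_to"]
        if row["valid_from"] <= as_of and ends_after:
            return row["tier"]
    return None

print(tier_as_of(client_a_versions, date(2022, 12, 31)))  # 2: the year-end report shows tier 2
print(tier_as_of(client_a_versions, date(2023, 4, 1)))    # 1: today client A is tier 1
```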
The second reason is to build data history with backward and forward compatibility. This one is more for the data team: what if there is a change in transformation logic that only applies to a certain period? Let's say marketing tells us there is a promotion for a specific time period, so we need to change our computation logic for rewards, but only for that specific time range, and you need to be able to capture this kind of change in logic. Last but not least, and this is a little similar to the first point, you need point-in-time values to track business metrics over time. This is more for the business users: they need a good profile of how, say, the customer profile changes over time, so they need a proper audit of that to monitor their business metrics.

Now, what is change data capture? Change data capture is simply data versioning for databases: design patterns for capturing and tracking changes in your data from your upstream source systems over time, where the changes are captured in either your data warehouse or your data lake, and the differences between those two are not very clear anymore. Some design patterns for change data capture are: data versioning based on a combination of versioned identifiers, timestamps, and status indicators, where the status indicates whether the record is still valid; another very common pattern is tuple versioning, also known as type 2 slowly changing dimensions, which I will elaborate on later; and last but not least, transaction logs, which are specific to the database system, so let's not talk about those here.

Now that we've covered what change data capture is, let's go on to the context of this talk: what do we mean by the modern data stack? It sounds like a fancy term, but simply put, the modern data stack is cloud-based, built around cloud data warehouses, and it's modular and customizable. Instead of the traditional approach of buying a one-size-fits-all suite from a single vendor, we choose the best tool for each job. If we look at the components of a modern data stack, you have your ingestion step, done by a data loader which loads data from upstream source systems into your data warehouse; then you go through the transformation layers, from staging through what we know as the bronze, silver and gold layers, the so-called medallion architecture; after we process data into the gold layer it is used by downstream users, which could be machine learning or visualization; and all of this is orchestrated by a workflow orchestration tool such as Airflow.

What about data warehousing in the modern data stack? Because we are using cloud-based compute and storage, compute and storage are now far more scalable compared with traditional data warehouses, and instead of the traditional ETL approach we are moving towards an ELT approach, where the transformation is done within the data warehouse.
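As a rough sketch of the orchestration layer just described (not something shown in the talk), here is what a small Airflow DAG wiring ingestion, staging and gold transformations together could look like. It assumes Airflow 2.4+ with the TaskFlow API, and the task bodies are placeholders.

```python
# Minimal Airflow DAG: ingest -> staging -> gold, run daily (task bodies are stubs).
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
def modern_stack_pipeline():

    @task
    def ingest_raw():
        # e.g. trigger the EL/loader step that lands upstream data in the warehouse
        ...

    @task
    def build_staging():
        # e.g. run the staging / silver transformations inside the warehouse (ELT)
        ...

    @task
    def build_gold():
        # e.g. build the gold-layer marts consumed by BI or ML downstream
        ...

    ingest_raw() >> build_staging() >> build_gold()

modern_stack_pipeline()
```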
The most important implication of this more scalable compute and storage is that it is now possible for us to store snapshots of data in a cloud data warehouse and capture all the historical changes, something that may not be feasible in a traditional data warehouse, where compute and storage are a blocker. Having talked about the difference from the traditional data warehousing approach, we get to change data capture in the modern data stack: given this stack, what implementations do we have? We have the traditional Kimball dimensional modeling techniques, mainly built on the concept of slowly changing dimensions, and we now also have more modern approaches such as data snapshots and incremental models, which are a bit more functional in style. Let's go through all of these techniques.

Traditional Kimball dimensional data modeling, for historical context, was developed by Kimball in 1996, with the latest update made in 2013, around the emergence of cloud data warehousing, when you got your BigQuery and your Redshift. This is important because Kimball introduced the concept of facts versus dimensions, and in a time of limited compute and limited storage, this data modeling approach was designed for storage and compute efficiency.

Let's go into the fundamental concept of facts versus dimensions. What are fact tables? Fact tables contain metrics and facts about a business process, for example the processing time or the transaction amount during the business process. One of the defining characteristics of a fact table is that it is typically fast-evolving during a business process event but eventually reaches a final state at a point in time upon completion: you know when the process will end, so there is a definite end point. Dimension tables, however, describe the attributes of a business process, such as customer details, and they are typically still changing and being updated over a longer period of time. Maybe you know that your customer, aged 30 this year, will be 31 next year, but you don't know whether that customer is going to be upgraded from tier 2 to tier 1, or is going to get married or divorced, so you can't really estimate when a dimension will change, and dimensions typically do not have a final state per se, compared with a fact table (there is a small illustration of this split just below). This brings us to the problem of slowly changing dimensions: how do we capture changes that are slow and unpredictable?
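A toy illustration, not from the talk, of the fact-versus-dimension split described above: the fact row records measurements of a business process event and has a definite end state, while the dimension row holds descriptive attributes that keep changing in unpredictable ways. All field names and values are made up.

```python
# Hypothetical fact and dimension records illustrating the distinction.
fact_order = {
    "order_id": 1001,           # identifier of the business process event
    "customer_key": 7,          # surrogate key pointing at the dimension row
    "order_amount": 249.90,     # metric captured during the process
    "processing_hours": 36,     # metric: how long the process took
    "status": "DELIVERED",      # reaches a final state when the process completes
}

dim_customer = {
    "customer_key": 7,          # surrogate key referenced by the facts
    "customer_id": "A",         # natural / business key
    "tier": 2,                  # slowly changing attribute (may be upgraded later)
    "marital_status": "single", # another attribute with no predictable end state
}
```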
Now let's talk about the types of slowly changing dimensions. Slowly changing dimensions are change-tracking techniques for handling changes in dimensions. You have type 0, type 1, type 2, type 3, type 4, type 6, very many types, but let's just go through the important ones. Type 0 is very simple: it's a fixed dimension, something like an account opening date. Once you create it, you're not going to change it; it ignores any changes, because it assumes that this attribute will never change. Type 1, on the other hand, reflects only the latest version of a dimension attribute: when a new record arrives and you realize it relates to something that already exists, the previous version of that value is overwritten with the new value. Great, we have applied the change, but you have destroyed history, because if the update was erroneous, it's not possible to rewind back to the previous version of the data; it's lost.

Type 2, which is the focus of this talk, implements row versioning for each dimension attribute. For each version of the data you have the concept of a validity period: a row effective date, a row expiration date, and sometimes even a current-row indicator. When a change is detected in a data record, instead of immediately overwriting the record you create a new dimension row with the updated attribute values, you give that new row a new primary surrogate key, and the previous version is updated with the row expiration timestamp. So instead of overwriting and destroying history, you are updating history (a small sketch of this mechanism follows at the end of this section).

I mentioned tuple versioning; what is it? Tuple versioning is a traditional capture mechanism that records changes to a mutable upstream table over time. It implements type 2 SCDs on mutable table sources, and typically it detects changes on some updated-at timestamp; the naming may differ, but the general idea is the same. Some components: you need to know where you are going to store the versioned rows, what to track, what the unique identifier is, and whether to invalidate records that are no longer in the source, because sometimes records get deleted. Here is an illustration of how it works: we track some timestamp and use it to determine the validity period of each record. This is the initial state, and when we see an update, say the customer changed from tier 2 to tier 1 at a particular point in time, instead of overwriting we capture the change: the validity period of the previous record is closed and a new version is added. There are also type 3, type 4 and so on, but we're not going in depth into those because they are less commonly used and a bit more complex; you can read up on them on your own.

Now let's get into the more modern topics, starting with data snapshots. Data snapshots are read-only, immutable copies of the state of a data source at a particular point in time, and usually you store those snapshots in the staging area of the data warehouse: you ingest from the source and land the snapshot in staging for further processing. You can think of it as taking timestamped images of your data sources, like taking photos of them at certain points in time. Instead of directly creating your SCD2, you create data snapshots first and then build your SCD2s, or whatever downstream data models you would like, from them; that way, in case you mess up your SCDs, you still have the data snapshots to fall back on.
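Here is a minimal sketch, not from the talk, of the type 2 / tuple-versioning mechanics described above: when a tracked attribute changes, the current row is expired and a new row is appended with a fresh surrogate key and an open validity window. Field names and dates are invented for illustration.

```python
# Type 2 SCD update in plain Python (illustrative only).
from datetime import datetime

dim_customer = [
    {"sk": 1, "customer_id": "A", "tier": 2,
     "valid_from": datetime(2022, 1, 1), "valid_to": None, "is_current": True},
]

def apply_change(dim, customer_id, new_tier, changed_at):
    """Close the current version and insert a new one when the attribute changes."""
    current = next(r for r in dim if r["customer_id"] == customer_id and r["is_current"])
    if current["tier"] == new_tier:
        return  # nothing changed: keep the current row open
    current["valid_to"] = changed_at        # expire the previous version
    current["is_current"] = False
    dim.append({
        "sk": max(r["sk"] for r in dim) + 1,  # new surrogate key
        "customer_id": customer_id,
        "tier": new_tier,
        "valid_from": changed_at,
        "valid_to": None,                     # open-ended until the next change
        "is_current": True,
    })

apply_change(dim_customer, "A", 1, datetime(2023, 1, 15))
# dim_customer now holds both versions, so a year-end report can still see tier 2.
```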
To complete this discussion, we have incremental models, because we need to propagate changes in upstream data into our downstream models. With incremental models you limit the data transformation to a specified subset of the source data: typically you want to capture only the data that has been newly created or updated since the last run, because you don't want to do extra work, and by not doing extra work you also significantly optimize the runtime of transformations over large data. That's why you may want an incremental approach instead of a full-load approach. It does take a bit of work: you need to determine what the incremental load actually is, typically whatever has changed since your last scheduled run, so you need to get the delta. In, say, your dbt models, you will typically define where you insert the load, how you insert it and where you get it from, and before any of that you need to know the condition for the incremental load, otherwise you're going to load the whole thing. Those are the things to think through when defining an incremental model. If you're looking at an SCD2, it's pretty straightforward: for the incremental load you just select the most current records and load them in. Some things to consider when designing an incremental model: does the target already exist? If it doesn't, you have to do a full load. Is something wrong with your transformation so you want to refresh the whole thing, or are things fine and you can do an incremental run? You need to include those cases. Then, what's the incremental strategy, which depends on whether you're on a data warehouse or a data lake, and which columns you use to track changes. Those are the typical considerations; there is a small sketch of this logic right after this section.

So is the Kimball approach still relevant, especially in the modern data stack? It turns out it still applies, because even though storage and compute are dirt cheap, that doesn't really hold for very large dimensions, where storage and compute are not going to be cheap, and cheap hardware does not remove the need for dimensional data modeling. If you want to find out more, you can read the blog posts referenced here and watch the related talks. In some ways this means we don't need to leave everything about Kimball behind; you will typically still need dimensional data modeling in certain cases. One case is when you need to aggregate facts: then yes, you need to put some work into how you model your data. Second, metric drill-down based on dimensions also requires some thought about how you model your data. And last but not least, it is still typically used for financial reporting and audit, which is where it comes in very useful: you need an audit of the changes in your data over time, and that is a requirement, so in this case you can't say "I don't want to do Kimball"; you have to do Kimball.
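A rough sketch, not from the talk, of the incremental-load logic described above: only pull source rows created or updated since the previous run, and fall back to a full load when the target doesn't exist yet or a refresh is forced. Data and column names are made up.

```python
# Incremental load with a high-watermark on updated_at (illustrative only).
from datetime import datetime

source_rows = [
    {"id": 1, "updated_at": datetime(2023, 4, 1), "amount": 10},
    {"id": 2, "updated_at": datetime(2023, 4, 5), "amount": 20},
    {"id": 3, "updated_at": datetime(2023, 4, 9), "amount": 30},
]

def incremental_load(source, target, full_refresh=False):
    if full_refresh or not target:
        # Target missing, or a forced refresh: load everything.
        return list(source)
    high_watermark = max(row["updated_at"] for row in target)
    # Incremental run: only the delta since the last scheduled run.
    delta = [row for row in source if row["updated_at"] > high_watermark]
    return target + delta

target_rows = incremental_load(source_rows, target=[])            # first run: full load (3 rows)
source_rows.append({"id": 4, "updated_at": datetime(2023, 4, 12), "amount": 40})
target_rows = incremental_load(source_rows, target=target_rows)   # later run: only id 4 is added
print(len(target_rows))  # 4
```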
To round up this talk, I will share some tips and tricks for making sure your data modeling is change-aware. The first tip is to snapshot all your upstream source data. You are on the modern data stack, so your compute and storage are scalable, and what if assumptions about your data change? Let's say your upstream team tells you a table is append-only, but then a record gets overwritten: what do you do? If you are using an SCD2, I'm not sure how you're going to rewind those changes, so save your upstream source data and snapshot it. Secondly, you get schema changes, columns being dropped or added, so you need a photo of the state of your upstream source. And last but not least there is business logic: you can be told to apply some business logic retroactively, so you still need those copies of your snapshots to be able to re-run with the modified logic. As I mentioned previously, store your data snapshots in your staging area and build your SCDs from there, instead of having your SCDs read directly from the upstream source; that way, if you mess up your SCDs, you still have the data snapshots to rely on. And speaking of schema changes, it's useful to detect schema drift between your upstream source and your data warehouse, so something like a source-to-target schema reconciliation will be useful for detecting changes.

Use type 2 SCDs and incremental models: type 2 SCDs are typically sufficient for tracking and capturing data changes, and you don't need to go to SCD 3 or 4 unless there are special cases. You may think incremental models are complex, but when your data grows 10 times, 20 times, 100 times, thinking about it incrementally pays off in terms of efficiency and cost. Unless you are a startup that wants to deliver fast, in which case go ahead and iterate quickly, it pays off to develop your data models incrementally if you are thinking long-term. To round off: design incremental models with your upstream data update mechanism in mind, think about your incremental strategy, which depends on whether you are on a data warehouse or a data lake, and one performance tip: filter rows early, before you do any computation, because if you compute first and filter afterwards it's going to be pretty expensive. The key takeaways would be to adopt a combination of SCD-like approaches, incremental models, and data snapshots. That's about it, so thank you, and you can get the slides from this QR code.

Thank you very much. I'm extremely sorry we won't have enough time for questions; the next speaker has to get ready as well. Thank you, everyone, we'll wait for the next speaker.

Thank you so much. I'll be presenting on libcamera: making complex cameras easy. First of all I want to thank the organizers for organizing the conference in person; it has been a really weird time after the pandemic. I am Umang. I was basically a desktop engineer, and recently I've turned into an embedded developer. I have been contributing to open source for a long time now, across GNOME, OSTree, Flatpak and other immutable-OS work, and these days I'm doing mostly embedded upstream Linux media and libcamera. We will first take an overview of what complex cameras actually mean, the challenges in this field, the complications, and what we have been able to achieve, and finally I'd be happy to answer any questions. Just a disclaimer: there are logos and trademarks in this talk and they belong to their respective owners; we have no intention of taking over their work. So let us begin with complex cameras. Around ten years ago, cameras were simple.
You really had a sensor, and on the SoC you had either a CSI-2 receiver or a scaler which would scale your images and just give you an output capture node, and the application could really work with those directly. They were simple to configure and to develop for. But around the time of this phone, whose model I will not name (anyone can guess it), this is where cameras got complex. From the simple setup we went to this: the interfacing and the basic pipeline became very complex. In the red boxes you can see how the raw sensors sit on the SoC front end; you might have CCP2 or CSI-2 receivers, blocks such as AE (auto exposure) and white balance, a preview node for previewing images and other nodes for still capture, since preview nodes are generally low resolution while still capture needs the highest resolution, so they are separate nodes; and for configuration you had multiple video nodes to configure and to get the images out. So from the left-hand side, where we had a simple camera media graph, we went to something like this. OMAP3, that's correct.

In more recent developments you might also have come across the Raspberry Pi autofocus camera modules. In the middle, the silver plate is basically a motor, a very small motor called a voice coil motor, which moves the lens to adjust the focus of the image, and the autofocus block sits somewhere here in green, so that's yet another node coming up. You can see how the complexity of the camera and its pipelines is increasing day by day.

So what are the challenges from the application point of view? Applications can manage to a certain extent, but the problem is that it doesn't scale. You have an application and you have the underlying hardware, and the application is very much tied to that hardware; it cannot be ported to any other hardware. For example, an application written for Raspberry Pi cannot be ported to Rockchip, or to more complex camera hardware like Intel IPU3. Application developers would need to know the underlying hardware and how to configure it: there are multiple nodes, as you can see, and each node has to be configured in a certain way, at a certain time, and at a certain position in the pipeline; you cannot, for example, configure the resize node before the CSI-2 receiver. All of this makes application development quite hard. libcamera tries to be a solution here, so let's see how it fills the gap.

Before we go ahead, I would like to show you the overall architecture of libcamera, which might seem a bit complex at first. At the bottom is the kernel, where the drivers live along with the underlying hardware; the kernel exposes the V4L2 API and the media controller API. On top you can see the adaptation layer, which is basically the application-facing APIs that libcamera provides, with the libcamera core in the middle, and the pipeline is where things get interesting: on the top right you can see the pipeline handler framework, which is very specific to the underlying hardware. libcamera has pipeline handlers for Raspberry Pi, different pipeline handlers for Rockchip, different pipeline handlers for IPU3 or i.MX8M Plus; whatever the underlying hardware is, it is all abstracted inside a pipeline handler. In the middle, in the red box, you can see the IPA module, that is, the image processing algorithms.
The image processing algorithms are the algorithms for the ISP blocks provided by the vendor; the ISP is usually embedded on the system-on-chip, the SoC. libcamera, in combination with the pipeline handler, provides a way to configure the IPA, that is, the ISP present on the SoC. What happens, basically, is that in the previous diagrams the red nodes are the raw sensors: you cannot view a raw image from them directly, it needs to go through the processing blocks, and the processing blocks and the algorithms that run on the platform are encapsulated in the IPA itself. The IPA can be proprietary; we have customers who do not want to open their IP, and their business is driven by the IPA module being kept proprietary, so libcamera has a sandboxed environment where it can plumb the proprietary IPA into itself and run it with the open source pipeline handler to get the images out. And when you have written an application, libcamera can do device enumeration: it can discover what kind of platform the application is running on. If it is a Raspberry Pi it will pick the Raspberry Pi pipeline handler, IPA algorithms, everything; if the same application is run on Rockchip, it detects a Rockchip platform and the pipeline handler, algorithms and the entire pipeline are configured accordingly. To summarize, that is how we define libcamera: an open source camera stack and framework for Linux, Android and Chrome OS.

Now, what are the complications? Basically, V4L2 is everywhere and everybody seems to love it. It exists in the upstream Linux kernel, and there is no other option: for media capture, V4L2 is the default API the kernel provides. If in the future the kernel comes up with a different video capture API, libcamera would be happy to support it, and we are working in that direction as well. The same V4L2 API is used for simple cameras, digital TV set-top boxes and so on, so there are areas of conflict, such as color spaces, where some color spaces make sense for a camera but not for a TV set-top box stream. There are conflicts like that, but V4L2 is what we have and we have to live with it for now. It is widely tested and widely used, it has great interoperability, and these are the applications and media frameworks you can see on the screen.

But not everybody loves V4L2, because everything is configured through a node, and when the camera pipeline is very complex the application has to take care of various types of nodes and deal with reconfiguring them in a certain way, which makes applications quite opaque for portability reasons. There are sub-devices: as I said, there is an ISP on the system-on-chip, and there are multiple devices that need to be configured, so the application developer ends up with questions like which sub-device to configure first, whether this is the right way to do it, and what format needs to be configured across the media bus. There are multiple nodes for a single camera device, whereas the application developer only wants to care about "this is a camera, this is how I configure it"; instead, he or she has to deal with multiple video nodes, for example a metadata node, the CSI-2 receiver, and the ISP, which itself has multiple memory-to-memory nodes. And V4L2 alone isn't enough, because when the ISP comes into the picture, the sensors are really raw sensors.
They are not RGB or YUV sensors, and YUV sensors are becoming obsolete. Laptops are now using complex cameras; one example is Intel's IPU3 and IPU6, which are really complex cameras, and libcamera has to pitch in to provide a consistent API for accessing those cameras on Linux. Embedded devices already use complex cameras; the example I gave earlier is the Raspberry Pi autofocus module, which is a complex camera. But OEMs need custom solutions to manage these cameras: Ubuntu, I believe, shipped a very proprietary camera stack for Intel IPU3, and we discussed with them that libcamera can be a solution. And there is no portable mobile camera application: as I said, the application is very much tied to the underlying hardware when a complex camera is involved, so portability is just out of the question, but there is a new API and libcamera aims to solve this. The catch is that libcamera needs very good kernel support: the sensor driver should be upstream, the ISP driver should be upstream, and only then can libcamera come in and solve the problem. We live in an age where upstreaming itself is a difficult task, and many BSPs, BSP kernels and drivers are floating around. C applications don't want to use C++; libcamera is written in C++, but we are working on language bindings as well. Is it finished? Not yet, but we do have releases; we are not guaranteeing ABI stability yet.

After the challenges and complications that libcamera tries to address, here are some features and developments. libcamera has a GStreamer element, which means you can use it to encode, stream, and mix in all the other GStreamer elements to get a pipeline working as you wish; the example is a simple camera viewer and a receiver. We have Python support, which was the first language binding we landed officially in the libcamera repo. Related to that, but not part of libcamera, is Raspberry Pi's work on a libcamera-based replacement for PiCamera, the Python interface for Raspberry Pi's legacy camera stack that many Raspberry Pi users and hobbyists are familiar with. We have an Android HAL implementation: the picture you see is running the Chrome OS camera app, with the stream going through the libcamera HAL layer. We have a V4L2 compatibility layer, a similar LD_PRELOAD trick: we hijack the calls, so if you have an application based on libv4l2 you can just swap libcamera in. There are also test applications we develop for our own use and to test the entire library, qcam and cam; these are very helpful, and we encourage users to get started with them as an introduction to libcamera. We have a simple hello world for the API: a very simple program where the camera manager starts, I get the cameras, run, queue up requests and stop, and you can see what is being captured and how it's done (there is a small Python sketch of this flow after this section). And there is PipeWire; I'm not sure if everybody is aware of it, but it's the upcoming Linux multimedia stack. Last year at a PipeWire hackfest we tried running Chromium, which went through the XDG camera portal, was routed to PipeWire, then to libcamera, and then back to the kernel. That PipeWire integration is, I think, already merged, and PipeWire has much broader exposure to stacks like video conferencing solutions and browsers.
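A minimal capture sketch in the spirit of the "hello world" flow described above (camera manager starts, a camera is configured, requests are queued, then stop). This uses Picamera2, Raspberry Pi's libcamera-based Python library, so it assumes a Raspberry Pi with Picamera2 installed; it is not the snippet shown in the talk.

```python
# Hello-world style capture via Picamera2 (Raspberry Pi's libcamera Python layer).
from picamera2 import Picamera2

print(Picamera2.global_camera_info())            # enumerate the cameras libcamera found

picam2 = Picamera2()                              # acquire the first camera
picam2.configure(picam2.create_still_configuration())
picam2.start()                                    # start streaming; requests are queued internally
picam2.capture_file("test.jpg")                   # grab one processed frame to disk
picam2.stop()
```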
This is the PipeWire stack, where the app, the browser, GStreamer and VLC all talk to PipeWire; for imaging, PipeWire can offload the work to libcamera, and libcamera deals with the camera side of things, while for audio you have the Bluetooth and audio integration. So PipeWire really sits in between for all your video and audio handling; I've linked a blog post that gives an introduction to PipeWire. Snapshot is an upcoming convergent camera app, where the goal is a unified camera app that can run both on Linux-based mobiles and on desktop distributions; it is being incubated, and I'm very happy to see that. We also have Flatpak support, which means permission-based access: the XDG camera portal work means WebRTC can receive frames that come through the XDG portal and then libcamera, so in the near future, once the XDG portal work has been merged, you will really be able to access cameras from your browser. These are the XDG portal links for Firefox and Chromium; once that is done, the browsers will be working with libcamera for video capture. Another notable development is the camera app developed by rafael2k: it's a fork of the Sailfish OS harbour camera, the first camera application developed for the original PinePhone, and it's really nice because it has manual controls, so you can change things like the exposure time and brightness in the UI itself; this is a demo image captured with it using libcamera. Waydroid is a container-based approach to having an Android system on your regular GNU/Linux system, and we tried and integrated libcamera into Waydroid, so you have a LineageOS image running on your GNU/Linux system that accesses the camera through libcamera. And, I think, the last one: the PinePhone Pro, a test ground for mobile Linux capture; it's a Rockchip RK chip, a complex camera with a raw sensor and ISP, and it's already supported by libcamera. I have a demo of the PinePhone Pro capturing images with libcamera: you can see the initial support is there, but the IPA algorithms still need to get better. I think that's enough, and last but not least, a very recent development that I put in at the last minute: a plug-and-play Raspberry Pi USB camera. Here is a Raspberry Pi, I think with the autofocus camera that I first showed you, and through the UVC gadget our team has developed it can now be plugged into your laptop and used as a webcam. So the entire complex camera, supported by Raspberry Pi using the Raspberry Pi pipeline handler and IPA algorithms, can then be used as just a regular USB camera on your laptop. That's pretty much it; thank you for your attention, and if you have any questions I'd be happy to answer them.

Not yet, but we are working with devices that have multiple cameras; for example, I think the Chromebooks have multiple cameras, a back camera and a front camera. libcamera can handle multiple cameras, but it depends on how the ISP is configured: some ISPs only support streaming one camera at a time, while we do have platforms where streaming and using multiple cameras is fine. We are working on something called logical camera grouping, which handles things like mutual exclusion of cameras.
It's still a work in progress, but it's very much hardware dependent; it comes down to what the hardware capabilities are. Stereo vision? I'm not sure, I haven't seen two cameras used that way; I think that's a layer above libcamera that you would have to build: you get the frames and then you stitch them. On pixel binning: you do have pixel binning in the sensor itself when you get the raw image, so that's a detail you handle in the pipeline handler itself, that is, how the binning is done. Image plus distance, you mean the depth component? That's a control. Cameras have different types of controls depending on what the camera sensor and the ISP can support, and depth would be one of those controls: if the platform and sensor support it, it can be configured by the application; if the platform doesn't support it, it simply won't expose that control. As for Chrome OS, they have their own camera stack, which I think is called the CrOS camera HAL, the Chrome OS camera HAL, and we do have Chrome OS working with libcamera already. So that's pretty much it, thank you once again.

Good afternoon, and thank you for the invitation and the slot to speak. Today I'm going to give a very brief talk about this project of mine, which I did during the Covid time when I had some time on hand, exploring whether it's possible to use feet gestures as an input device, and also as a feedback device, to communicate: that is, instead of using the hands, and instead of looking at the screen. The problem is, for example, that I'm not able to use my hands, either by choice or because they're occupied, or maybe it's not safe to use the hands; and possibly it's inconvenient to speak, or it's a noisy environment where I'm not able to speak. And perhaps people who are active, instead of sitting at a desk and typing on a keyboard the whole day, could use an alternate method to stay connected and at the same time be productive. So, as I mentioned, it is basically an alternative to the keyboard and screen. The active communicator allows the user to type and enter messages and covers the must-have keys; the idea is to replicate all the keys on the keyboard, so it has at least 101 possible combinations of input.

Who are the possible users who can benefit? I put them into three categories. The first is health reasons and disability, and also people who want to stay healthy. The second is professionals: for example, surgeons working in an operating theater, whose hands are very busy operating on the patient and whose eyes have to look at the patient, so the eyes and hands are already tied up. How do you, for example, monitor the heart rate of the patient? Maybe the beep is too noisy with too many instruments, so this gives another input channel. The last category would be gamers and maybe virtual-environment, VR, users; gamers get an extra competitive edge from a device like this. This is the preliminary diagram, and there is a prototype I made: it consists of a pair of shoes and a wrist-worn device, and these are some of the patent-filing drawings that were submitted. These are the results for one single shoe, not the full pair: you can see that you can recognize 76 different gestures with a single shoe.
With two shoes and a wrist device, the possible combinations run into the hundreds. This chart basically shows whether the predicted gesture matched the actual gesture made by the user: for example, if I tap my feet, that is recognized as one gesture, and if I bring my feet to a certain angle, that is another gesture. Again, the use cases, as I mentioned, are people with disabilities, carpal tunnel sufferers, surgeons in operating theaters, and PC and mobile gamers. In terms of market, from the research I did, the biggest segment with the highest growth rate would be fitness, followed by assistive tech, and there is also some potential among professionals using this in their line of work, and among gamers and tech adopters.

What are the alternatives out there? From my research, the devices out there are pretty basic. Maybe the most advanced would be this navigation shoe, if I remember correctly: it provides positioning feedback for users, so for example people who are visually impaired can navigate around without any auditory cue. If they need to turn right after a few steps, the right shoe provides a vibration to prompt them to turn right, or an obstacle in front can give a prompt by vibration so the user knows to stop. So it's not really an alternative input device. And then there is one here that is pretty crude, up to five keys, but it is definitely functional. How about hand-worn devices? This one is a little bit dated, but we have seen it out there some time back: the laser-projected keyboard, which still requires the hand and tapping on a surface even though there are no physical keys. And this device here is pretty interesting, the Tap Strap: worn on the hand, you tap one of your fingers on a hard surface, and depending on which finger, or combination of fingers, is tapped, it can more or less replicate the keyboard; but again, this also requires the use of the hand, so hands and a surface as well. I did this comparison, and I think the major advantage of not using the hands is that you free up the hands, and you also free up the screen: with haptic feedback you don't need to look at the screen, and you can focus on other, more important areas; as I mentioned, gamers' eyes are already busy with so many things in front of them.

What about vision-based AI gesture recognition? There are other alternatives out there, for example using a camera to recognize hand gestures; open source libraries are already available for this, and they are pretty impressive (there is a small sketch of this after this section). The limitations are that it requires a clear, unobstructed view, the lighting conditions have to be close to ideal, the image-processing workload is a bit heavier, and there may be privacy concerns: if you walk around with a camera, people will be a bit suspicious. And how about the proven approach, basic voice communication, just using a voice call and speaking as you would on a phone? Of course, this also requires a background that is not too noisy, and it may not work that well if the target is not another human being: you need to translate the speech into machine commands, for example to activate machinery.
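As a small aside, not from the talk, here is what the camera-based alternative mentioned above could look like with one of the open source libraries available: recognizing hand landmarks from a webcam with MediaPipe. It assumes `opencv-python` and `mediapipe` are installed, and a gesture classifier would still need to be built on top of the landmarks.

```python
# Hand landmark detection from a webcam using MediaPipe (illustrative sketch).
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)  # needs a clear, well-lit view of the hand

for _ in range(100):  # process a fixed number of frames for this sketch
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input, while OpenCV delivers BGR frames.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # 21 landmarks per detected hand; a gesture classifier would go here.
        print(len(results.multi_hand_landmarks[0].landmark), "landmarks detected")

cap.release()
hands.close()
```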
It also has to be discreet: for example, if you are a business negotiator or a presenter, you may not be able to say anything out loud; you have to use some gesture to signal, for example, that it's time to cut the camera. Again, in terms of market, and this is from public data, the wearables market is growing very fast, around 12.4% CAGR over the next 10 years or so, and you can see that even mechanical keyboards are pretty popular with gamers; the related segments, IoT included, are all growing, so that is the market segment being pursued. What could drive adoption? I was thinking maybe gamify it, in a way: people sharing gestures and training their own gestures, because otherwise a standard gesture may not work for every individual; depending on the culture, we may not gesture in exactly the same way. And in the two years since the last update, I think it has become even easier to do this because of the available AI processing capability, plus Wi-Fi 6 and newer Bluetooth, which basically improve reliability, performance and power consumption. That's all for my presentation; get in touch and let's see what can be discussed. Thank you.

For this one, okay, I had the link in the original document; it's definitely from one of the market research firms. And I think I put in some kind of picture: probably less movement on the feet, maybe standing or stretching, so at least a variety of usage. Instead of just sitting down for the entire four-hour stretch we could do a bit of stretching. There's always demand for certain gadgets. Thank you, and sorry about the link, there were some technical issues.

Next I want to welcome on stage the speaker who is about to introduce us to live coding music with Sonic Pi, which is really fun and interesting, so everyone please give a round of applause to Paul here. Thank you, let me just open this. Okay, let's start, everyone. I'm Paul, I'm a developer; I picked this up during the pandemic and I do it in my free time, so this is my hobby. I don't get paid for producing music, but I'm doing it here today. Just a shout-out to my friends who are here today, I'm very happy: my friends from DataKind, from VDX, from the community, and my friends from the workplaces I used to work at. So what is Sonic Pi? For me it's just a platform where I can do some coding and the code magically turns into sound, so I can trigger my bass drum, my drums, my synths, my guitars and so on while playing, or rather performing, live. But I think I can demo this best by doing it: live coding, the way people actually use Sonic Pi. Before that, our set list for today is very simple: we'll make some notes and chords, the basic ones; make some drum beats to support the music; use some external synths, in case the sounds or instruments inside Sonic Pi aren't enough for you and you want to use your own synths, which I'll try to show; and then ultimately turn that into a backing track so I can just play my guitar over it. So let me open Sonic Pi and clear this off. Oh yes, Yoke is my friend, I got help holding the mic. The basic concept of making music in Sonic Pi is that we just hit play: play is our command to play a note, and I'll just put some number there. So if I play this, can you hear it?
The note is C, middle C on the piano if I'm not mistaken. At first I thought the number was a frequency, but when I checked, it's actually a MIDI note: a MIDI number that corresponds to a note. Musicians don't really want to type MIDI numbers when composing music, so you can always write your musical note as letters: you have :c4, C in the 4th octave, and it sounds the same; on the sheet I prepared you can see that both notes are exactly the same. One thing to note in Sonic Pi: if you're doing copy-paste, instead of Ctrl+C and Ctrl+V it will be Alt+C and Alt+V. And of course, since this is a programming language, familiar if you know Ruby (Ruby is not my first language), it has the basic syntax and looping structures. So if I want to play a sequence of notes, I can just loop over them and each note plays in turn; the sleep call you see there indicates how long to wait for each note, and here it means each note is played for one beat, one count. This plays C in octaves, C3, C4 and I think C5, and you can also replace the numbers with the actual letter notes if you prefer; it's the same. You can search the web for which MIDI number corresponds to which note, or just see it here, all the octaves and the corresponding notes in each octave, but I don't really use the numbers; I usually just use C4, C5 and so on to keep it musical. There is a small sketch of this note-number mapping after this section.

Now, what if I want to play two or more notes together to form a chord? What I have here is a C chord, which is a triad of the C, E and G notes, so I'll just play that. What I'm doing here is playing each note individually, C, E, G, and then playing them as a chord: if I don't put a sleep in between, they are played as one. Now the other approach: if you don't want to type all the lines, since this is a programming language you can just loop through the notes. I'll copy that and place it in Sonic Pi; here it plays each note with an interval of half a beat, so let me run that. And if I want to play them together, I just remove the sleep and it plays all three notes simultaneously. Sonic Pi also has a way for you to indicate just the chord and the corresponding chord type you want to play, so let me do that: here I'm going to play a C major chord, the one you just heard, but if I want a minor chord, and it actually has intelligence, auto-complete, so you can choose which chord type you want, I just change it to minor and run it; and if I want to be fancy I can use other chord qualities as well. That's how we handle notes, but you want a way to loop the notes so you have a pattern that plays over and over again, so let me introduce live loops. Here I'm just going to play a chord on every beat; let me show that, and for every beat you can hear the notes of the chord. Let me stop that. Usually, for chord patterns, you have a few chords so you can have a transition between them.
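A small sketch, not from the talk, of the note-number relationship described above: Sonic Pi's "play 60" and "play :c4" give the same note because 60 is the MIDI number for middle C, and a C major chord is just the MIDI numbers for C, E and G.

```python
# Note-name to MIDI-number mapping (C4 -> 60 convention, as Sonic Pi uses).
NOTE_OFFSETS = {"c": 0, "cs": 1, "d": 2, "ds": 3, "e": 4, "f": 5,
                "fs": 6, "g": 7, "gs": 8, "a": 9, "as": 10, "b": 11}

def to_midi(name, octave):
    """Convert a note name like ('c', 4) to its MIDI number."""
    return 12 * (octave + 1) + NOTE_OFFSETS[name]

print(to_midi("c", 4))                            # 60, i.e. play 60 == play :c4
print([to_midi(n, 4) for n in ("c", "e", "g")])   # [60, 64, 67], the C major triad
```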
Let me show a bit of that as well. Here I have C sharp minor, then E major, D major and then A major, and let me bring the volume up a bit so you can hear the notes. Any EDM or Avicii fans? I got this from one of those tracks from the early 2010s, so this is the chord pattern. Here we can take a shortcut too: like the shortcuts in the previous demo, we can save a lot of typing by looping a certain progression. I can store the progression in a variable and just play it in a loop. What we have here is a ring holding the chord pattern, and the tick is there to pull the next chord in the pattern each time around, so if I play this it sounds more or less the same, but the structure is different; the structure we have created is a little simpler, so we can manage it later.

The next thing, now that we have the chords, is to put in some melodies, so let me share a bit of that. When I do the melodies, it's like what you saw a while ago: I place some notes and give each a duration for how long that note plays, or if I want to play a note twice I just put that play twice, and so on. What I did here was encapsulate that block of code in a function, so that I don't keep writing it again and again and can call it later; there's a melody for the ending part as well, which is repeated twice. So the live loop plays the distinctive part for the first measure, or first bar, then the repeating part, and then the distinct part for the second measure, and it sounds like this. Then we have this one, which does nothing by itself, but since we're going to have multiple drum parts, hi-hats, bass drum and so on, I put it there as a metronome: everything syncs to that live loop, so when we add the instruments later the sound stays in sync. This is our chug-chug, where's the chug-chug... ah, there it is; it's at 128 beats per minute, driving on each beat (there is a small timing sketch after this section).

What we're going to add now are some snares; actually in the Avicii track it's claps, but I don't have a clap sound here, so instead of a clap I'll just do a snare. And although this won't be quite the EDM style, I'm also going to add some hi-hats, just to demo how it works: this one hits every half beat. Usually you want to know where the start of the measure is, so every 8 counts I hit the crash cymbal; sometimes I get carried away when doing this. So let's do the whole thing, the melody and the drums: what we've done here is add the drums, the hi-hat, the kick and the snare, plus the melody and the chord progression. In the song there's a section where we want to add a sort of extra note on the last beat of that particular count, so that you can hear the slide. The other thing is using an external sound: here I have another sound, over in Bespoke, but it doesn't sound... oh no. Okay, let me check if I can open it again, otherwise I'll skip it; instruments, keyboard, okay, it sounds now. This is Vital, and I'll show you later where to get it, but I'm going to use it as my external synth.
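A quick sketch, not from the talk, of how the sleep values map to real time at the 128 BPM used in the demo: sleep counts beats, so the half-beat hi-hat and the 8-count crash cymbal translate to seconds like this.

```python
# Beats-to-seconds arithmetic at 128 BPM (illustrative only).
BPM = 128
SECONDS_PER_BEAT = 60 / BPM          # ~0.469 s per beat at 128 BPM

def beats_to_seconds(beats):
    return beats * SECONDS_PER_BEAT

print(beats_to_seconds(0.5))  # hi-hat interval: every half beat (~0.23 s)
print(beats_to_seconds(1))    # kick: every beat (~0.47 s)
print(beats_to_seconds(8))    # crash cymbal: once per 8-count measure (~3.75 s)
```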
Here I'm just going to call that outside synth, so let me run it. Bifrost: for the Star Wars... sorry, no, sorry, for the Avengers fans. I'm using loopMIDI to connect my Sonic Pi to Bespoke Synth, and I call my connector Bifrost, so that's it, that's the spot; this is the one I'm using, and you have to assign it here. Vital is an open source synth, and you have to go and get it there. After that, you combine everything, the drums, the snare and the synth, so let's add that and listen. When I hear the sound coming out of Sonic Pi, I start thinking about how to use it; it's one of the open source tools for music. I got into making music during the pandemic, so I placed some of it on SoundCloud, and I use it as something to play guitar over. It's been amazing.

Do we have any questions for Paul here? Thank you. Just a question: the sound library, let's say I have a very big sound sample library; can I actually import it into one of these places as a different sound and use it? Yes, although I didn't get a chance to do much with that, but there's an option to load external samples. The sample, where did I use a sample? The sample drums: you can just add the path to the wave file. So you write sample and then the wave file path, and it will play. Oh, I see, so sample, a space, and then your wave file. And can you also add special effects to the sound? Yes, adding effects: you just wrap whatever is inside in an effects block, and then you can put reverb; there's an echo as well, and distortion; just encapsulate whatever is inside it. Although, I think at first I was planning to put in my guitar as input; you can do that with live audio, there's a live-audio feature, it's just that there's a challenge with latency. So what I did was keep everything, every sound coming from Sonic Pi, as one stream, and if I need to play along with it, I play separately over that, so Sonic Pi is sort of the backing track. I see. Yeah. Great. Thank you. Nice.

Yes, I haven't tried that much. If you push it out to Bespoke at least, Bespoke has the ability; so this is stereo, you have a left and right channel, right? If you route your MIDI output into it, then you can have that stereo effect. I think this should be stereo, but I haven't dealt with it deeply; there should be some stereo, or I think it's stereo by default, but we can explore. Yeah. Anything else? Yeah. Thanks. Thanks, everyone.

Thank you very much, and I hope you have enjoyed your day so far. We're about to have the closing ceremony in the lecture theater at 6 o'clock, so do make your way up there. And at 7 o'clock there's another gathering at the Hackerspace, a small get-together, so if anyone is interested, please come. Thank you very much.