I lead product on the Microsoft side for Azure Cosmos DB, which is a NoSQL database. But today I am here to talk about one of the companies we acquired a couple of years ago, a startup called Citus Data, if you have heard of it. We have kept it open source, and most of our team supports the Citus core engine. It is platform agnostic, meaning you can run it on bare metal or in multiple clouds. To clarify the synergy behind why Microsoft, or my team, acquired Citus: it also gives us the ability to offer distributed Postgres as a first-party hyperscale service at Microsoft. But today's session will be 100% about the technology, essentially how interesting it is to distribute PostgreSQL using the Citus engine. With that, let me start with a quick overview. How many of you have heard of Citus here? Okay, not bad, quite a few, thank you so much. I have worked with the co-founders of the Citus engine since it became part of the Microsoft family; they are a very passionate group of founders who are committed to keeping this open source, and it will continue to be so. There are a couple of points worth calling out: one, what problems Citus was trying to solve, and two, why Postgres was chosen as the base for a distributed, sharded SQL engine for extremely real-time analytics, fast query pushdowns, and at-scale scenarios. Firstly, I want to call out that Citus is an extension over PG, so essentially any time there is a new PG release, the next day Citus will be supporting the latest version of PG; that is the promise we make on the release pages. It is fully open source and provides you multiple self-managed environment setups for PG.
Also, if you are curious, the logo that the Citus team came up with says: we start with Postgres, the elephant out there, but we bring some magic into the elephant, like a unicorn, by bringing sharding across various kinds of tables. I also want to call out where we are, and all of these links will be provided. There are two landing links for today's conversation: citusdata.com, and the page where releases are published, where we can see the latest one is 11.2; every few months there is a new release in a steady stream. Similarly, the getting-started page on citusdata.com lays out the main value proposition: distribution and scale, parallelized performance using Postgres as the base storage engine, fully open source, and a managed database service that can be deployed with simplified architectures. So I just wanted to call that out before we proceed. Setting context: when we look at the modern database landscape, I like to think of its explosion and expansion along three pivots. The first is the data structure itself. We started off decades ago with structured, relational databases; in the last 10 to 20 years, NoSQL databases came up in a big way, providing geo-distribution, scale, and sharding, but query performance is still best in relational data stores.
So how can I get ACID compliance on a NoSQL-style base? That transition led to what we call NewSQL data stores. But the SQL world has also improved: relational databases now support types like JSONB and various other data structures, while also enabling query pushdown and sharding, and that is what they call distributed SQL, or DSQL. So the DSQL and NewSQL worlds are merging, and that is where the Citus engine provides the strong query language of a relational Postgres database while also providing that kind of sharding. So that is the variety of data. The second pivot is the volume of data: data is very cheap to store, and it is the compute over the data that is more expensive, but the data itself, from gigabytes to terabytes to in fact petabytes, is something Citus customers are working with, so the volume of data is expanding. The third pivot is the velocity at which workloads operate: from batch operations to transactional workloads to real-time analytic workloads, all of these need data to be accessed in specific patterns. By the way, I am speaking on behalf of the Citus founding team, the ones who have been contributing to the source code; I joined the bandwagon late personally, so I am more of a messenger here. The key questions are: how can I solve the problem of scaling out data, and how can I perform various kinds of queries on it, whether transactional, real-time analytic, or batch? Systems have become what we call HTAP systems, hybrid transactional/analytical processing systems, and today's workloads bet on that, along with the structure in which the data is stored. With that said, let us look at what Citus has been built on, in terms of the core benefits of PostgreSQL.
Extensibility has probably been the single most important reason Postgres has been adopted so widely by the community: as opposed to forking and setting up a new data store, you can build on Postgres and build extensions on it, and that has been very valuable. Data types such as JSONB, as I said, have been pivotal for querying and for distribution, along with geospatial data; those are the tenets. Rich indexing, the query planner, and query pushdown, along with search index extensions, have been pivotal for PG, while for scale-out there is global reach, security components, and built-in HA, with options for high availability and replicas serving queries, and queries that can be made more intelligent and integrated. So those are certain things. The other thing is, when we look at the Citus extension for Postgres, the core benefits come down to four main pieces: one, horizontal scale-out as we said; two, simplified infrastructure, where you decide the number of nodes you want and the system sets itself up; three, performance; and four, literally one-day catch-up in terms of support for each new PG version. So what are the key differentiators when it comes to Citus?
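To make the "extension, not fork" point concrete, here is a minimal sketch of enabling Citus on a plain Postgres server; this is a sketch under the assumption of a self-managed setup, not a definitive installation guide:

```sql
-- Citus is loaded like any other Postgres extension: no fork needed.
-- Prerequisite: shared_preload_libraries = 'citus' in postgresql.conf,
-- then restart the server before running this.
CREATE EXTENSION citus;

-- Report which Citus build is running; because Citus tracks upstream,
-- this pairs with the current Postgres major version.
SELECT citus_version();
```

Other extensions such as PostGIS or pg_trgm continue to work alongside Citus, which is the adoption story described above.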
So today on Citus you can start with a single node, and in place, without downtime, you can scale out to multiple nodes, which is an interesting thing. When we say a single node, there are two simple constructs in Citus. One is called the coordinator node: today, at least, everything is piped through the coordinator, so it is a single point of failure if you have a single coordinator node, so to speak. The coordinator holds the metadata store, and from it you have multiple worker nodes, where you decide how your data needs to be sharded; the worker nodes actually contain the data. If you have a single node, then the coordinator, the worker, and all the shards essentially sit on that one node, and you can then scale out to multiple nodes using what we call shard splits, specifically non-blocking shard splits, meaning incoming queries are allowed through without blocking any active workload while the backend shards on the worker nodes are split out. So that gives you online scaling. The other feature is called the online rebalancer. These address issues we see a lot, for example in the NoSQL world that I am more familiar with, where just by the laws of physics and how your data has to be stored, partitioning in a geo-distributed space is a given. So your data is sharded, and you shard based on a particular shard key: the shard key could be a single property or value, a compound key, a nested key, or a hierarchical partition key where you have one shard key and then other keys as sub-partitions. All of those options are things you are probably familiar with, and they exist in other services out there today. What is interesting is this: in an active production workload, how can you avoid hot partitions, where some shards are hot while others are very cold? For example, in Citus, and in other databases as well, you do not actively have hot and cold tiers of storage where you could run a query saying "run this as a hot query" or "run this as a slow query", meaning the query goes to cheaper storage versus in-memory storage; that query pattern is not very common. The option you do have is active, real-time rebalancing: assuming I have all the data I need, can I rebalance, can I move shards, which includes data movement, from one worker to another? The shard rebalancer redistributes the old and new shards so the worker shards are balanced, and the best part is that it gives you granular control: as a DBA or a developer you can actually say, I want to move this particular shard to another node. One example of that is tenant isolation: there are commands where you say "isolate this tenant to a new shard", and you provide the table name and the tenant ID. Say a new large tenant has shown up in shard four, and we want to separate it onto an entirely new worker node, with that single tenant on it. The difference is, coming from my NoSQL hat, there are systems where this happens automatically, and while making it automatic could be an extremely cool engineering problem for us, speaking for Microsoft, to solve in a multi-tenanted system, we actually see a lot of customers saying: we know our workload best, so we want to be able to say which large tenant needs to be isolated. Systems like Citus actually provide that granular control, and that is something that is very much valuable.
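The tenant isolation and rebalancing described above look roughly like this in Citus SQL; the table name and tenant value are illustrative, and exact function signatures can vary slightly across Citus versions:

```sql
-- Move all rows of one large tenant into its own dedicated shard.
-- 'sales_transactions' and tenant id 111 are made-up example values;
-- 'CASCADE' applies the isolation to co-located tables as well.
SELECT isolate_tenant_to_new_shard('sales_transactions', 111, 'CASCADE');

-- Rebalance shards across worker nodes; the logical-replication
-- transfer mode is the non-blocking option, so writes keep flowing
-- while shard placements move.
SELECT rebalance_table_shards('sales_transactions',
                              shard_transfer_mode => 'force_logical');
```

The isolated shard can then be moved to a dedicated worker node, which is the single-large-tenant scenario from the talk.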
The other key differentiator, as I said, is that the next open source PG version that becomes available will be available on Citus right away. So next, let us talk about the high-level architecture of how the Citus extension works. Essentially, a PG client hits the coordinator node. The coordinator does not contain the actual data; it contains mostly metadata, plus what are called local tables, which can reside only on the coordinator. So it holds a fairly small amount of data, mostly the routing information that says which worker nodes have which shard, so that it can route requests and aggregate the results to create parallelization. That is the coordinator, and then you have multiple worker nodes, the number again being the user's choice: I could have two or three, I could start with a four-core machine and scale it to six or eight cores in place, or I could fan out and say I want X number of worker nodes. So, on coordinator nodes: as I said, the coordinator is the node with the metadata, typically used for routing. In this sample you see there are multiple worker nodes, four of them here, and each worker node can hold some number of shards; they could have various numbers of shards. Now the interesting question is how shard placement is decided. In PG, for example, say a database has some 50 tables.
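The coordinator/worker wiring just described can be sketched as follows; the hostnames are placeholders, and `citus_add_node` is the Citus 11+ name (older releases used `master_add_node`):

```sql
-- Run on the coordinator, after workers also have Citus installed.
SELECT citus_set_coordinator_host('coord.example.com', 5432);
SELECT citus_add_node('worker-1.example.com', 5432);
SELECT citus_add_node('worker-2.example.com', 5432);

-- Inspect the cluster: which workers are active, and which shards
-- of which distributed tables are placed on which node.
SELECT * FROM citus_get_active_worker_nodes();
SELECT * FROM citus_shards;
```

Scaling out later means adding another `citus_add_node` call and then rebalancing, which is the in-place, no-downtime path mentioned earlier.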
Now, all of those 50 tables need not sit in all the shards, and there is a way we define this. Let me go through a small example: a sales transactions table. First a local table is created, which has a transaction ID, the customer ID, a date, and an amount, just as an example. Then we call create distributed table, giving it the customer ID as the shard key and naming the sales transactions table, which creates shards on that customer ID. So all rows with customer ID 111 will be placed in one shard, all rows with customer ID 112 in its shard, and so on: all the sales transactions for one particular customer will be stored together in a single shard. Now, what is interesting is that there are three kinds of tables. I will come back to this, but I want to talk through it alongside the architecture. The table we just saw, sales transactions, is what we call a distributed table, meaning the table exists across every worker, or a subset of workers: it is a sharded table. For a distributed table you need a shard key on which it distributes, and typically, if you make a query based on the shard key, your query goes only to those worker nodes that hold that shard's data.
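The walkthrough above corresponds roughly to the following SQL; the column names are illustrative, not from the talk:

```sql
-- Start with a regular (local) Postgres table.
CREATE TABLE sales_transactions (
    transaction_id bigserial,
    customer_id    bigint NOT NULL,
    sale_date      date   NOT NULL,
    amount         numeric(12, 2),
    -- Citus requires unique constraints to include the shard key,
    -- so the primary key contains customer_id.
    PRIMARY KEY (customer_id, transaction_id)
);

-- Turn it into a distributed table sharded by customer_id, so every
-- row for one customer lands in the same shard.
SELECT create_distributed_table('sales_transactions', 'customer_id');
```

After this call, inserts and queries still go through ordinary SQL; the coordinator routes them by the `customer_id` value.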
For example, if there are multiple tables and all of them have that shard key in the primary key, then your query is the most efficient, because it just goes to the shards on a single worker node, or maybe a couple of worker nodes, to get the data, and all the tables sharded on that key will be co-located with each other for the respective key ranges. So that is your distributed table. A reference table is a table where you feel you do not have an exact partition key or primary key on which you can map two or more tables, so you want the table to be there on every single worker node: in situations where I know I need a copy of the whole table everywhere, I store it as a reference table. And a local table is typically one stored on the coordinator node for specific kinds of joins: say you have a product catalog and these five items are always going to be there, or you want to merge in a table for a particular campaign, and once the campaign is over you can drop that local table. So that is the high level there. The other thing is how queries are routed, which is what I was trying to get to. At a high level, when you write any query on the Citus engine, ideally the query filters on the shard key: the query goes to the coordinator node and is then routed, in this particular case, to the shard that has that data. Now if the query has joins, or spans multiple tables, there are a few scenarios. One: the tables share the same shard key, and the system is sharded such that only the respective co-located data ranges are actually touched, which means fewer queries need to be made on the worker nodes. Two: use a reference table or a local table. Three: otherwise, the query fans out to all the worker nodes that hold relevant data, and that is the worst case, the situation where, performance-wise, it is probably not ideal. The couple of other things we wanted to cover in this session: I just highlighted four main use cases. Real-time data analytics, because real-time analytics means you want richer queries, including aggregations over fresh data, and strong query pushdown, so it helps to have the Postgres engine, and Citus does well there. Enabling HTAP, where the same engine can be used for both OLTP and OLAP workloads. IoT-based telemetry systems, because of the real-time nature, and situations with a lot of concurrent users tend to do better on Citus because it parallelizes workloads well. And then there is a huge customer base and adoption in multi-tenanted SaaS workloads. When we look at real-time or time-series operational analytics for reporting, the diagram here shows a pattern I have seen at a lot of customers, including many Microsoft-based workloads: Databricks hosted on Azure, any kind of web app service, any push notification service like APNS or, in our case, MPNS, the Microsoft push notification service, any eventing systems, and so on. Essentially, there are a couple of interesting things: the Citus engine enables what we call spreading out, sharding out, and hence we use the word hyperscale. In the true sense of hyperscale you would probably want a cloud vendor to maintain that for you, but there are people
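The three table types and the routing scenarios above can be sketched with the same illustrative schema; the co-located `customers` table and the `currencies` reference table are made-up examples:

```sql
-- A second distributed table sharded on the same key is co-located
-- with sales_transactions: matching key ranges live on the same worker.
CREATE TABLE customers (customer_id bigint PRIMARY KEY, name text);
SELECT create_distributed_table('customers', 'customer_id');

-- Reference table: a full copy sits on every worker node, so it can
-- join locally with any distributed table without a shard key.
CREATE TABLE currencies (code text PRIMARY KEY, rate numeric);
SELECT create_reference_table('currencies');

-- Scenario one, a routed query: the shard-key filter sends this to
-- exactly one shard on one worker.
SELECT sum(amount)
FROM sales_transactions
WHERE customer_id = 111;

-- A join on the shared shard key runs worker-locally per shard pair,
-- with no cross-node data shuffle.
SELECT c.name, sum(s.amount)
FROM sales_transactions s
JOIN customers c USING (customer_id)
GROUP BY c.name;
```

Dropping the `WHERE customer_id = ...` filter turns the first query into the fan-out case the talk describes: the coordinator sends it to every shard in parallel and aggregates the results.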
who actually take open source Citus and set up hyperscale clusters themselves, so that is very much possible. Freshness of data, availability, durability, concurrency: all of those are obviously the most important things, and choosing the shard key is probably the single most important decision, in Citus just as in any NoSQL system. Then there is scaling multi-tenanted SaaS applications, as we spoke about, and some interesting pitfalls there; purely IoT workloads and their considerations; and high-transaction, critical workload considerations. Those are the big things we wanted to talk about. To close off today's session in the interest of time: some of our large open source Citus customers have petabytes of data, and in situations with a huge number of concurrent users, 60,000 to 100,000, Citus scales. There is a CitusCon coming up this week, actually April 18th and 19th, purely about our open source Citus engine, with the contributors who founded it giving pretty interesting sessions: things around the efficiency of shard rebalancing, other extensions on PG, high availability being baked in, and so on. With that, thank you.
Do we have questions for Sharnia? [Audience] So the data model itself needs to be changed for an existing Postgres database to be adopted into hyperscale. Do you see any intelligent solutions being built at Citus to bridge this gap, so that schema migration is not an expensive affair, or is it not needed? [Speaker] It's a very valid question. When we move from relational to NoSQL, an entire data modeling exercise is needed, whereas from existing PG to Citus PG there are two kinds of activities needed: one, based on the main query patterns, whether I have the right shard key for those patterns; and two, how my data is exploding or expanding and what my elasticity needs are. Based on those two there may be some tweaks, but essentially, if you have for example 20 tables on Postgres, you keep the same tables; all extensions are equally supported, and similar foreign keys and primary keys can be constructed. So that's there, but we don't have a ready-made tool, if that's your answer; we would have to work on building something. [Audience] The shard rebalancing, is that part of Citus or part of the cloud service? [Speaker] It's part of Citus; the latest 11.2 provides shard rebalancing. [Audience] I have two questions. The first one: you just said it has parallel processing, so does that mean that if I run a query across multiple shards, it will do an MPP-style execution on all the shards? [Speaker] If you do a query on multiple shards, it will spin off separate connections and run against each of the shards, and the coordinator will aggregate the replies back in the response. The upcoming version of Citus is going to parallelize the coordinator as well; that is not yet available, but it's coming up. [Host] We are really out of time, we need to have our next speaker. [Speaker] Thank you so much, I'll be around here.