So, it's my pleasure to introduce Ozgun Erdogan from Citus Data, who is going to talk about, I forgot the name of your talk, I'm sorry, Keep Calm and Scale Out Postgres. Yeah. So, without further ado, let's just get right into it.

Hi, I'm Ozgun Erdogan, I'm one of the technical founders at Citus Data, and I'm really excited to be here. Today, I'm going to talk about the trends we see in the big data space, what we think makes PostgreSQL unique, and how we at Citus Data are scaling it out.

To start, let's rewind back 10 years and look at the data landscape in 2005. Here I have a very simplistic diagram. On the x-axis, you have different workloads, and on the y-axis, you have data types and how much structure these data types enforce. The idea is, 10 years ago, databases used to be simple. You needed data warehousing? Use an RDBMS. You needed real-time analytics: short insert, update, delete, select queries, highly transactional workloads? The answer was simple: use an RDBMS.

And then, boom, data exploded. The idea in this graph is that the amount of data captured in the world, thanks to the internet, your cell phones, and all these little devices, far exceeded Moore's law. A single machine just wasn't enough to keep up with this data growth. So first companies such as Amazon, Google, and Facebook, and then others, started trying out new technologies. This led to a fragmentation of databases. If you had lots of data, you ended up using many very different database solutions. On the operations side, you had your document store databases. Then NewSQL databases emerged and brought SQL back. On the analytics side, we saw Hadoop emerge. People started using the Hadoop distributed file system, writing MapReduce jobs, and issuing Hive queries. People also used proprietary MPP databases. And this is crazy.
For each new technology, you had to train new people, have them understand how each database works, have them debug and maintain that database, and ensure that it all interoperates with other databases in your company. And customers don't want to operate and maintain many separate databases. They want things to be simple, familiar, and reliable again.

When we went and talked to customers with big data deployments and asked them what they wanted, we received three buckets of answers. First, my database should be able to handle multiple workloads really well, both operational and analytic workloads. No one thought MongoDB was a realistic alternative to replace their Hadoop cluster. Second, I want SQL with a twist. Some of my data is semi-structured, and I want to be able to query that data too. Third, I don't want to start from scratch. I want to be able to use the tools and features that others have already written. And I want all of that to just work.

If you talk to these customers, they all agree that PostgreSQL already does all of this at the single-node level. In terms of performance, PostgreSQL's operational and analytic performance has been increasing steadily with each new release. This diagram shows the TPC-B benchmark run on the same machine across different PostgreSQL versions. The blue line at the bottom is PostgreSQL 7.4, and the purple line at the top, the one that's almost 10 times faster, is PostgreSQL 9.2. Thanks to all the cool work this community has done, PostgreSQL can do operational and analytic workloads really well.

In terms of expressiveness, PostgreSQL also has extensive SQL coverage, HStore and JSONB data types, and indexes and operators for these data types. Also, there are 150 official extensions publicly available. For example, let's say you counted the number of unique visitors to your website for day one, and you counted the number of unique visitors to your website for day two.
And you want to quickly merge them together to get the unique visitor count over day one and day two in real time. There is a HyperLogLog extension for that. So if you can't express what you want to do in PostgreSQL, you probably can't do it with any other database.

And when we're talking about the ecosystem, PostgreSQL is killing it. This is a survey from a developer website called Hacker News. How many of you in here are familiar with Hacker News? Okay, a fairly large number. This is a yearly survey that asks, what database is your company using? On the left-hand side, you see survey results from 2010. On the right-hand side, you have the same survey results from four years later. When you look at these two surveys, you see two differences. First, PostgreSQL's usage has tripled in four years, far exceeding any other database. Second, in 2010, MySQL had more usage than PostgreSQL and MongoDB combined. Fast forward four years, and now PostgreSQL has more usage than MongoDB and MySQL combined.

We talked about three properties that are important: operations and analytics, expressiveness, and the ecosystem. The challenge is, how do we keep these three properties while also scaling out? This is really, really, really hard. The truth is, you can't scale out a heavy write workload in the same way you scale out real-time analytics. You need fundamentally different distributed query executors, and you need to choose among them based on incoming queries. Second, you can't do a half-hearted effort. You need the full theoretical framework to plan and optimize distributed queries to minimize network I/O. Third, you can't fork the database. If you do, you lose the entire ecosystem. Thankfully, PostgreSQL has an extension framework that lets us get full cooperation from the core database without forking. This is how these properties look in a picture.
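The day-one/day-two merge works because HyperLogLog sketches union losslessly. Here is a minimal, toy Python sketch of the idea; it is illustrative only, not how the actual PostgreSQL HyperLogLog extension is implemented (the real extension adds sparse encodings and bias correction, and exposes this at the SQL level):

```python
import hashlib

# Toy HyperLogLog-style sketch: M registers, each holding the maximum
# "rank" (1 + trailing zero bits of the hash) seen for items that map
# into that bucket. M is an illustrative choice; more registers give
# better accuracy.
M = 256

def add(registers, item):
    """Record one item: keep the max rank seen per bucket."""
    h = int(hashlib.sha256(item.encode()).hexdigest(), 16)
    bucket = h % M
    rest = h // M
    rank = (rest & -rest).bit_length()  # 1 + trailing zeros of the hash
    registers[bucket] = max(registers[bucket], rank)

def merge(a, b):
    # The union of two sketches is just the element-wise max of their
    # registers. This is why a day-1 sketch and a day-2 sketch can be
    # combined at query time, in real time, without rescanning the data.
    return [max(x, y) for x, y in zip(a, b)]

day1, day2 = [0] * M, [0] * M
for visitor in ["alice", "bob", "carol"]:
    add(day1, visitor)
for visitor in ["bob", "dave"]:
    add(day2, visitor)

both = merge(day1, day2)
assert merge(day1, day2) == merge(day2, day1)  # merge is commutative
assert merge(both, day1) == both               # and idempotent
```

At the SQL level, the postgresql-hll extension exposes the same idea through aggregate functions such as `hll_union_agg` and `hll_cardinality`, so per-day sketch columns can be unioned and counted in a single query.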
To unify analytic and operational workloads, we have three executors: the short-request executor, the real-time executor, and the task tracker executor. If you have an insert query that just needs to get routed and replicated, we pick the short-request executor. If, on the other hand, you need to execute long-running analytic queries that need to pay attention to resource usage on all the nodes, we have the task tracker executor. For expressiveness, we create a logical plan using distributed relational algebra. We then optimize that logical plan and convert the optimized logical plan into a physical plan. Finally, we don't fork, but instead extend the core database using the extension APIs that are in Postgres. That way, we get to keep the ecosystem.

This is how those boxes map to different Citus Data products. We have two open-source extensions: cstore_fdw is a columnar store, and pg_shard scales out high-throughput reads and writes. CitusDB targets the use case where you want your queries to be massively parallelized. A good use case is when you have distributed table joins.

Here's an example of how all of those boxes work together nicely in real life. Cloudflare is a global content delivery network and DNS provider. More than 5% of global internet traffic flows through Cloudflare. There are 2 million businesses on Cloudflare, and each one of them looks at real-time dashboards to understand their website and network traffic. This entire dataset for all of those businesses is served from a CitusDB cluster. The cluster also uses the HyperLogLog extension, which enables Cloudflare's businesses to see unique visitor counts to their websites over varying time intervals in real time.

And some other examples that use cool PostgreSQL extensions: Neustar is a publicly traded company that uses pg_shard for real-time data ingest and CitusDB for real-time analytics together. Heap stores terabytes of data in memory in a semi-structured and compressed format.
When a query comes in, it gets parallelized across hundreds of CPU cores, and results are computed and returned to the user in less than a second. Agari uses custom data types for real-time security. Migros uses geospatial data types. And there are many others. We're really excited to be here. We are keeping calm and scaling out Postgres. Thank you.
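The three-executor choice described earlier can be sketched roughly as follows. The executor names mirror the talk, but the classification rules and the function itself are illustrative assumptions, not the actual CitusDB planner logic:

```python
# Illustrative sketch of routing a query to one of the three executors
# mentioned in the talk. Real distributed planners inspect a parsed
# query tree and shard metadata; this toy version just pattern-matches
# on the query text to show the idea.
def pick_executor(query: str) -> str:
    q = query.lstrip().lower()
    if q.startswith(("insert", "update", "delete")):
        # Short requests: route to the shard(s) owning the row,
        # replicate the write, and return quickly.
        return "short-request"
    if "join" in q or "group by" in q:
        # Long-running analytics: schedule tasks and pay attention
        # to resource usage across all the nodes.
        return "task-tracker"
    # Everything else: parallel fan-out with low latency.
    return "real-time"

assert pick_executor("INSERT INTO events VALUES (1)") == "short-request"
assert pick_executor("SELECT a.x FROM a JOIN b ON a.id = b.id") == "task-tracker"
assert pick_executor("SELECT count(*) FROM events") == "real-time"
```

The point of the sketch is the architectural claim in the talk: no single execution strategy serves both heavy-write and analytic workloads, so the system must classify each incoming query and dispatch it to a purpose-built executor.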