Hi, everybody. This is Robert Hodges, and I'd like to welcome you to my talk, Small is Beautiful: Using ClickHouse Data Warehouse for Edge Analytics. It's an honor to be selected to present at the Open Networking and Edge Summit 2020. Before I get into the talk, I'd like to thank everybody who helped put this conference together. It's wonderful to be able to come together, even online, and share all the things that we've learned. With that, let's dive into the talk. So a little bit in terms of introductions. Once again, my name is Robert Hodges, and I'm CEO of Altinity. We're a software and services provider for ClickHouse, a popular SQL data warehouse originally developed by Yandex. We're a major committer to the code, and we're also a community sponsor in the US and Western Europe. Back when we could travel last year, we sponsored at least 12 events, including meetups and conferences, across the United States and Europe. As far as my own background, I've been working on databases for over 30 years, with a few detours into things like virtualization and security. ClickHouse for me is database number 20, and it's one of the most exciting databases I've ever worked on. I hope that by the end of this talk, you'll understand some of the reasons for my enthusiasm. Let's start with a quick introduction to what analytic databases do. As most people listening to this talk understand, there are many different types of databases, developed for different reasons and used in different ways. Analytic databases, in a nutshell, answer questions across large amounts of business data, and those questions are often very open-ended. Take the simple example I'm showing you here, a table of sensor data that includes the time each reading was taken as well as the temperature. We can ask questions like: what's the current data across one or more sensors? Are there particular sensors out of spec or misreading?
And what's the daily cycle of temperatures? So the questions can be very open-ended, something that, for example, a human analyst would be working on. They could also be questions asked by other applications, for example, an application that needs to take action based on the temperature in certain locations. ClickHouse is a database that fits exactly this pattern, and let me tell you a little bit about it in case you haven't heard of it. It combines the best features of databases like MySQL, which are used for transaction processing, and newer analytic databases like Vertica or Snowflake. From the MySQL side, we have a single binary, just as MySQL does. We speak SQL, of course. We run practically everywhere; you'll see some examples of this as we go along. And we have an open-source license. In the case of ClickHouse, it's an Apache 2.0 license, which is particularly flexible because it allows you to use it in any kind of software you want. You can also embed it in software that you ship. From the analytic side of the house, we have features like SQL, which is very popular in analytic databases. We also have a shared-nothing architecture with tight coupling of storage and compute, similar to the traditional Vertica architecture. One of the most important things is that we use column storage. Data is stored in arrays, which means that we can iterate across them sequentially, both for reading and writing. We use vectorized query execution to break those arrays up into pieces and process them using SIMD instructions. And we have very good distributed query capabilities with built-in sharding and replication. Beyond that, we're really fast, and that's something that really grabs the attention of most people who start using ClickHouse. Just a quick introduction to how you get this and where it runs. ClickHouse has community builds for four platforms.
The main one is Linux running on the x86-64 platform, so that covers Intel and AMD. That one is marked green on the slide because it's the build that we optimize for and that is most heavily tested. There are also builds available for Linux running on ARM, macOS, and FreeBSD. Those are less stable, and if you want to run them in production, you may be well advised to build them yourself. As far as installing it, on Linux it's very easy. The bottom of the screen shows the two commands you execute when you're running on Ubuntu, for example: apt-get install grabs the Debian packages and installs them, and then systemctl start starts the server. With a good network connection, you can do this in 60 seconds. There's a little bit of bureaucracy above that to install the repo and make the packages available. And below that, we have an invocation of the ClickHouse client. That's how we actually talk to the database, as you'll see in a few minutes. Once you get connected to the database, you're going to create tables. With a SQL database, everything is stored in a table, and this is a typical definition for a table of temperature readings. I want to highlight a few things, starting down at the bottom. First of all, we have this thing called an engine: MergeTree. That's the workhorse engine for large tables inside ClickHouse. The name comes from the way it merges data in the background, as we'll see in just a second. MergeTree tables are broken up into pieces, because they tend to be very large. In the example I'm going to show you, we have 500 billion rows, so you need to break them up into parts, and that's what this PARTITION BY does. In this particular case, it's saying break it down by month. And then finally, we have an ordering of the data within each part, in this case by sensor ID and time. This is important because it helps us control and guide the compression and get better reduction of data size. It also helps with search, as we'll see.
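A table definition along the lines of the one on the slide might look like this. This is a sketch, assuming illustrative table and column names; the codec choice for the temperature column is my guess, not the exact slide text.

```sql
-- A sketch of the temperature readings table described above.
-- Names and the temperature codec are illustrative.
CREATE TABLE sensor_readings (
    sensor_id   UInt32   CODEC(DoubleDelta, ZSTD(1)),
    time        DateTime CODEC(DoubleDelta, ZSTD(1)),
    date        Date     ALIAS toDate(time),   -- computed on read, not stored
    temperature Float32  CODEC(Gorilla, ZSTD(1))  -- codec choice is an assumption
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(time)    -- break the table down into monthly parts
ORDER BY (sensor_id, time);    -- ordering within each part guides compression and search
```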
So looking at the columns themselves, we can see we have four columns, and I'd like to highlight a couple of keywords. One is CODEC, which allows us to control the compression of the data very exactly. In general, if you say nothing, ClickHouse compresses all data using LZ4 compression, which gives you pretty good out-of-the-box compression. But in this particular case, we're also applying a codec called DoubleDelta to the first column, a type-specific transform that reduces the data size (we'll talk a little more about how that works), and then applying ZSTD compression. So we get better compression at the cost of somewhat higher CPU usage when writing. Another thing: in the third column we have a date, and that's actually an ALIAS column, which means it is computed from the column before it. That allows us to have additional columns we can reference, but they aren't actually stored, which saves space. Let's look into the structure of a MergeTree table when it's laid out in storage. This is a high-level picture, but basically if you go into the table, you'll see that it is divided into parts. And as you look at each part, you'll see an index file. This is a primary key index, which allows us to locate rows efficiently. And we also have the columns. For every column, you'll see at least two files: one that's basically a list of compressed blocks, numbered by the row at which they occur, and the other the compressed blocks themselves. In the lower example, the little white boxes show how the index is used. When we come through the index to locate data, we will read what's called a granule. The index is sparse: by default it only includes an entry for about every 8,000 rows or so. That ensures that the index can fit easily into memory, even for very large tables. We call that gap between the rows a granule.
So we come in, we locate the granule that we're interested in, and then we can go out to the columns and basically index into the compressed block that contains the data for the granule. You can, of course, search for data directly. You don't have to use the index, and ClickHouse is, in fact, very efficient at doing that. Let's talk a little bit about what happens when you insert data. One of the interesting things about ClickHouse is that it has very fast insertion; we call this ingest. What ClickHouse does when you insert a block of data, let's say 10,000 rows or 50,000 rows, which are typical amounts for column storage, is create a part. It'll just grab the data, quickly sort it, add the index, and write the part. What that means is that as soon as you insert the data, as soon as your client gets a response back, the data is queryable. But the problem is that if you make a lot of little parts, it's not very efficient to do queries, particularly if the data becomes large. So what ClickHouse does in the background is constantly scan the parts, looking for little ones that belong to the same partition, and then it does what's called a background merge, where it grabs the data, puts it in a new and bigger part, and then atomically replaces the little parts that came before it. What this means is that ClickHouse has the best of both worlds: very fast insert, because it does this fast but half-hearted job of storing the data, and then over time it optimizes the parts so that reads become more efficient. ClickHouse also gives you a lot of control over CPUs, and in fact uses them quite efficiently. This is a simple example of a query that we're running, and what we're doing is using a property called max_threads, which allows us to control the number of CPUs that are used.
These are the CPUs that you see in a Linux command like lscpu, so they could actually be hardware threads underneath, and likely are. What we see here is that we set the number of threads to 1, 2, 3, and 4, and the graph on the right shows the effect on query performance for this particular query. With one thread, of course, it's slowest. The time goes down by about half as we enable two threads, and then it decreases asymptotically thereafter. What you're typically seeing is the effect of the fact that some operations, like grouping and sorting, tend to be serialized, so adding more CPUs at that point doesn't help. Let's talk about codecs and compression. I introduced those in the table definition above, but it's useful to look in more detail at what's actually happening. In this graph, for each of the three columns that we're actually storing, we show first the uncompressed size as a percentage; that's obviously 100%. The next column shows the effect of applying a codec. In the case of the sensor ID, we apply a DoubleDelta codec, and what that stores is the slope of the slope, in other words, the rate of change of the rate of change of the sensor IDs as they're encountered in the column. Because we sorted by sensor ID, we basically have a monotonically increasing sequence. What this means is that the stored values are going to be very small: the slope is going to change very gently, or in fact not at all, for long stretches. So we get pretty good reduction of the data size, down to 3%. We can then compress that with ZSTD, so the total data size is well under 1%. It's almost a thousand-fold reduction from the uncompressed size. The same sort of compression is available for time, which shows similar numbers with similar behavior. When we look at temperature, that one responds less well.
The codec that we're using in this particular case does not reduce the data as much, and so the compression is not as great. The key thing, though, is that by reducing the amount of data, we can dramatically speed up I/O, because there's simply less to read from storage. There's another really important way that we reduce the data size, and that is by anticipating the kinds of questions that people are going to ask and then structuring the data appropriately to answer them. The most common and most practical use of this idea is for aggregates. For example, if we want to ask about the daily minimum and maximum temperature, the max and min temperature recorded per device per day, we could of course go to the source data, but that's very expensive. What we can do instead, which improves performance enormously, is use a materialized view, which reads off the source data and creates a daily table that has the aggregates already calculated. This slide shows an example of how this is done. The code down below creates the materialized view, and again, the reduction in data size can be dramatic. In this case, it reduces the data you have to scan, if you want to look across all values, by a factor of 300. So this gives you some of the background for why ClickHouse is particularly efficient as well as fast. And I think now what we want to do is go ahead and do a demo that shows some of these things in action. So I'd like to introduce you to my Intel NUC. Its name is Pika, and you can read the specs in the table. It's pretty slow: a Core i5 processor at 1.3 gigahertz, more or less adequate RAM, and not bad storage. This is not a quick machine by any means. Let's escape out of this and go have a look. What I'm going to do is connect directly to the database. So I fire up the ClickHouse client, and now I'm going to run a few queries.
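A daily min/max materialized view of the kind just described could be sketched like this, assuming a source table of sensor readings with sensor ID, time, and temperature columns; all names here are illustrative, not the exact ones from the slide.

```sql
-- Sketch of a daily min/max aggregate maintained automatically on insert.
-- Source table and all names are illustrative.
CREATE MATERIALIZED VIEW sensor_daily_mv
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (sensor_id, day)
AS SELECT
    sensor_id,
    toDate(time) AS day,
    minState(temperature) AS min_temp,   -- partial aggregate states,
    maxState(temperature) AS max_temp    -- merged in the background
FROM sensor_readings
GROUP BY sensor_id, day;

-- Reading it back requires the matching -Merge combinators.
SELECT sensor_id, day, minMerge(min_temp), maxMerge(max_temp)
FROM sensor_daily_mv
GROUP BY sensor_id, day;
```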
So let's first go look at the number of rows in our table. There's the query. If you read carefully the number shown there, and imagine commas stuck in the right places, this is 500 billion rows. The result comes back really quickly, and the reason is that, like many databases, ClickHouse is optimized to count rows very quickly. Let's do something a little bit more complicated: let's look for the maximum temperature reading from a particular device. We're going to go to the source data, so let's paste this in. We're asking for the max temperature for sensor one, and again we get the answer back very quickly, in this case in 25 milliseconds. You'll notice, though, that we actually had to process the equivalent of close to 800,000 rows, yet we got the response back very, very quickly. The reason is that we're taking advantage of the compression, which reduces the amount of data we have to read, plus we have a key which allows us to select the data to read very accurately. Let's do something that stresses the database a little bit more. Remember, this is a NUC; it's not particularly powerful. So let's go ahead and do something that's going to force a scan of a lot of data. What we're looking for is sensors that are either abnormally high or abnormally low: the sensor with the lowest temperature on a particular day and the sensor with the highest temperature on that day. This is a way of looking for outliers. You can see that it's moving along at about 80 million rows per second, but it turns out it's going to have to scan something like 125 billion rows to get the answer, so this is going to take a while. Let's have a quick look at how ClickHouse is using the CPU. What we're seeing here is htop running in another window, which shows that we're getting close to 100% utilization on each of the CPUs.
In general, ClickHouse is very effective at fully utilizing CPUs. In fact, depending on how you run it, you will typically get 100% utilization. But we don't want to wait, because this is a short talk, so let's get this answer a smarter way. What we're going to do right now is ask that same question against the materialized view I showed you just a few minutes ago. Let's put that in, and in this particular case, by having some pre-aggregation, we can get the answer we want within 1.2 seconds. That's an enormous speed-up, and it means that we can get useful information out of this very large data set just by anticipating the questions we want to ask. And then, once we find a sensor that's interesting, we can go back to the source data and drill down to find out exactly what's going on. So what I'm going to do now is do exactly that for the sensor that had the minimum reading. Let's go see what was happening on that day. We're going to use a kind of cool ClickHouse feature: we'll grab the sensor data, aggregate it by hour, all done off the source data set, and then use a handy feature which is an ASCII bar graph. If you've been around Linux or Unix systems for a while, you've seen these before. And what we see is that one o'clock on that day has a low temperature reading. So something was going wrong on the device, or something else was happening; we can now go investigate out in the real world and see what happened. And you'll notice that this came back in 36 milliseconds. So being able to drill down very quickly on the source data is a powerful feature. I hope this is giving you a sense of the power of ClickHouse. What I'd like to do now is put this away and go back to our presentation. Let's pull it up, go back to full screen, and say goodbye to Pika.
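For reference, the demo queries were along these lines. This is a sketch assuming an illustrative source table of sensor readings; the sensor ID and date in the drill-down are placeholders, not the actual values from the demo.

```sql
-- Fast row count: answered from part-level metadata.
SELECT count() FROM sensor_readings;

-- Max temperature for one sensor; the (sensor_id, time) primary key
-- limits the granules that have to be read.
SELECT max(temperature) FROM sensor_readings WHERE sensor_id = 1;

-- Hourly drill-down for an outlier sensor, with an ASCII bar chart.
-- bar(value, min, max, width) renders each row as a bar.
SELECT
    toStartOfHour(time) AS hour,
    avg(temperature) AS avg_temp,
    bar(avg_temp, -20, 40, 40) AS chart
FROM sensor_readings
WHERE sensor_id = 123            -- placeholder sensor
  AND toDate(time) = '2020-03-01' -- placeholder day
GROUP BY hour
ORDER BY hour;
```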
And what I'd like to do now is move from talking about a single analytic database to thinking about analytic applications, and how we might set these up in a way that's applicable to edge environments. As we build analytic applications, there are a number of common patterns that we have to think about, because an analytic application is data plus software that enables you to get useful results. For example, one very common pattern that we see is services with embedded analytics. Think of that as, for example, the temperature sample that I just showed you, something to monitor temperatures, plus maybe another application that's monitoring an assembly line. These are completely separate applications, and they'd like to have their own analytic databases, so this is something we need to accommodate. Another thing that may happen is that if we'd like faster results, or we have more data, then we probably want some sort of horizontal scaling of the cluster. We're going to have multiple databases working together, both to give us scaling and also to keep copies of the data in case we lose something. And then finally, many analytic applications are actually fairly complex. They might include parts that handle ingest, machine learning, administrative settings, and visualization of data, so we need a way to think about setting those up as well. It turns out that there is an answer for this which is widely used in IT environments today and is now attracting increasing interest in edge environments; there are in fact a number of talks at this conference about Kubernetes in edge environments. Kubernetes is basically a system for managing applications that are built on containers. The notion is that your services, that is to say your applications, are put in containers: you build them and package them with Docker, for example, or whatever container system you like.
And then Kubernetes will run them for you. It does this using an intelligent cluster manager which knows how to distribute the containers to different hosts. Kubernetes also has a notion of a life cycle for applications that includes deployment, upgrade, adding and subtracting resources, and taking applications away. And finally, it has a really powerful feature which we call configuration as code. When you want something to happen in Kubernetes, you basically fill out a YAML file describing the type of thing you want to create, you submit it to Kubernetes, and then Kubernetes figures out how to make it a reality. This is very powerful and actually well suited to edge applications, which are often remote and need to operate in a more or less automated fashion, but may have a lot of complex software packed onto them. So let's take one of those patterns I showed you, clustering, and show how Kubernetes enables it. This will also give you, I think, some insight into how applications in general could work. In the case of ClickHouse, it has very well-developed horizontal scaling features. As I mentioned before, we have a notion of shards, where you split the data up into disjoint subsets; you can think of that as adding IOPS. It's basically spreading the data across different machines with independent storage, so you can get results faster and also insert faster. The other thing it has is replicas. These are helpful because if you're doing a lot of reads, and analytic applications often focus on reading as opposed to writing, the replicas help you deal with concurrency. If a lot of people are asking the same question on a shard, or a variety of different questions on a shard, having more copies is faster. It's also helpful for availability, because if you lose a copy, you've still got remaining copies that you can use to restore it.
So this is the basic model, and this is something that we've spent the last year and a half to two years implementing on Kubernetes. The problem is that any kind of distributed database on Kubernetes, or a complex application, or multiple applications, is pretty complicated. Kubernetes defines a bunch of specific resources that the application maps to, and if you had to manage all of them for a ClickHouse cluster by hand, it would be hard to maintain; you can see from the example how many pieces there are, and moreover we have ZooKeeper, which we depend on. What we need is something that batches all this up into a single thing and abstracts away all the little parts into something that's simple to manage. It turns out that Kubernetes has a really great feature for this called operators. Operators allow you to implement what are called custom resources in Kubernetes. That means in addition to the resources that are already there, like deployments or persistent storage, you can create your own. There's a ClickHouse operator which does exactly that for ClickHouse installations. What it does is reduce all the settings for a ClickHouse cluster to a single resource file which you can just feed into Kubernetes. It gets forwarded to the ClickHouse operator; the operator looks at it and figures out how to make it a reality out in the Kubernetes cluster. And it will not only deploy it: if you make an update, for example change the number of replicas, the operator will look at that and then make the minimum changes needed to make it a reality. Operators are a very powerful feature of Kubernetes, and in fact they're generally available for almost all databases that run cloud natively. Let's look at the practical details of what that custom resource would look like. So here's a basic setting for a data warehouse.
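A custom resource of the kind just described might be sketched as follows. This assumes the Altinity clickhouse-operator's ClickHouseInstallation resource; the names, image version, and storage size are illustrative placeholders, not the exact values from the demo.

```yaml
# A minimal ClickHouseInstallation sketch for the clickhouse-operator.
# Names, versions, and sizes are illustrative.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "sensor-warehouse"
spec:
  configuration:
    zookeeper:                  # the service this cluster depends on
      nodes:
        - host: zookeeper.zoo1ns
    clusters:
      - name: "demo"
        layout:
          shardsCount: 2        # the operator expands these two lines
          replicasCount: 2      # into all the underlying resources
        templates:
          podTemplate: clickhouse-pod
          volumeClaimTemplate: default
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              image: yandex/clickhouse-server:20.3   # software version to run
    volumeClaimTemplates:
      - name: default
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi    # 100 gigs of block storage per replica
```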
It's an example I cooked up for this demo. It's got a name, so we'll be able to find it in Kubernetes, because we might have a bunch of these if different applications need their own data warehouses. We have the detailed configuration, and the most interesting thing here is defining the shards and the replicas, which is actually fairly complicated underneath; here it's just two lines, the number of shards and the number of replicas, and the operator figures out the rest. We also have a reference to the ZooKeeper service that we depend on. This is an example of how you can refer between applications in Kubernetes, which allows you to tie applications together in a group. Then we have templates, for things like networking and for the actual software version, that is to say, the version of the container that the pod runs, and then finally storage. I'm showing you an example of the storage that we're using for this particular cluster. It's called a volume claim template, and what we're basically asking for here is 100 gigs of block storage. So that's Kubernetes, and what I'd like to do is touch on one final thing, which is how to link this edge location up to the public cloud. You've got data on the edge, you're processing it there, you may be reacting to conditions, but you want to send it up to the public cloud for broader analysis. There are a bunch of places where you could hook in, everything from loading it through the application or REST APIs all the way down to writing it to S3-compatible storage. It turns out ClickHouse has a bunch of useful features that help with this. ClickHouse is what we call a polyglot database, which means that it can read and write directly to a number of useful data sources, everything from HDFS to Kafka. So I'll just give you a really simple example of one of the possible paths for getting the data out of your edge location into the public cloud.
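Sketched in SQL, that path could look roughly like this, using ClickHouse's s3 table function to export one day of daily aggregates as Parquet. The bucket URL, credentials, and table and column names are all placeholders, not the actual values from the demo.

```sql
-- Write one day of daily aggregates to a Parquet file on S3.
-- URL, credentials, and all names are placeholders.
INSERT INTO FUNCTION s3(
    'https://my-bucket.s3.amazonaws.com/sensor/daily-2020-03-01.parquet',
    'AWS_ACCESS_KEY', 'AWS_SECRET_KEY',
    'Parquet',                                                  -- output format
    'sensor_id UInt32, day Date, min_temp Float32, max_temp Float32')  -- column types
SELECT sensor_id, day, min_temp, max_temp
FROM daily_aggregates    -- the daily aggregate table described earlier (name illustrative)
WHERE day = '2020-03-01';
```

On the cloud side, another ClickHouse instance can read the same file back with the matching s3 table function in a SELECT.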
What we'll do is take the fact table, or rather the daily aggregates, and from time to time write a file up to S3; ClickHouse on the other side can then be programmed to go read it. Let me show you the SQL that makes that possible. Here we have ClickHouse writing the data for a particular day from the materialized view up to a Parquet file in S3. We write to a table function, which is a thing that behaves like a table. You can see in blue the location of the data, things like what format we want to write, some information about the column types, and then a query that pulls the data out of the materialized view. This looks complicated, but you have to ask: compared to what? In this case, what you're seeing is ten lines of SQL that replace an entire application to do this data movement, so it's actually pretty simple. And with that, we're done. Just very briefly, to wrap up: ClickHouse gives you some new options for analytics on edge devices. As we saw in the demo, it runs quite well even on relatively underpowered machines like my Pika. It makes very efficient use of CPU, RAM, and I/O, and in particular the compression and the materialized views allow you to manage relatively massive amounts of data on small devices. Kubernetes, as you step up, gives you a way to enable deployment that goes far beyond single devices and single databases, and then the polyglot features, as well as the portability, allow you to tie into cloud analytics. This is open-source software, so please try it out. Tell us how to make ClickHouse better for edge computing. We're a community, and we'd love to hear from you. That's it. Thank you very much. We're hiring, so check us out if you like this and it's something you know how to do. Here are our email and code links. Thank you very much. It's been a pleasure to give this talk.