Alright guys, thanks for coming to the last lecture we have in our time series database seminar. I just want to read some quick emails that I've gotten from people who have been watching these videos on YouTube, with some suggestions and praise for us. Justin L. in Seattle says, "Thank you for posting your time series lectures on YouTube. I have really learned a lot." Evan J. in Brooklyn, New York says, "Hey Andy, great time series database YouTube videos, keep them coming." The last one, Samuel M. in Newton, Massachusetts, says, "I don't know how you can call these time series lectures without having TimescaleDB — you need to have them come give a talk. That's the best time series database out there. I was able to track my ex's location in near real time with a GPS transponder I put in their car. They don't know it's there, and now I know everywhere they go, and I found out why they're going on all these coffee dates with people from Tinder. TimescaleDB is the only system that will let me scale and track my ex in that manner." And with that said, this is why we invited Mike last, because this is the one we're very excited about. So Mike Freedman is a professor at Princeton University in the computer science department, and he's the co-founder and — it says here — former CEO, now CTO, of Timescale. He's here to talk about TimescaleDB. So go for it. And yes, it looks bad that the screen is stuck; at some point it may magically come down, or it may not.

Yeah, I think one of our big selling points is native geospatial support, which is particularly useful for tracking people. We have no liability based on how people use the database. Anyway, thank you for having me. Before I start, a word about the database group: academically I am not a database researcher, I am a systems researcher. This side of the room, not that side of the room. These are my friends. You know, we'll see — I play one during the day. Andy is dope.

Today I want to talk about our experience building Timescale on top of Postgres. We actually started as an IoT company, building a platform to make it easy to collect, store, and analyze data, and in building it we tried a number of the open source databases on the market. I'm not going to name names, but in general we were unsatisfied with what was there, so we built one ourselves, and eventually we decided to focus on letting other people use this database, and we open-sourced it. That's what I want to talk about. So this arose from our experience with IoT, but what we found is that time series data is much broader. (Getting to the meat of the talk — there you go.) What's interesting, I think, is that in this seminar series you had people from KDB, you had people from Influx, and you can see that over time, time series databases specialized for certain areas. KDB came out of finance; Influx and others came out of DevOps. But we also see time series data emerging in many other places: transportation and logistics, IoT and predictive maintenance, web and mobile eventing. What's common across many of these is that data is being generated at a high rate, and we need a better way to store and analyze it. There are a couple of major trends causing this. One of them is what we call the evolution of data resolution.
So think of your bank. Your bank historically kept just the last state of the world; you can view it as if the only thing the bank kept was a materialized view of the state of the database. Over time we switched to keeping every state of the world. For banking transactions it's probably not surprising that you keep track of all the transactions. But think of logistics: it's different to know where my container is now versus every place my container has been over time. And we're heading to a future where we're basically going to store every interaction. As you go along this path, you're collecting finer and finer resolution data, you're collecting more and more data, and this becomes more of a systems scale problem.

So we can ask — I've used the term time series data, and since I'm the last speaker in this long series, hopefully you have some notion of what it is. It's interesting that people in different areas use different language that reflects where they came from; if they came from, say, DevOps, they have a particular definition of what they think time series data is. Broadly speaking, we can think of it as data collected from a bunch of different things — devices, or financial securities or instruments — collected over time, and you might store different stuff about it. A common scenario, for example, is the typical IoT application where you have a bunch of measurement data, like the average CPU, some time value, and some additional metadata about that information.

Now, this is one view; you could call it a narrow view. A lot of time series databases describe how many time series they have, and when they say "time series" what they really mean is a unique combination of metadata, or tags: each unique combination is defined as one time series. So this example would be, say, 15 different time series — all the unique combinations of these tags. This is the cardinality problem of time series that you hear people talk about. But of course the data doesn't have to be narrow; it can be wide, so you keep much more information per row, and the relational model often helps here. For example, if I'm tracking container ships and my temperature reading is quite high, it's useful to know that what's actually happening is that I'm picking up waste heat from my CPU. Being able to maintain the relationship between these different columns turns out to be very useful in certain scenarios, as does combining them with additional metadata. For example, if these are containers again, it's useful to know that this is in fact a shipping container, a refrigerated container, 40 feet — because a high temperature in my refrigerated container, that's a bad sign. If you look at this model, it fits very naturally with the relational model, but most time series databases don't want to deal with this stuff, because they don't want to deal with joins, or they denormalize everything at insert time. So what ultimately happens, when you see deployments of many of these time series databases in practice, is that the metadata typically gets pushed to a separate relational database, and they join it in application space.
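(To make the narrow-versus-wide distinction concrete, here is a minimal sketch of such a schema in SQL — the table and column names are illustrative, not from the talk:)

    -- Wide measurement rows: related readings stay together in one row.
    CREATE TABLE conditions (
      time        timestamptz NOT NULL,
      device_id   text        NOT NULL,
      cpu_temp    double precision,
      cargo_temp  double precision
    );

    -- Metadata kept relationally, rather than denormalized into every row.
    CREATE TABLE containers (
      device_id  text PRIMARY KEY,
      type       text,    -- e.g. 'refrigerated'
      length_ft  integer  -- e.g. 40
    );

    -- The join that many time series databases push into application code:
    SELECT c.time, c.cargo_temp, m.type
    FROM conditions c JOIN containers m USING (device_id)
    WHERE m.type = 'refrigerated' AND c.cargo_temp > 10;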
Now you might say, oh whatever, I have to write some glue code. But what this means operationally is that when your data analyst says, hey, I want to ask this question, what actually happens is they have to wait two months to get into an engineering sprint so the application code can be changed to support that join in application space. And I think I'm going to get a lot of questions here — yeah?

"Many of us time series people have assumed an important property — maybe even a definitional property — that old data is less valuable than new data. First, do you agree with that? And second, does it impact these related tables as well?" The question was: is old data less important than new data? Why don't I come back to this a little later — yes and no, and maybe it'll become clear after a couple of slides.

So the question is, with this model, what are most people actually using today? You've heard about a whole bunch of time series databases in this seminar series. This was from a survey this year of a bunch of database users by Percona, which is a big MySQL shop that puts on a big MySQL conference: basically 68% of people say they use some type of NoSQL database for this. So you can ask, why is this the case? What were the database people working on for so long that everybody thinks we should just go to NoSQL? Why not use a simple relational table? I gave you the argument that it's a very natural relational model. Maybe it was once the price of Oracle, but we have these great things like Postgres and MySQL (which is kind of Oracle now).

So let's talk about Postgres. One of the differences with time series is that there's a lot of data, compared to a traditional question like how many accounts a bank has. This is just a small set — this is the dataset going up to 5 million rows. You're inserting one row at a time (you'll get better performance inserting batches; we'll come back to that), and you start out doing about 15,000 transactions per second. This is a single cloud node, not too powerful, but it looks good. Then you keep doing transactions, the dataset gets bigger, and it goes off the cliff. And with huge variance — this is the mean, but we actually see these huge bands where a lot of the time we're getting only hundreds of transactions a second. What is happening? Probably all of you have already figured it out; it's not too surprising. Time is the primary dimension, and you have a B-tree on it, because you want to index your data — you want to be able to ask questions about it. If you have a single table with a B-tree index on it, then as you insert more data you have to maintain that B-tree, inserting into the global sort order of the tree. Maintaining the index means that in the end you're swapping to disk a lot. Not surprisingly, memory is a lot faster than disk, so databases are happy when the writes are all in memory, and really unhappy when they have to keep going to disk. (These are network-attached SSDs, by the way, for all of the benchmark data I'll show you.)

So why not just use a NoSQL database? The champion of NoSQL databases for the last five or ten years has been the log-structured merge (LSM) tree.
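(The setup behind this graph is presumably just a plain table with a B-tree index on the time column — a sketch:)

    CREATE TABLE measurements (
      time      timestamptz NOT NULL,
      device_id text,
      value     double precision
    );
    -- One global B-tree over all the data: once it no longer fits in memory,
    -- maintaining it on every insert means swapping to disk.
    CREATE INDEX ON measurements (time DESC);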
I'm not going to talk about the specifics, but one of the nice things about them is that they support range queries over sorted keys. They're often stored in almost a column format when used in these types of time series databases, so you get really nice compression — that's one bonus. Typically this leads to effectively keeping a separate column per series: the tag set — all that metadata I described — where each unique tag set gets stored separately, along with, say, some numerical value. Now the downside — and in the end, all of these modern NoSQL databases have adopted what is effectively a log-structured merge tree — is that this leads to some problems. If you look at most of them, they have very little support for secondary indexes, because they're primarily storing all the data in the order of that key and tag set. In particular, what we find is that the number of time series they support grows with the cross product of the number of distinct items in each tag that makes up the tag set. So if you have a million unique device IDs, in a thousand locations, with ten different device types, that's already 10 billion unique combinations, and most of these time series databases cannot handle that kind of cardinality, because effectively all of the index structures for these log-structured merge trees have to be maintained in memory. So here's where this leads: you often think of NoSQL as giving you flexibility and letting you push off the definition of your schema, but your very choice of which tags you use has major implications for how you can query your data, how much storage it needs, and what you can scale to.

"That's more a limitation of these implementations rather than of LSM trees, though?" Correct. I'm trying to dance between the general design of the data structure and particular implementations, and I'm being a little fuzzy there. But it often does lead to — or particular implementations do require — a lot of in-memory data structures to support the lookups. And with each of these you get less powerful queries, loss of the SQL ecosystem, and no joins.

So what we set out to do — and this was when we were still building an IoT platform — was to build a database that could support this high volume of writes, but build it on top of a rock-solid relational database. So in fact we built this on top of Postgres.

"You should probably mention what you mean by losing the SQL ecosystem, because that may not be obvious." Correct. Most of these NoSQL databases either make up their own language — some crazy JSON REST API — or, like Cassandra with CQL, they define something SQL-ish. And what SQL-ish means is that they can do a simple select statement, but, for example, you can't have secondary indexes, you can't have indexes on numerical values, you can only order by time, you can't have subqueries, you can't have window functions — you can't do all that stuff. What that means is that if you take a tool that speaks SQL, it won't work with the system. And the reason you want a SQL system is that across organizations there are lots of things that speak SQL: ORMs, ETL systems, visualization layers like Tableau.
So if you have something that speaks SQL, all of that can plug in and hopefully work out of the box. That was our goal. We built TimescaleDB, packaged as an extension on top of Postgres, and the benefits you get come from the way we carefully engineered it on top of Postgres. I want to stress three things. First, ease of use: like I said, it supports full SQL, plus some additional time-oriented features we add. You get one database — you can store relational data alongside your time series data in the same database and do joins across them; you don't have to silo your data. And because it speaks Postgres, you get the full ecosystem: anything that talks to Postgres basically works out of the box with Timescale. Second, scalability: I'll talk about how we get high write rates and some time-oriented optimizations; in particular, we've had people run individual tables of hundreds of billions of rows on a single node. And third, reliability: if you think of a database as having a front end and a back end, almost all of our work is in the front end of Postgres. We made the strategic decision to keep, at least for now, the low-level storage engine of Postgres unmodified, because it takes a long time to make a production-worthy database, and this way we don't have to worry about some corner case in the recovery algorithm for the WAL, right? We get that from Postgres. And by doing this we inherit not just Postgres's usage scenarios but its operational story: your backup tooling, pgBouncer, WAL replication — all that stuff that works with Postgres is going to work with Timescale.

"Two questions. If you stick with SQL — sometimes time series processing involves iterated queries, like sliding windows, where you want to keep building a model as you traverse the windows. How would you express something like that in SQL? How do you handle sliding windows or session-based windows in SQL?" There are a couple of answers. One is that we are adding UDFs that provide new functionality — things like sessionization and fuzzy joins and various things you might want. And some of it is this: if you actually talk to a lot of users, you don't always solve all these problems inside your database. There are libraries that work with us for machine learning against the database, but if you're doing not machine learning inference but machine learning training, often what you're doing is pulling the data into a Jupyter or Python or R notebook — what you want from the database is just a query, and you do the rest elsewhere. So we're not trying to solve all of those problems.

"How does the number of features in the data affect scalability? You mentioned hundreds of billions of rows — features, meaning columns?" The limit we found there was really just space; that was 50 to 100 terabytes. All the benchmarks I'm going to show you have 10 columns per row.
Clearly your throughput will go down when rows are really big or when you have big JSON blobs inside them, but those are the same traditional trade-offs you'd see in a plain Postgres database; we just keep that same performance at much larger scale — we don't hit that bottleneck. "All the attributes have to fit on a single page?" Except you have TOAST, so large values can go elsewhere. "But you can't take a tuple and have half its columns on one page and half on another; it has to be all together." Right.

So the question is: why do we get better performance than Postgres if we're just building on top of it? The key insight — perhaps not surprising to this very technical audience — is that time series workloads are different. Traditional transactional workloads are primarily updates, often randomly distributed across your tables: you're doing bank transactions, moving one row, updating another; you're often doing transactions across multiple keys, touching many things at the same time. Time series workloads are mostly append-only: primarily inserts, and in particular, primarily inserts to the recent time interval. This is the key to how we achieve better performance. The writes are typically associated with a timestamp and maybe some additional keys, but they're generally not transactions across multiple keys as one transactional element.

So here's our picture of time series data. We provide what we call a hypertable — so we look just like a standard database table — and under the covers we're doing heavy partitioning, heavy sharding, of this hypertable in one or more dimensions. The first dimension, perhaps not surprisingly, is time. The interval can be manually specified and then changed — you could start by saying the interval is one hour and later change it to 10 minutes — and we also have a version where the database automatically adjusts the time interval based on observed data volumes; so not quite auto-tuning, but it can auto-adjust the interval. The second thing we support is zero or more additional dimensions of partitioning on arbitrary columns, either by interval or, most of the time, by hash partitioning. The resulting piece — I'll call it a chunk — you should think of as being stored as a table inside the database.

So what does this allow you to do? Look back at the time series workload: when inserts go primarily to the latest time interval, it means there's a wave of data where most of the writes land. Your older data can still exist in the database, but that's not where most of the writes are going. That's one way we optimize: we assume writes mostly go to the latest intervals, although you can certainly store as much data as you want. And the nice thing about this model is that, as a user, all of your interaction with the database is with the hypertable.
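(In SQL this looks something like the following — a sketch; the exact create_hypertable arguments vary by TimescaleDB version:)

    -- A regular table...
    CREATE TABLE conditions (
      time        timestamptz NOT NULL,
      device_id   text        NOT NULL,
      temperature double precision
    );

    -- ...turned into a hypertable partitioned on time, plus a hash
    -- partition on device_id; chunks are managed under the covers.
    SELECT create_hypertable('conditions', 'time', 'device_id', 16,
                             chunk_time_interval => interval '1 day');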
So when you insert or query, you just see a single table; you don't worry about anything going on under the covers. And that's also true when you create indexes or alter the schema of the hypertable: it naturally propagates to each individual chunk, where the indexes are local to that chunk — that again is a key design point. Similarly, constraints, upserts, triggers: all of these are done against the hypertable and properly flow down to all the chunks under the covers.

"So I think you're going to say that you have no focus on summarizing old data or forcing it out of the database — if the user wants to do that they can, but you would keep everything?" We give you that flexibility. Commonly what happens — this is an example where you have all your raw data, and I'm going to jump ahead a lot of slides — is that people store the raw data, and then, because we support upserts, upserts provide a very easy way to roll data up between separate hypertables. So typically people specify, say, three different hypertables that are roll-ups or aggregations at coarser granularities, and then specify different data retention policies on each. You can tell the database: keep the raw data for a week and age it out — this was your question — and keep an aggregation for longer. But that's really just a configuration option, up to the user. An upsert says: if the data does not exist, insert it; otherwise update the existing record. The reason this turns out to be really nice is that it also means your writer doesn't have to provide exactly-once semantics. We've found people deploy Timescale both on an edge device and in the cloud: the edge device keeps some raw data and every so often pushes data up to the cloud, and they don't need fault-tolerance machinery to make sure each datum is inserted exactly once. Similarly for data aggregation: because you're upserting, an easy way to handle late data is, at the end of each window — say five minutes — you perform an aggregation, and then the next time around you actually go back 30 minutes and recompute. You could do this carefully with triggers, but this is a very simple way to sweep late updates into the database.
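(A sketch of that upsert-based rollup pattern using standard Postgres ON CONFLICT — the conditions_hourly table and column names are illustrative:)

    CREATE TABLE conditions_hourly (
      hour      timestamptz NOT NULL,
      device_id text        NOT NULL,
      avg_temp  double precision,
      n         bigint,
      UNIQUE (hour, device_id)              -- target for ON CONFLICT
    );

    -- Recompute recent windows into the rollup; deliberately overlapping
    -- past windows "sweeps in" late data without exactly-once writers.
    INSERT INTO conditions_hourly
    SELECT date_trunc('hour', time), device_id, avg(temperature), count(*)
    FROM conditions
    WHERE time > now() - interval '2 hours'
    GROUP BY 1, 2
    ON CONFLICT (hour, device_id)
    DO UPDATE SET avg_temp = EXCLUDED.avg_temp, n = EXCLUDED.n;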
"Are you using Postgres table inheritance to build these different coarser aggregations?" No, those are just separate tables — though we do use inheritance for our hypertable design.

Okay, so under the covers: data comes in — it could be an individual row or a big batch transaction — and the execution engine takes apart a batch of inserts and routes the tuples to the right sub-tables. When you try to write to a chunk that doesn't exist, the database dynamically creates a new set of chunks and starts writing to them; these partitions, or chunks, are created only on demand. Now you could say: doesn't Postgres — Postgres 10, in fact — have some nice support for partitioning? A lot of users do this manually; they create a partition per week or per month. Doesn't that work? There are a bunch of problems, some of performance, many of management. As an example of how far we're pushing this: one user stored 500 billion rows in a single table. (I know — Dave builds stuff that does 50 million inserts a second; Dave laughs at this.) But they're doing 400,000 rows per second into a relational database, and the key thing is that under the covers they have 50,000 chunks. If you think about a normal sharded or partitioned database, you're not running it with 50,000 partitions — probably two orders of magnitude fewer — and in fact the database here is dynamically creating a chunk every few minutes. "Sorry — 400K inserts per second, with Timescale?" Correct, this is Timescale — built on Postgres, but this is Timescale.

Okay, so what are the benefits of chunking? I've stressed this: the key thing is that we want chunks to be right-sized, and by that I mean that one or more of the latest intervals basically fit in memory. That's what allows us to keep up this insert rate — provided, again, that your workload has the property that most writes go to the latest interval. The second thing it leads to — and this relates to your question about aging data out — is efficient retention policies. When you say "drop data older than some interval," what it actually does is look at each chunk, and it only deletes a chunk if all of that chunk's rows are past the interval. So if you say drop anything older than a week, and a chunk spans from a week minus six hours to a week plus six hours, it won't be deleted until all of its data is past that interval. What that means is that to drop data you're never deleting rows — you're just dropping chunks, which means you're just deleting files on disk. That's much more efficient, and because you're not deleting rows, you don't fragment your tables and you don't have to run expensive vacuums, which keeps management much faster.
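(Retention in TimescaleDB looks roughly like this — a sketch; the drop_chunks signature has varied across versions:)

    -- Drop only chunks whose entire time range is older than one week.
    -- This deletes whole files rather than rows, so no vacuuming needed.
    SELECT drop_chunks(interval '1 week', 'conditions');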
"Can you do updates?" Yes, you can do updates. It's just that if your workload is update-heavy, your insert performance will drop down to little or no faster than stock Postgres — not the kind of performance we saw before — because you'll be swapping to disk a lot. "But are you doing in-place updates?" Yes, we are doing in-place updates. Our inserts to each chunk, though, are in write order, not time order. You could build an index on top for that, but the insert itself finds the right chunk and then appends in write order.

The other big thing we do — I talked about tuple routing, and a lot of that was about making inserts fast: you're mostly inserting into chunks that fit in memory, and the indexes on those chunks are local to them, so the indexes are all in memory too. On the query side, what we do is aggressive constraint exclusion. When we have 50,000 chunks and you give us a SQL query, we use the constraints — what is the time range, what are the additional dimensions you queried on — to quickly prune which chunks we need to touch. Rather than touching all of them, we see that, oh, this query only asked about this device ID, which maps to this hash partition; so we push the query down to just those partitions, not all of the data. And similarly you can slice across time, and so on.

Now here's an interesting thing — kind of a minor point, but it turns out to matter a lot in practice — which is that we have a rich notion of time. Does anybody want to tell me what the difference between these two calls is? (This is me being professorial, cold-calling the audience.) One is now() minus 24 hours; the other looks like a fixed point in time. Both of these are basically "time is greater than this thing," and when a human looks at either one, they read it as "time greater than a certain point." So what's the problem? "In the second one, since it's static, you already know which chunks you want to look at. In the first one, the value keeps changing." That's not quite the problem. "With a static value you have pre-knowledge and can do preprocessing, whereas now() has to be evaluated at some point; you can't do anything before that." Right: the planner is going to plan this query — and SQL also has prepared statements — and now() cannot be evaluated at planning time; it has to be evaluated at execution time. And even if you replaced now() with a date, it can still break, because Postgres is very conservative: Postgres doesn't know whether, between when you plan or prepare the query and when you execute it, the time zone of your server has changed, and this expression depends on the server's time zone. So when you run this in Postgres, it's going to touch all the chunks, and you're going to be very unhappy. But we have a strong notion of time, so we move these things out of the planner and into the execution engine: we constify them at execution time and rebuild the query plan. It's these little tweaks — which from a research perspective don't seem very interesting — that turn out to make a big difference when you're trying to use the system in practice.
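(For reference, the two calls under discussion presumably look something like this:)

    -- now() must be evaluated at execution time, so a conservative planner
    -- cannot exclude chunks when planning or preparing the query:
    SELECT * FROM conditions WHERE time > now() - interval '24 hours';

    -- A constant: the planner can prune chunks at plan time.
    SELECT * FROM conditions WHERE time > '2017-11-01 00:00:00+00';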
"I don't know what other databases do, but if the planner runs in the context of a transaction, you would know what the timestamp is — now() is the time the transaction starts, and Postgres knows that too — and you could do this. And if someone changes the time zone, that happens outside the transaction; you wouldn't see it." It's a limitation of what Postgres does inside the planner versus inside the execution engine — though Postgres is arguably being correct in pushing this to the execution engine. I guess it's an implementation detail: constraint exclusion, which is part of generating the query plan, is done at planning time, not at execution time. Anyway, it's a detail.

So I want to talk about two other things this design lets us do. The open source version of Timescale currently runs on a single node — and all the performance numbers I'll give you are with respect to that node — but the interesting thing is that, even on a single node, we let you scale up by elastically adding more disks, even to a single table, which is kind of strange. You can do this normally with RAID, where you're moving the elasticity into your storage system, and I think a lot of people here know how to build good storage systems. But Postgres has this notion of tablespaces, and a tablespace is basically a mount point: you can map a table to a certain disk mount point. What we allow you to do is attach many tablespaces to a single hypertable — because of course, implementation-wise, a hypertable is just an abstraction — and as you elastically add a tablespace, new chunks are created load-balanced across all the disks. This is particularly nice because, as you age out old data, it's a different way to do elasticity: normally elasticity means migrating data, whereas here you're relying on your workload pattern to spread new writes across the new disks. And there are other obvious benefits to adding disks, like faster inserts and parallel queries. (I'll come back to your question in a minute.) One major development focus for the next year is to scale out to many servers. Usually there's a big gap when you take a database from one server to many, but the interesting thing is that, in both the query model and the insert model, our individual servers are already pretty aggressively parallelized internally. There are certainly new issues we need to work on to take it out to many servers, but a lot of the query path already exists today in how we built even a single server.
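(A sketch of what elastically adding a disk looks like; attach_tablespace is the TimescaleDB call for this, though the exact interface may differ by version:)

    -- A Postgres tablespace is essentially a mount point
    -- (say, a freshly attached EBS volume):
    CREATE TABLESPACE disk2 LOCATION '/mnt/disk2';

    -- Attach it to the hypertable; newly created chunks are then
    -- load-balanced across all attached tablespaces.
    SELECT attach_tablespace('disk2', 'conditions');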
Was there a question? "What's a tablespace?" Sorry — a tablespace is just the name in Postgres for what is effectively a disk mount point. You basically say, here's the mount point where this table's files are serialized and stored; you map a table, which is a file, and it gets written to that mount point. In the cloud this is particularly nice, because you can mount a network-attached EBS volume at that point. "Elastically adding disks to a machine" doesn't mean much when it's a physical rack, but in the cloud it means you basically connect a new EBS volume and keep going. Another neat thing is that Postgres even lets you put indexes and table storage on different tablespaces — so you can, for example, put your indexes on SSDs and your data on HDDs.

"So what is the scenario in which inserts happen to an older time region?" It depends on your workload. Some data arrives late — if you have sensors, some report late. In some financial applications the market corrects itself: trading happened thinking the price of a stock was one value, and a week or a month later some research arm says, yeah, we kind of screwed up that price, it's actually this; they often store both values — this is called bi-temporal analysis — but what you're storing with us carries a timestamp for an old value. So it really depends on your workload.

"Okay, that was one. But you also said you insert in write order as opposed to time order?" Insert is a two-step process. All these chunks are disjoint in time and in the additional dimensions — let's just think about time. Because they're disjoint, we first determine the right chunk: say, the chunk covering now to five minutes ago. Then within that chunk we just append — we write at the end of the region — so we don't pay the cost of sorting out-of-order data on the write path. "Do you make any assumptions about the frequency with which you collect data?"
No — people send data regularly, irregularly... "Would that affect your chunk sizes?" Let me come back to that; it's actually in the next couple of slides. Let me move on a little to make sure we get to them.

So how exactly do we partition on time? There are interesting issues here, and a lot of them relate to early and late data — it's not really the frequency of collection that matters, it's the out-of-orderness of the data that has important implications. You might say: our end goal is for chunks to fit in memory, so let's target a size. Say we have X amount of memory, so let's make chunks one gigabyte. Except I have this property of disjointness, so I say, well, one gigabyte is probably about a day (or a minute — whatever, it doesn't matter): the chunk is one gigabyte, and it's going to be roughly a day. The problem is, if you're fitting to a size and you get early data — because in a big enough system you can never rely on your client clocks; some clocks will be completely out of sync — you look at this chunk, you've only written 20 megs of the one gig, this record comes in, and you say there's plenty of room, let me extend the time range of the chunk to hold it. Except that data just came early, so the chunk's time range ends up much too large, and you're again out of luck. "Things like Spanner suggest that you can pay money and get better clock sync if you want." Sure — Spanner paid money to make their clocks synchronize. Streaming systems often cheat on this: they timestamp things when they arrive, but a lot of the time what you want to timestamp is when the event was generated — in IoT, think of a little device on the edge. Most of these systems, including Google Cloud Dataflow... yeah.

So what about a static duration — just make it a fixed time interval? Operationally, you find people actually like this; they like being able to say a chunk is about a day. One reason is that they often know the range of their queries, and if you roughly align the two, you minimize the number of chunks you touch. The downside is that — if you're a startup you hopefully have more and more users; if you're in sensor networks or IoT you hopefully have more and more devices — as your data volumes grow, a fixed interval can again make chunks too large. So what we do support — this is probably the most common use case — is letting you manually change the chunk interval over time; that's a manual process, just a function call. But we also support adaptive intervals, which is what I mentioned before. This looks at the range of an individual chunk and, almost like a hysteresis with a high-water mark and a low-water mark, as it sees the volume of data increasing, it shrinks the range of future time intervals. It also means that early data will not expand the current chunk: it gets placed into a proper future chunk that's pre-generated out there. So this kind of gives you the best of both.
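(The manual interval change mentioned above is a single call — a sketch; argument types vary by version:)

    -- Future chunks of 'conditions' will cover 24 hours each;
    -- existing chunks keep the ranges they were created with.
    SELECT set_chunk_time_interval('conditions', interval '24 hours');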
Partitions are still created with a fixed time interval, but the system dynamically adapts it. We'll see how much people actually use this in practice; my guess is that most will prefer to manage it manually — people are generally leery about letting systems auto-tune themselves — but this is a long-term bet. "So do you do a compaction — do you go back and combine chunks together?" We don't currently go back and resize chunks. It would be a relatively easy operation to do transactionally, and we have that capability; we just don't do it automatically.

Okay, let me show some graphs. You saw this graph before, where Postgres went off a cliff. This is up to 250 million rows, single-row inserts: Timescale maintains this really nice performance throughout. It pays a little extra cost up front because it has a more complex insert pipeline than a single table. If we go to batch inserts, which is actually pretty common in many time series environments — here we go out to about a billion rows — Postgres again quickly goes off a cliff. In batch mode they both start at about 120,000 inserts per second (these are Azure servers, which aren't as good as the ones from before), but out here you see a pretty significant performance gap on the insert side. By the way, from a DevOps perspective you often talk about metrics rather than rows; these are 10 columns per row, so this would be about 1.2 million metrics per second.

Query performance: table scans compared to Postgres are roughly similar, or a little faster because of smaller indexes. I say "roughly" because the one thing we do is pay extra in planning cost — we'll add maybe an extra millisecond of planning compared to native Postgres — but we often make that up big in execution time. "Is that because you have an extra step?" It's just that there are 50,000 chunks to deal with, so it's the cost of doing constraint exclusion. That's all written by us, all in C, pretty optimized — you could probably do better, but it's just more work. "You've modified the Postgres optimizer to invoke your part?" So what happens is, the query parse tree comes in, and we quickly detect whether it's against a hypertable. If it is, we grab it; if not, we pass it through to the rest of the Postgres system. We haven't changed that much of the query optimizer, but the planner just has a lot more steps, because it's dealing with inheritance across all these thousands of tables, and we have our own code for constraint exclusion and so on.

The biggest win you see is with group-bys — time-ordered group-bys — where you get a huge performance improvement. (And remember, I talked before about deletions being more efficient because they're file deletions, not row deletions.) Time-ordered group-bys: this is a super common query in every kind of dashboarding over time series. It says: give me the average value over a minute, group by minute, order by minute descending, and give me the last ten values. This is what every dashboard is doing. The problem is, you have an index on time, but the bucketing is an opaque function from the perspective of the planner and optimizer: Postgres — and I think a lot of databases — wouldn't know how to handle it.
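(That dashboard query, written with TimescaleDB's time_bucket — a sketch:)

    -- Average per minute, most recent ten minutes first. With an index on
    -- time, an engine that knows the bucketing function is order-preserving
    -- can walk chunks newest-first instead of scanning and sorting everything.
    SELECT time_bucket('1 minute', time) AS minute, avg(temperature)
    FROM conditions
    GROUP BY minute
    ORDER BY minute DESC
    LIMIT 10;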
Again, because we have a notion of time — a notion of how these things behave — we make sure to handle this the way you want. We special-case some of these functions, and we also have richer semantics: if it's a timestamp, we know how timestamps behave; if it's an integer in a column specified as the time dimension, we know certain semantics about how it was inserted. "Do you maintain separate statistics on, say, precomputed minute buckets?" I don't think so currently — I'm not definitive, but I don't think so.

Now you can ask, what about NoSQL? This is Cassandra. Cassandra actually turns out not to be that fast — I'm reporting this in metrics per second now, and this is a 10x performance gap — and obviously Cassandra can't do all the queries; its query model is very limited. Cassandra is a column-family store, but it turns out it's not all that good at this. (And I'm saying nothing about commercial versions of Cassandra.) Time-ordered queries: again, much faster. Last point: this is a very common query in IoT and in finance — give me the last reading for each distinct item. If you have a bunch of devices, what's the last reading for this device; or what's the last reading for this stock ticker. Again, much more performant.

Now, for a Postgres audience, the natural question is: hasn't Postgres 10 solved all this? It has this awesome thing called declarative partitioning — doesn't that address all the issues I talked about? Well, the performance numbers so far were all from a single Postgres table compared to Timescale; they weren't using Postgres partitioning itself. So let's look at that. In batch mode — the x-axis here is the number of partitions in the database — I think this kind of logical partitioning was designed for a partition per day; it wasn't designed for a partition every five minutes with more dimensions on top. With batches we see a big performance gap, and if you're doing single-row inserts, that's the performance of Timescale versus Postgres. It's really just not optimized — some of these are implementation issues — but it is not at all optimized for a large number of tables, because as a general-purpose database, as opposed to a time series database, you're optimizing for very different things.

The other thing I want to point out — and this is true of a lot of these databases — is that they often give great flexibility to the user at the cost of ease of use. The way you specify declarative partitioning in a lot of these systems is that you build your own tree. In Postgres 10, for example — this is roughly the actual syntax — you take a main table (they use inheritance under the covers) and you create a bunch of partitions. They don't yet support hash partitioning, so say we range-partition over the alphabet into four pieces; then, if you also want time partitioning, you define time partitions as children of each of those, so you end up with a tree-shaped data structure. And then you have to manually build indexes on each of these leaves.
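(A sketch of what that Postgres 10 setup looks like — illustrative and abbreviated to one partition at each level:)

    CREATE TABLE measurements (
      time      timestamptz NOT NULL,
      device_id text        NOT NULL,
      value     double precision
    ) PARTITION BY RANGE (device_id);

    -- Range partitions over the alphabet (PG 10 has no hash partitioning)...
    CREATE TABLE measurements_a_m PARTITION OF measurements
      FOR VALUES FROM ('a') TO ('n') PARTITION BY RANGE (time);

    -- ...then time partitions defined as children of each of those:
    CREATE TABLE measurements_a_m_2017_12 PARTITION OF measurements_a_m
      FOR VALUES FROM ('2017-12-01') TO ('2018-01-01');

    -- And indexes must be built manually on every leaf:
    CREATE INDEX ON measurements_a_m_2017_12 (time DESC);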
This is all pretty manual. You could also have flipped it around and done time first and devices second, but it's a hierarchical notion of partitioning, and that's the problem: I talked before about wanting to change your time intervals as data volumes change, and similarly you don't have the flexibility to ever change the number of device partitions, because of the locality the tree imposes. It restricts you once you've set it up. The Timescale syntax is basically one call: create_hypertable on the conditions table — this is the time dimension, this is the device dimension — and it creates a flat structure. That lets us tune these over time in any way, without any ordering problems; and everything I talked about before, indexes and so on, is pushed down automatically — you don't build them manually on each of the children. If you look back at the properties I listed earlier, PG 10 basically either doesn't do them or makes them really hard. And that's really because a general-purpose relational database focuses on flexibility for general cases, while we're tackling a particular use case and optimizing for it.

So let me finish up with a couple more slides. I talked about how this lets you do joins; we also add some extra features for this type of analysis — time bucketing, functions you don't find in a plain database. We do allow you to track people's GPS coordinates in case you want to follow them... but in seriousness, a common use case we've found, for example in IoT, is what I'd call geospatial-temporal analysis. This is an actual, real user use case: look at the amount of tonnage coming in and out of the Port of Los Angeles each day, and use it to predict what the market is going to do a month from now. The nice thing about the way we built it is that, because it's an extension on Postgres, it plays friendly with other Postgres extensions — PostGIS is the GIS extension on top of Postgres, and they work together. You get JSON, which I talked about. And I'll end by stressing the last point: we did all this engineering to make the hypertable look like a standard table precisely so that the entire ecosystem just works out of the box. It's open source — please download it and play with it; we have a very active Slack channel if you have questions or want help. And since you're a great audience: we are hiring aggressively in New York and Stockholm — core database engineers, R&D, support engineers, and a bunch of other roles. Stockholm is about seven hours from here, and it's a great place to live; New York, though, is maybe even better.

Any questions? Any more questions? Let's thank Mike... there's time for a couple of questions. "You mentioned that millisecond during planning, but that depends on the number of chunks you end up with — you said the millisecond was with reference to 50,000 chunks or so. Would you say that's a typical dataset among people using this?" 50,000 is definitely on the high end. So the question is: I mentioned about a millisecond of planning time; how does that scale with the amount of data? Planning time is really a function of the type of query — in general, how complex it is:
is it a query inside a subquery, with DISTINCTs and CTEs? This is full SQL — anything you write should run; you could have a page-long, weird, auto-generated SQL query from your random BI tool from the '90s, and it should work. So some of those queries can look really weird.

"How do chunks get created — what about the metadata?" We create chunks transactionally, and — one of the reasons they're disjoint — we don't change the metadata afterward. The important metadata is basically the constraints: the time interval and the other partitioning dimensions. We maintain our own metadata that we use for an optimized path, and we actually also store those as constraints in Postgres, so there's kind of a double level of safety. New chunks — new intervals — are created transactionally, with a lock, and so forth. "And they don't change once they're created?" Correct.

"So I'm curious — when you were talking about the date_trunc function: Postgres has date_trunc, but it doesn't know what it means, so how can Timescale understand it? My question is related to where this happens: do you tweak the planning of the query, do you tweak the operator, or do you rewrite the query before you even plan it?" The question was: how does Timescale know about date_trunc? There are a couple of different things going on. For date_trunc — and we have a much more flexible version of it called time_bucket; date_trunc only lets you truncate to a minute or an hour or whatever — we special-case some of those functions so that the planner has knowledge of what to do. In particular, we know the function is order-preserving: if its input has a certain order in the index, its output maintains that order.

"Do you have chunk-level compression?" We have not yet implemented any native compression in the database. What we've found some users do is run this on top of ZFS, and they've gotten roughly 3-4x compression.

"You mentioned you build indexes for each hypertable chunk. That makes sense for new chunks, because they're write-heavy — but have you considered building global indexes over older chunks?" There are two parts to that. On indexes: by default, when you create a hypertable — you can turn this off — it creates some default indexes based on time, and like any other relational database we let you add and drop indexes: B-trees, GIN, BRIN, GiST, hash — whatever index types Postgres supports; and you can reindex and cluster after the fact. The one thing we do not support is global indexes over all the data. This is a key design point: if you define something as unique, or as a primary key, we require that the unique key have a prefix matching the partitioning of the hypertable. For example, if you partition by time, your primary key, or anything declared unique, has to have time as a prefix — it can be composite, with additional columns. What that means is that we never need a global index: queries and lookups are a two-stage process — you first find the chunk by time, and the index within it is local.
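(A sketch of the unique-key rule — the partitioning column must prefix anything declared unique; table and column names are illustrative:)

    CREATE TABLE readings (
      time      timestamptz NOT NULL,
      device_id text        NOT NULL,
      value     double precision,
      PRIMARY KEY (time, device_id)  -- time as prefix: each chunk's local index can enforce it
      -- PRIMARY KEY (device_id)     -- no time prefix: would require a global index across chunks
    );
    SELECT create_hypertable('readings', 'time');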
"I have two questions. The specialized UDFs you provide — are they all in C, or PL/pgSQL?" Both, but they're increasingly moving to C; anything performance-driven we do in C. "And the second is an engineering question: are you doing what the Citus guys do, always tracking the latest Postgres — Postgres 10 — or are you pinned to some particular version in the past? What's the strategy?" We are upgrading. Citus was initially a fork — they've since moved to an extension, but being a fork created that problem for them; we've always been an extension. This week or early next week we'll support Postgres 10 — it's already on master, you can check it out now. And we support upgrades, because users upgrade: we currently support 9.6 and 10, and we'll continue to support multiple versions moving forward. The core stuff, like hypertables, is fairly separate from the rest of the system, so that doesn't really change; but anywhere you have to interject into the planner and the like, there's obviously some messiness in supporting multiple versions, because your tie-in points change. In fact, the biggest frustration in upgrading to PG 10 was that they changed the formatting of the printed table output, which meant special-casing a lot of our unit tests. Just messiness. But yeah — users will upgrade between versions, and you need to support both.

"You showed query-time comparisons. One thing I wanted to ask: how fast are joins between two time series? Do you have to do anything special for that?" The question is how fast joins are between two hypertables. Currently you do not have to do anything special — it works — but it's not optimized: we don't yet localize joins when the chunks are overlapping, so it's slower, because it's basically pulling everything into memory.

One last question, if there's time. "Has the Postgres community shown interest in adopting your chunking ideas for non-time-series data?" The question is whether Postgres has thought of adopting our chunking ideas for non-time-series data. The Postgres community, first of all, has been super supportive — we're engaged with them closely; we talk a lot at PG conferences and elsewhere. But I don't think it actually makes sense for them to adopt this, because their goal is to focus on flexibility, and we're really special-casing for our workloads. We have a lot of code and a lot of changes — we rewrote the whole insert path, a bunch of query stuff — and it's just not something I think makes sense to push upstream into mainline. Also, there are a lot of disaster recovery tools that look directly at the pages of the database, and you'd have to rewrite all of those if you changed the format.

Alright, let's thank Mike again. I want to thank everyone for coming this semester — we had a bunch of great talks on time series databases, and all the videos are on YouTube. That's it for this semester; we'll do this again in fall 2018, though I don't know what the topic will be at that point. And if you want to know more about why these older disk-based systems are slow, and how to build a modern in-memory database system, come take the advanced class in the spring — or go work at Timescale. Alright, thanks guys.
Thanks for having me. Thanks.