My name is David. I'm a developer advocate for a company called InfluxData. Our primary product is InfluxDB, an open source time series database. Anyone familiar before I begin? Okay, a few hands.

All right, first we'll start with me. So, I'm Scottish. I have a load of animals in my house. I really like esoteric programming languages. Now, this is the question I ask every time, and I've never had a hand go up, so I'm looking for someone here: has anyone ever heard of, or written a line of, Pony? All right, that's your mission after this talk: go look up Pony lang. Very, very cool. Rust usually gets a few more hands. Still kind of esoteric, but a very cool language too.

I am quite into the container ecosystem. I'm a member of the Kubernetes organization and I've been involved in the release team for the last year, so if anyone wants to talk about Kubernetes or containers at any point, please come and speak to me. And I am a practicing Stoic. If you're not familiar with Stoicism, it's the idea of being comfortable with uncomfortable situations, like this one.

And my animals. So, I've got a small video. Does anyone know what a chinchilla is? I have two degus, five chinchillas, three roaming ferrets, a dog, some fish and some other stuff. Aren't they just the cutest animals in the world? No, it's stuck. There we go.
Okay, so the talk: introduction to time series. Before we begin, I'm going to start with a small pop quiz. When we talk about time series, we generally think about monitoring and IoT; we talk about financial trading, etc. But there's a whole bunch of fundamental concepts of time series that are actually much, much older. So we're going to play the "invented when" game, and I'm just going to play it with myself; I'm not going to make anyone participate who doesn't want to.

As developers, or operators, or anything like that, we have to deal with the concept of encoding, probably on a daily basis, whether that be JSON encoding or other formats. But when do we think encoding was first used in civilization? The surprising answer might be that it actually goes back to 410 BC. So: not that new. The encoding system back then was documented by Plutarch, in what is, I guess, a mixture of ancient history and diary. What he described was a mercenary, someone who had a boat, a crew and the ability to fight, called Alcibiades, who didn't fight for any one particular side in any war. What he used to do was show up to the battle, scope out who was who, and then raise a flag to signal which side he was going to fight for. That was his entire premise, and I'm sure he made a lot of money doing it. Now, maybe it's a bit of a stretch to consider this encoding, but he used a very simple flag-based system to indicate, to a much greater audience, which side of that battle he was going to participate on.

Now we jump to the 14th century.
So think about that: 1500 years at minimum, and the system hadn't really evolved much in that time. They were now on a system with one flag or two flags. 1500 years to decide we could add another flag. But then, just another hundred years after that, things started to get a little bit more sophisticated: they had 15 flags to represent different letters, or actions that they were going to take in the war. And then just a couple of hundred years later, in the 17th century, we get a system that is actually still used today by most navy ships: a system that represents numbers and some letters with flags of different colors. All right, so encoding: not new, very, very old, and in use across the last 2000 years.

So I ask you: when was sharding invented? Is anyone brave enough to think it's modern? Hopefully not. Sharding is also very old. When we talk about it in a technology sense, we may think of sharding as this new thing that people didn't use before, but it actually dates back to 150 BC. It wasn't documented until many years later, and the concept of sharding back then may not be quite what we think of today. What the Romans did was take the alphabet and split it across five tablets, with five letters on each tablet. And the reason they did this is because the way you won any war 2000 years ago was how fast you could communicate with your troops in different posts, being able to use tactics to your advantage.

So, I actually found a 2000 year old photo. This is how they did the encoding: they used a flame, or smoke, system. On the left wall they would have five people, each with a flaming torch, and they would indicate which tablet the message was going to come from; if they raised four torches, you would know it was the fourth tablet. On the right-hand side they would do the same to indicate which letter on the tablet. With this, they were able to transmit messages really, really far and really, really fast, and that was the biggest
advantage the Roman Empire had when it came to fighting their enemies.

If, like me, you appreciate learning about these kinds of historical points, and how technology maybe isn't as new as we like to believe, there's a really good book called The Early History of Data Networks. Most of the examples I spoke about here are in it, and there are so many more. So if you enjoyed that, buy this book, read it; you'll have a great time.

Okay, now, time series: that has to be new, right? Can't be old; time series must be new. Unfortunately, time series is also really old. The Romans had a concept of time series. In fact, the Romans used to have IPOs. The Romans used to have stock markets; the Romans used to trade shares in companies with their peers. People were tracking the price and profitability and value of all of these companies, going to the Roman markets and trading the shares in all of these companies, making some extra money. And I find that fascinating, right? Nothing we do is new; the Romans did everything 2000 years ago. But we can learn a lot of lessons from them.

So, many, many years later, the Netherlands had the first IPO that we have on record. That was in 1602. The US was lagging behind, of course, and had its first IPO some time later. Now, I think what's important about these two events is that at this point the phrase "time series" had never been used in any public writing, ever. It wasn't until 1884 that someone actually coined the term time series. What they were interested in was: if we import silk and cotton and other textiles from around the world, does that have any correlation with the export price that we are getting on our wheat? And the answer was yes. I would not suggest reading that article; it hasn't aged well, and it's very dry, but you know, it's there. It's available. But as a term, time series isn't that old: 1884. Okay, so that's a date none of us were alive for, but not too long ago. Why is this important?
Well, it was the first paper that ever actually used the dimension of time in any statistical mathematics or analysis of numbers. And that's what we're going to talk about today.

Okay, so, my CTO hates it when I put this slide in. But in my first week on the job I spoke with him, and he told me that most data is best understood on a dimension of time. That has stuck with me for the last 15 months, and it really opens your eyes when you think about it. Especially when you go back to your jobs on Monday and you start to look at the amount of data that you have: start to think about it on a dimension of time and how that applies to your data, and a whole world of possibilities is going to open up.

So now we're actually going to start the talk; that's the history bit done. What are we going to cover? We're going to talk about what time series data is. I'm going to talk a little bit about TSDBs, and why you should use a time series database instead of a general purpose database. I will use some examples with InfluxDB, because that's who pays me and that's the product I'm most familiar with. What I will tell you is that everything I cover is very much agnostic and will work with Prometheus or any other TSDB that you may be using. I will talk a little bit about the value, and not just the value but the cost, of time series data. There's a lot to understand there, mostly because you generally have a lot of time series data, and you have to understand the trade-off of storing it all or storing some. And then I'll talk a little bit about where we, as developers and operators, have moved in the last 10 years, and why time series data is so important to understanding that migration from monolithic to cloud native, microservices and serverless.

Okay, so what is time series data? Nice and simple: any piece of data with a timestamp is time series data. That's it.
There's nothing fancy here. If you have something in a file, or written to a database, that has some sort of created-at or last-updated-at, that is considered time series data.

So I'm going to walk you through an example to understand why we need the dimension of time when we talk about data. What I have here are just random scenarios that everyone here should hopefully be familiar with inside of our infrastructure. In the light red color, we have the memory hitting 100%. That would be a cause for concern, right? If we see the memory usage at 100%, something may be wrong. Next we have a health check failing: if a health check is failing in our production infrastructure, something is wrong. And of course, if we see a pod killed by the OOM killer, that is a bad thing. We don't want pods, or processes, whatever you want to think of there, to be dying.

Then, in the orangey-yellow color, what we have are potential causal events. These are not dangerous on their own, but they could be a trigger for the dangerous stuff. So if we have a database migration that runs, that could cause problems; that could break our application. If a pod gets restarted in Kubernetes, that could lead to potential problems too. If we deploy a new version of our app (I've spoken to developers; we are likely to introduce bugs, and new versions equal new problems). A CI pipeline starting may not necessarily be a causal event, but it could trigger something else in our infrastructure.

In the purple, we have things that should have absolutely no correlation whatsoever to any of the dangerous stuff in our infrastructure. So if you do a git commit, or the CPU is hovering around the 12% mark, that should not give you any cause for concern. And the red herring, in pink: Scotland qualifying for the World Cup.
I've lost all hope of that ever happening.

Now, from here it's really difficult to understand what happened in the system. But when we apply the dimension of time, things actually begin to make sense. What we see here is: the memory hit a hundred percent; that process was killed by the OOM killer; we triggered a new deployment because of that. Now, if you're in container land, you could be using the latest tag. All right, very dangerous: you're always going to pull new stuff, so you don't necessarily know what's running in production. When that happened, it caused a database migration to run, which broke all of our tables, because we weren't supposed to release it yet. Our health check is now failing, and our application is in some sort of CrashLoopBackOff state. So understanding what happened, when it happened, and the order it happened in is how we make sense of complex situations.

Everyone familiar with this screen? Right: no matter what you use for storing logs, logs are classical time series data. What is the first thing we put at the beginning of every log line? A timestamp. These are events. Now, I'm not going to talk a lot about structure and events, but if you're not writing your event or log data in JSON, or something else that can be parsed by a TSDB or another system, you're losing out on a lot of intrinsic value. There will be some examples of that later, but if you want to talk about structured logging, just come and find me after this talk.

So, what is time series data? Well, we know it's anything with a timestamp, but there are actually two classifications of time series data as well. What we have first is regular time series data, and I'm going to try and refer to this as metrics for the rest of this talk. What makes it a metric is that it is predictably available: regular time series should always be available on the same interval. Examples of that would be CPU usage and so forth. Irregular time series data is not predictable.
I cannot tell you when the next event will happen, so we're going to call those events. They're unpredictable and inconsistent.

So, examples of regular time series, or metrics: CPU usage, memory usage. We're talking about Linux system and infrastructure monitoring here. We could be tracking the ping time or latency to some sort of external service, and we do that every 10 seconds: that is predictable. And if we want to track the number of processes running on a machine, again, if we request that value every 10 seconds, it will always be available.

Irregular time series are things that we cannot predict. Can you tell me when the next login event will happen in your system? No. You do not know when the next user is going to get their password wrong, nor when the CI is going to finish, or whether someone is going to trip over a network cable in your data center and take down a few systems. These are all valuable, important things that should be stored somewhere, but you can never predict them, and the way we handle these two different types of time series data has to be different.

Okay, so: a football game. There's time series data everywhere, and for a start, there's time series data in this room. As a metric, at any interval I can ask how many people are in this room, and I will get a number from zero to whatever. In a football game we have the exact same thing. There are a few different types of time series data here. First, we have the number of people in the stadium: that would be a metric, and regular. We also have goals scored. They have a timestamp: we know that at 54 minutes and at 7 minutes in this game,
there was a goal scored. We know who did it. That is an event, and it's irregular: we cannot predict when the next goal will come. If we could, we would all be a lot richer.

We also have aggregates, and that's really important in time series data as well. We are actually able to calculate an aggregate score over multiple games. We're not going to talk about aggregates too much today, but they're a special form of metric. And in fact, there is some more event-based data here. If we take the number of people in the audience, that is an aggregation of a set of events: how many people came into the stadium, how many people left the stadium. If you track every one person in and one person out, you have an aggregate, or a metric. So you have to understand the difference between metrics and events. And what's important is that all metrics are aggregations. Everything. Even the CPU load on your machine is an aggregation; we just don't find a lot of value in storing every instruction sent to the CPU. It would be very costly, because it would be billions and billions of points, but all metrics are aggregations.

So how do we collect time series data? I'm going to talk about a couple of ways, just from an InfluxDB and infrastructure monitoring point of view. Of course, there are client libraries, for PHP and other languages, to write to your TSDB of choice. But a lot of this work has already been done and is already available to be consumed. There is a project from InfluxData called Telegraf. It has a whole bunch of inputs and outputs.
The reason I'm listing the outputs is that we are a very open company. We don't really mind if you want to use Telegraf to write to another TSDB; we don't lock you into InfluxDB. We support inputs like CloudWatch, Elasticsearch, Kafka, Jenkins and Kubernetes. Anything you have running in your infrastructure is emitting time series data, and the tooling to collect it is there in a couple of lines of TOML, so you don't even need to write anything complicated to start storing this time series data. If you come from a Prometheus background, there's a whole bunch of exporters that are very similar to the Telegraf plugins as well. Basically, no matter what you run in your infrastructure, there's something there to collect that data, so you should begin to leverage it.

For anyone who has already been doing a little bit of time series research, there's a bit of a dogmatic war: should I pull, or should I push, my time series data? Prometheus takes the stance that you should always pull; InfluxDB actually supports both. And the reason for that is you really do need to support both. For metrics, yes, you can pull on a regular, predictable interval. The value will always be there, and you're going to have a really good time: you're going to collect loads of data and learn loads of insights. For the events, the unpredictable-interval stuff, you cannot pull it; you don't know when it's going to happen. For that we need a push approach, so you're going to have to instrument your code to push those events to a TSDB. So there's not one right way.
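To give a feel for the "couple of lines of TOML" mentioned above, here is a minimal sketch of what a Telegraf configuration might look like: collect CPU and memory every 10 seconds and write them to a local InfluxDB. The URL and database name are assumptions for illustration.

```toml
# Hedged sketch of a minimal telegraf.conf
[agent]
  interval = "10s"          # collection resolution

# Inputs: what to collect
[[inputs.cpu]]
  percpu = false
  totalcpu = true

[[inputs.mem]]

# Output: where to write the points
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]   # assumed local InfluxDB 1.x
  database = "telegraf"              # assumed database name
```

Telegraf itself pulls these inputs on the agent interval and pushes the results to the output, which is part of why the pull-versus-push debate is less clear-cut than it sounds.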
You have to adopt both.

Use cases for time series data: as developers, you may be very familiar with monitoring your applications, and of course the Linux machines and cloud VMs they run on. IoT: massive amounts of time series data there. You know, if you have a digital thermostat in your house, it's constantly checking the temperature and emitting it; it's constantly tweaking the water pressure, the boiler, etc. And of course real-time analytics: if you have a website online and people browse to it, you may want to track when they go there, how long they stay, when they leave, where they go to, where they came from. All time series data. So time series data literally is everywhere in our stack already; you just have to identify it.

Now, the database bit. Why should I use a TSDB over any other system? Well, the way a time series database stores data is in a very particular format, and the way we read the data is very, very unique as well. Time series data is not small data: generally we're talking millions to billions of points on a regular frequency. That means we have to have high-velocity writes. Our reads are very different and unique, right?
You would never go to a TSDB and say: give me every value for this series across all time. That would be really painful. Generally, 99% of the time, you're going to say: I want all the values for this series within a specific window. So the way the sharding happens and the way the indexes are built is all built around time. And because there's a very heavy cost to storing time series data, just because of the pure volume and size, we have to have time-to-live, or some sort of lifecycle management for that data, to either expire it or roll it up and so forth. We will talk about that in a minute.

So this graph comes from db-engines.com. They track a whole wealth of statistics: Twitter mentions, Google searches, active repositories on GitHub, all this other data. And they try to track the growth curves of each individual database, but also of databases as a category. The blue line that's storming away at the top here is time series. It's been the fastest growing category of database for the last two years, maybe a little bit longer. And I think the reason it's the fastest growing is that we're now starting to change our architectures. I won't make you show your hands, but there are going to be a number of people in this room who have a monolithic application and are talking about containerization, microservices and cloud native. What we do when we make that architectural switch is move a lot of complexity that we used to keep in our code into the infrastructure layer, and it's those complications in the infrastructure layer that have led people to make their monitoring a lot more advanced.

So The New Stack did a study, and they asked developers: do you have time series data, and if you do, do you store it in a time series database?
12% said yes. So 88% of the people that had time series data were using a general purpose database instead of a time series database. Now, I find that mildly confusing, and I don't think it's entirely accurate. What I want to show you, besides just using a really cool Rick and Morty image, is that you probably already are using a time series database and are just unaware.

So, if your company has too much money, you may be using New Relic. If it has a little bit less money, you may be using Datadog. And probably everyone here is using Google Analytics. These are all time series databases, just with a different UI showing you different metrics and measures about your applications or your websites. So you probably already are using a TSDB; I think that study was just a little bit flawed.

Next question: who's using Kubernetes? Okay, only a few hands. I ran a poll on Twitter, I think around March last year, that said: I run Kubernetes in production and I monitor it with... 74% of the people that responded said they were using Prometheus. Right, and that's great: at least they have some sort of metrics and monitoring in place. And because Prometheus is a CNCF project, that number is always going to be higher than for any other TSDB. We had 10% using some sort of SaaS like New Relic or Datadog. We had 3% using InfluxDB. And then 13% were using nothing. Now, I don't know what your level of knowledge of Kubernetes is, but I did say production, and 13% of those people are not monitoring it, and it's a big, scary system. So if you are in that 13%: hopefully you can take away what you learn today, run InfluxDB or Prometheus or whatever, and begin to monitor it. It's not too late.

This is where we start talking about time series with InfluxDB, with a few examples. Remember, it's agnostic.
You don't need to use InfluxDB; it's just the one I'm most comfortable with.

Okay, so InfluxDB is a time series database. What's really cool about the company I work for, InfluxData, is that everything is open source, and we have a very, very small enterprise code base just to make money to keep us ticking over. But we are a full stack time series company. What that means is: we have Telegraf for collecting your metrics. We give you Chronograf for visualization, dashboarding and other tools; you can use Grafana if you're already comfortable with that, it works right out of the box. We also provide Kapacitor, which lets you do real-time streaming and anomaly detection on top of your time series data. We are currently in the process of working on v2 of InfluxDB; it's on beta 4 as of last week, and it'll be hitting GA within the next three months. InfluxDB 2 is built around containerization and Kubernetes and all that other cool stuff, so if you are doing that migration, you may get a lot of value from playing with it. That's the sales pitch bit.

Let me talk about time series data. We talk about points, and the best way to think about a point is: at some point in time, this equalled this. Now let's use an example. Hopefully we're all relatively familiar with Linux; it has this concept of load averages. I have a small series key on the left here. The series key is made up of a measurement name; here it's called load. We then have a couple of tag values; here we're saying the host this load was collected from is called vm1. In blue we have the fields: that's just the one minute, five minute and 15 minute load averages. And then in yellow we have the timestamp on the end, because all time series data needs a timestamp. A couple of examples here. We could have stock tickers.
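Before the stock example: the load-average point just described (measurement, tags, fields, timestamp) can be written out in InfluxDB's line protocol. Here's a small Python sketch that builds such a point; the helper name and the load values are made up for illustration.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as InfluxDB line protocol:
    measurement,tag=value field=value timestamp"""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

# The load-average point from the slide (values are made up)
point = to_line_protocol(
    "load",                                  # measurement name
    {"host": "vm1"},                         # tags: indexed, always strings
    {"1m": 0.91, "5m": 0.74, "15m": 0.60},   # fields: not indexed
    1556813561098000000,                     # nanosecond timestamp
)
print(point)  # load,host=vm1 15m=0.6,1m=0.91,5m=0.74 1556813561098000000
```

The stock ticker example coming up has exactly the same shape, just with a different measurement, tags and fields.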
So the measurement name is stock_price, in yellow. The series key is the market and the ticker: we need to know which market the data came from, and the ticker is the company we're tracking. And there would be some value on the end. Of course, you can use that format for anything within your infrastructure. If you are microservice based, you could track users, which service they're hitting with their API calls, and so forth.

Just to cement this: when we talk about a series, what we're saying is that the tag set is the same. So even though the market is NASDAQ on both of these, these are different series, because the ticker is different. If I wanted to pull out all of Google's pricing for the last 24 hours, that would be a set of points on one series.

Now, why do you need to understand the difference between tags and fields? Well, tags are always indexed, which means they can only ever be strings. That means that when I want to pull out one very specific series from the TSDB, it's really, really fast. But we do want to store multiple types of data across that series, so we use fields, which are not indexed but do support multiple data types: they can be strings, they can be integers, they can be booleans and so forth. But they are not indexed, and doing aggregations across them can be really slow. So generally you always work with a series and you filter by tag.

So let's talk about the value and cost of time series data. In order to understand how costly time series data is, we have to understand one of the most important concepts inside of time series, and that is resolution. When I spoke about metrics earlier, I spoke about predictable intervals.
We call that the resolution of the data. If I collect a load average every 10 seconds, that is a 10 second resolution. If I collect a load average every 30 seconds or every minute, that would be a 30 second or one minute resolution. And how fine-grained we get with that resolution is going to determine how expensive it is to store that data. So the value of time series data is directly correlated with the resolution and how many points of data we have. You can actually use a formula: say, 10 second resolution across 10 metrics, multiply them together, and you have some sort of storage requirement for how many points you're going to store. But I feel it's easier to understand this through an example.

Let's assume we're doing Linux infrastructure monitoring. I'm going to keep this really, really simple to start with: we're only going to track one value. If I have one machine and one series, with one value (the CPU) at one second resolution, that means I'm going to store 86,400 points per day inside of my database. Now, in order for this example to really strike home, what I would suggest is: think about storing this number of rows in a general purpose database as the number increases. If we have two machines in our infrastructure, still one single measurement at one second resolution, we have 172,800 points per day. A modest production infrastructure may have 10 machines, which is 10 series, and you're going to want to track more than the CPU: CPU, memory, disk I/O, a couple of other fields. So we'll have five measurements, still tracked at one second resolution, and all of a sudden we're jumping up to over four million points per day. Again, think about this in the context of a general purpose database: are you comfortable storing four million rows a day? It depends on the database, but maybe not. Now let's jump to a real example of time series data.
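The arithmetic behind those numbers is just multiplication: machines × measurements × samples per day. A quick sketch (the function name is mine, not from the talk):

```python
SECONDS_PER_DAY = 86_400

def points_per_day(machines, measurements, resolution_seconds):
    """Back-of-the-envelope storage estimate for regular metrics."""
    return machines * measurements * (SECONDS_PER_DAY // resolution_seconds)

print(points_per_day(1, 1, 1))   # one machine, CPU only, 1s resolution: 86400
print(points_per_day(2, 1, 1))   # two machines: 172800
print(points_per_day(10, 5, 1))  # 10 machines, 5 measurements: 4320000
```

That last figure is where the "over four million points per day" comes from.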
We have financial trading. On the NASDAQ there are around 3,300 listed companies. Let's assume we're only tracking the price of each ticker, and because it's financial trading, we may have sub-second resolution; here we're going to use one millisecond. Realistically, when I speak to financial companies, they are using nanosecond resolution on their trading data; I just couldn't fit that number on the screen. But it's a really big number per day. Very, very difficult for anyone to store that amount of data in a single general purpose database. But time series databases are built to do this, and they will handle it much better.

So, what happens if we take the one millisecond resolution and drop it to one minute? Well, we're back down to the millions, which is good. What if we drop it to an hour? We're at 79,000 points: now we're in comfortable territory regardless of what database we use; we're feeling a little bit better about storing that on a daily basis. And if we jump to a six hour resolution, we're at 13,000 points. You could store that in a JSON file on disk and load it every time, and you would still be happy. So at six hour resolution we're very, very comfortable.

Now, this is good. Here's my wonderful graph; I drew this myself, it's not my kid's. What's important to understand here: we have the dimension of time, and we have the value of the data. What happens is, if I have one millisecond, or in fact 10 second, resolution, that data is valuable up to a certain point, and then that level of granularity, or resolution, is no longer that important to me. You know, once my time series data is a day or a week old, do I really need that level of resolution? For real-time observability or debuggability, definitely; later, no. So then we can maybe move it to one hour resolution. The purple line is the value with the resolution being changed; the green one is the value without ever changing the resolution. After an hour, we're still getting more value.
And what we really want to achieve is that top purple line. We do want to change the resolution, and we want to maintain as much value as possible. But the longer we store the data, once it's not important to us anymore, we have to downsample it or delete it. That's what it's called in time series: you have to downsample. You may find some systems talk about rolling up the data, or calculating averages; it's all the same thing. Downsampling is one of the most important things you can do with time series data, especially as you move from the thousands to the millions of points.

InfluxDB makes this really, really easy with something called a continuous query. It runs on a regular interval, pulls out data from a predictable window, calculates the average, and then stores it in a different retention policy for longer. What most people using InfluxDB do is collect data at high resolution, say one second, and keep that for one day. For anything older than a day, they calculate an average and store that for a week or a month. For anything older than a month, they calculate an average per week and store that for a year. So you have that nice value curve going down slowly, and you're reducing the points, the same as we did with the NASDAQ example, where we went from hundreds of billions of points per day down to thousands. And it's as simple as a five line continuous query. So it's really important. It's the one thing I always say to people: if you leave this talk with nothing but "roll-ups and downsampling are really, really important", that's good.

Unfortunately, for events, for irregular time series, you cannot just calculate the average. There's no predictability there; the average would tell us nothing. There's an entire wealth of information on doing anomaly and outlier detection; you can Google for that with InfluxDB, there are plenty of docs. I definitely do not have time to go into it today, but there are approaches to do it. So if you want to store your logs
So if you want to store your logs Inside of influx DB you can and there's a really good way to sample that over time Okay, so now that we have all of this time series data What does that mean for the monitoring of our applications on our infrastructure? What can we do with it? Let's talk about the most simplest architecture in the world. We have one application That speaks to one database This is the monolithic architecture If we want to monitor this system And I'm only going to speak for my own experience here. I'm not saying this is necessarily what everyone does But in the early 2000s what we would do is set up check-based monitoring Check-based monitoring says if the CPU ever goes above 80 percent, I'm going to reboot the machine If the memory usage ever goes above 80 percent, I'm going to reboot the machine If the response time of my application hits above 300 milliseconds, I'm going to reboot the machine Black friday, I used to work in e-commerce stores We would go out we would buy a whole bunch of new machines We'd stick them on our racks and then we'd sell them two weeks later Right that's how we handled scale back in those days This was just really simple right when you have one application speaking to one database Life is easy kind of so in this system When do we send a message to our devops or sre our operations people? 
Generally we just have a health check on our application, and if that begins to fail, we page someone, and they have to go look at the logs and work out why it's failing. Again, things were a lot simpler in the early 2000s, and if you're lucky enough to still be on this architecture today, I'm very envious. But then of course we started to do horizontal scaling. We'd have more than one application, still speaking to one database, and it got a little bit trickier. How do we send a page when we're doing horizontal scaling? Does a health check failing on one of our nodes mean we should page someone? No. What we actually used to do was just kind of work out how many 500s we had, and if it crossed a certain threshold, eventually we would page someone, and they would have to go and look and see what was wrong. So things were still okay at this point. Fast forward to the modern day, and this is generally what we're working with. We have microservices now. We have independently scalable services: service B has two instances here; service C has a canary deployment, because we wanted to do traffic shaping and send 10 percent of our traffic to one individual new version. Of course, all our networks are virtualized now, whether we're in the cloud or using Kubernetes CNIs; everything goes through some sort of software-defined networking there. Each microservice generally speaks to its own database, and those databases vary in type. And of course we've got the service mesh, because everybody really needs a service mesh now, so we're proxying all of our traffic through that as well. How do we even begin to understand if this system is healthy?
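That fleet-wide approach, page on the aggregate error ratio rather than on any single node's health check, can be sketched in a few lines. The node names, counts, and 2 percent threshold here are all invented for illustration:

```python
# Sketch of error-rate paging across horizontally scaled nodes:
# page only when the fleet-wide 5xx ratio crosses a threshold,
# not when a single node's health check fails.
requests = {          # per-node counts over the last window (made up)
    "app-1": {"total": 1000, "errors": 4},
    "app-2": {"total": 900,  "errors": 120},
    "app-3": {"total": 1100, "errors": 6},
}

total = sum(n["total"] for n in requests.values())
errors = sum(n["errors"] for n in requests.values())
error_rate = errors / total

THRESHOLD = 0.02  # page at 2% fleet-wide 5xx
if error_rate > THRESHOLD:
    print(f"PAGE: fleet error rate {error_rate:.1%}")  # → PAGE: fleet error rate 4.3%
```

Note that app-2 alone is failing badly, but the decision is made on the fleet as a whole, which is exactly why a single failing health check doesn't page anyone.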
So there is a cloud-native convenience-versus-cost argument, I guess. It's really difficult when you make this migration to microservices to continue doing check-based monitoring. In fact, you can't: there is no way to tell the system is healthy using check-based monitoring, and I'm talking about systems like Nagios and Icinga, right? We have to start using TSDBs. That's why the DB-Engines graph is going up at the fastest-growing rate: many, many companies and teams are adopting this. So we can no longer treat the symptoms. We really need to understand the system as a whole and determine how to fix it. So we have to upgrade our monitoring. And what we're talking about is causality: how can I find the root cause and fix it? And not just that, but how can I do that quickly and efficiently without major impact to my users or customers? And if I had an answer for that, that would be great, wouldn't it? So when we use a time series database, we have access to weeks, months, or years of data at different resolutions, so we can leverage that for statistical modeling to understand our systems. We have tags that are indexed on all of our time series data. Not only do we understand the type of user we have, what services they're hitting, what region they're coming from: all of this is tag-based data that can really apply to the time series or the events. And we have a wealth of statistical methods available to us in Prometheus, InfluxDB, Datadog, and New Relic. Now, that's just a list of six.
There are dozens of statistical methods. You do not need to know how they work; you just call the function, pass in some data, and it's going to give you some sort of distribution or graph or chart that allows you to understand your system. So I've got a couple of examples. In a previous life, I was an SRE. I wasn't a good one, but I was an SRE. And my biggest complaint about that job was being paged at 4am with a disk alert. Hopefully a few people have had this: you get woken up at 4am because the disk has just crossed 90 percent capacity. Now, because I was a terrible SRE, I would fumble through to my office, get my laptop, SSH into the machine, and type rm -rf, and delete as much as humanly possible from that machine so that I could go back to bed. That meant I deleted /var/tmp. It meant I deleted /var/cache. It meant I deleted /var/log, because I'm never going to need to log there anymore. But it worked. My machine would be back to 30 percent disk utilization, I'd go back to bed, mental note: I'll fix that tomorrow. I never fixed it tomorrow, and that just repeated. This happened a lot. Now, what if we had this as time series data? What if I wasn't doing check-based monitoring? Well, I could actually track the growth on that disk over time, which means I can use linear prediction algorithms, which are provided in every TSDB, to predict ahead of time, hopefully at 1 or 2pm in the afternoon when I've got my good brain on, and say: there's a really high probability this disk is going to fill up at 4am; I'm going to page you now. And that's what we need.
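The linear-prediction idea can be sketched in plain Python: fit a straight line to recent usage samples and extrapolate to the capacity threshold. The sample data here is made up; in practice the points would come out of a TSDB query:

```python
# Predict when a disk will cross capacity by fitting a straight line
# to recent (hour, usage-percent) samples -- an ordinary least-squares
# fit, the same idea behind linear-prediction functions in TSDBs.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def hours_until(usage_limit, hours, usage):
    """Hours until the fitted line reaches usage_limit, or None if shrinking."""
    slope, intercept = fit_line(hours, usage)
    if slope <= 0:
        return None  # disk is not filling up
    return (usage_limit - intercept) / slope

# Made-up samples: disk grows roughly 2% per hour from 60%.
hours = [0, 1, 2, 3, 4, 5]
usage = [60.1, 62.0, 63.9, 66.2, 68.0, 70.1]

eta = hours_until(90.0, hours, usage)
print(f"disk predicted to cross 90% in {eta:.1f} hours")  # → 14.9 hours
```

Because the alert fires on the predicted crossing rather than the current value, it can page at 2pm instead of 4am.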
This is the value in time series data: monitoring that growth, or regression, tracking it over time, and sending alerts that actually improve your infrastructure and your job. And of course, if we use loads of tags that show which process is writing to the disk and at what speed, we can actually track down the problem: the problem process, the problem application, and work out why it's producing so many logs. The next thing I want to talk about is, as we adopt microservices, how do we understand if our users are having a happy time on our website? And histograms are really, really important. The problem with histograms is there are two different variations now. Some of you may be very familiar with this, and this is how we instrument our code at present using tools like Prometheus and Datadog and a few other ones. We define these buckets in our applications. So the request comes in; we say, oh, we're about to respond to this; the time since the request came in was 311 milliseconds; we stick it into a bucket in our application and send that to our TSDB. The problem with this approach is we're pre-aggregating that event-based data. Is my application always going to respond within one of those buckets? Are there any scenarios where that can change? How do I make that dynamic? If you want to change this, you have to deploy, and then you lose all previous data. Well, at least your previous data will have the old buckets. Your application has had massive speed improvements, and now everything is in the first bucket. There's no value there.
There's no distribution. So we cannot use histograms effectively on modern architectures with pre-aggregated buckets. So that's your warning. What we really want to be able to do is build the histogram at query time: pass in a time series of data, using dynamic buckets, to understand, say, 24 hours of data; calculate some means; do query-time bucketing. But not only that: we then want to forward all of that data to something called mode. What does mode allow us to do? Well, it allows us to generate distributions from the highly available tag sets within that data. So, a more practical example: if we have a percentage of our users that are experiencing one-second response times, that's bad. If we grab that data, zoom into it, pick out that bucket, and pass it through the mode function, it's going to analyze all of the tag sets for that data and tell us which tags occur most frequently. And it may be that for our premium customers we inject an extra set of scripts, or JavaScript, that does some sort of extra tracking, which is delaying some response or load time on the client side of the browser. And just by using the modality function we can identify that really, really quickly, and at query time. And finally, because I have a bit of an e-commerce background: one of the toughest things we had was understanding how many servers we were going to need to handle events like Black Friday or Christmas and so forth. And it's still a really difficult problem.
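The two steps, query-time bucketing followed by taking the mode of the tags inside the slow bucket, can be sketched like this. The events, tag names, and bucket boundaries are all invented for illustration:

```python
from collections import Counter

# Raw request events as they might come back from a TSDB query:
# (response_ms, tags). Data and tag names are made up.
events = [
    (120,  {"tier": "free",    "region": "eu"}),
    (135,  {"tier": "free",    "region": "us"}),
    (110,  {"tier": "free",    "region": "eu"}),
    (980,  {"tier": "premium", "region": "eu"}),
    (1020, {"tier": "premium", "region": "us"}),
    (1100, {"tier": "premium", "region": "eu"}),
    (140,  {"tier": "free",    "region": "us"}),
]

# Query-time bucketing: the buckets are chosen now, at query time,
# not baked into the application at instrumentation time.
buckets = [(0, 250), (250, 500), (500, 1000), (1000, float("inf"))]
histogram = {b: [e for e in events if b[0] <= e[0] < b[1]] for b in buckets}

# Zoom in on the slow buckets and ask which tag values dominate them.
slow = histogram[(500, 1000)] + histogram[(1000, float("inf"))]
tag_counts = Counter(f"{k}={v}" for _, tags in slow for k, v in tags.items())
print(tag_counts.most_common(2))  # → [('tier=premium', 3), ('region=eu', 2)]
```

Here the mode immediately points at `tier=premium`: every slow request came from premium users, which is exactly the kind of answer the speaker's extra-JavaScript example needs.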
I also used to work for a media company that did rock and metal news, and the biggest rock and metal news in the last 20 years was probably the death of Lemmy. And we could not scale to that. But if we had been using a time series database, we could have run scenarios through something called Holt-Winters: predicting major events by taking series of data from like-for-like events and applying some sort of scale to them, to work out how many servers or machines or VMs I would need to handle certain failure scenarios. So Holt-Winters is, again, available in most TSDBs. You just pass in data and look for cyclic data, or things that repeat. Very, very handy and very, very convenient. And the whole point of this is that once we have time series data, it allows us to build our automation, determine root cause analysis, and use historical data. And that's just the start, right? We can then start to do prediction and machine learning on that stream of data as well. There are a wealth of tools that hook into InfluxDB, Prometheus, or Kafka and run machine learning models on your time series data to do security threat detection, anomaly detection, and so forth. Very, very cool. So in summary: please use a TSDB. Do not use a general-purpose database for your time series data. Please roll up and downsample your data as much as possible. All right, it's very expensive to store billions of points, very expensive to serve billions of points, so you have to understand how long the data is valuable to you and plan your downsampling appropriately. You can run outlier and anomaly detection on your events. And really, just build as much tooling, dashboarding, and automation as possible with this data. There's no point in storing it for it not to be consumed and used to your advantage. So that's all I've got. Thank you very much, and I hope you're a little bit more interested in time series data.