Good morning! Everybody's still kind of sleepy, so we'll try not to be too boring. The topic is the next generation of data stores that query cold storage; let's see if we all had the same thing in mind for what happens in this talk.

For my background: I work at Elastic, the company behind Elasticsearch. We are investing heavily into this world as well, so that's where I'm coming from. My general question is: who uses data stores? I assume everybody raised their hand. And the next question: who likes managing data stores? That's normally when all the hands go down, or you make this so-so gesture, and everybody is like, hmm, I'm not sure I want to manage data stores. So why is it such a pain to manage data stores, and where is that problem coming from? We all need them, but nobody wants to run them, right? Maybe it's the old thing where you say "it's an ops problem now, good luck": we just pick the data stores and somebody else has to run them. But why are data stores such a painful thing to run?

I think it's partly because of what I call the classic architecture that we have all used when building something. A lot of my examples will be tied to Elasticsearch, because that's the system I know really well, but this applies to pretty much any data store.

So let's assume we have a distributed data store with three nodes, and some data that we have split into shards. To keep it simple, say one table or index is split into multiple shards: three primary shards, shard 0, 1, and 2, plus replicas of those shards, all distributed across the nodes. It's a distributed system that works as expected, right?
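The classic setup just described can be sketched minimally. This is only an illustration of the general idea (hash-based shard routing plus replicas placed on different nodes), not Elasticsearch's actual routing code; the shard and node counts match the example on the slide.

```python
import hashlib

NUM_SHARDS = 3
NODES = ["node-0", "node-1", "node-2"]

def shard_for(doc_id: str) -> int:
    # Hash the document ID so keys spread evenly over the shards.
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def placement(shard: int) -> dict:
    # Primary on one node, replica on the next node over, so losing
    # any single node never loses a shard entirely.
    primary = NODES[shard % len(NODES)]
    replica = NODES[(shard + 1) % len(NODES)]
    return {"primary": primary, "replica": replica}

doc = "user-42"
print(shard_for(doc), placement(shard_for(doc)))
```

Because the document ID is hashed, even sequential IDs spread evenly across shards, and placing the replica on a neighbouring node is what makes the "one node dies, the data survives" promise work.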
You have the data, and if one of the nodes goes down, you have everything replicated on another node, which can be automatically promoted. In theory this is all nice and works great, and it should make operations easy. But it kind of doesn't, so why is it often such a problem to manage the data?

I think in part it depends on how the system works. In the case of Elasticsearch, for example, there is one node that for historic reasons we call the master node. It basically manages the cluster state: it knows which shards are located where, what the structure of the shards is, and what the metadata and state of the cluster are. This component is, I don't want to say brittle, but if something happens to it, it doesn't matter that you have all the data nicely sitting somewhere, because this is the map to the data. If something happens to it, your cluster will have a very bad day. Yes, this role is shared across the nodes, and if the master dies another node will promote itself to be the new master, but there are still a lot of moving parts. And if communication breaks somewhere (I assume everybody knows the CAP theorem: consistency, availability, and partition tolerance), in theory it all still works, but in practice you sometimes get stuck in a weird state.

All of this makes operating data stores kind of hard: extending storage, managing that storage, and managing all the nodes in conjunction. You don't really want that. Or rather, we are very used to it; I always call it the Stockholm syndrome, where you get so used to something that you say this is the only and right way to do it, but maybe it's not actually what you want to do.

And then there is the great promise, I want to say, of our industry, which is called serverless. Yeah, there is the old joke:
there are servers in serverless. Yes. The point is rather that the servers are managed by somebody else, and I'll build up to what that means for data stores over the next few slides.

I assume pretty much everybody is familiar with the term serverless, but if you want to explain it to somebody, I always like the "pizza as a service" example, because everybody likes pizza. You have multiple options; blue is self-managed and green is managed by somebody else. There is the homemade pizza, where you buy everything from the flour to the toppings and make the entire pizza at home yourself, from scratch. That is like having your own infrastructure or data center, where you manage the hardware, the software, and everything around it. Then you might have take-and-bake, and I know the Italians will not be very happy about this, but the rest of the world often eats frozen pizza and warms it up at home. That's the next step. Then there is delivery, where you get the pizza delivered to your home or you go to a restaurant. That's almost the serverless model: you have outsourced all the work, you just show up, eat, and go.

For computing, the equivalents are: your own data center, where you manage everything; an EC2 instance or virtual server, where somebody else manages the hardware but you do the software, the operating system, and everything yourself; maybe Kubernetes, where you have abstracted everything away behind an API but still need to manage a lot yourself; and then serverless, where you basically throw your code at something and it will scale it out, run it, and do everything for you, and you don't care about infrastructure anymore. So that's the range from on-prem all the way up to the
serverless setup, based on pizza, but I think the comparison holds quite well.

So that's serverless. The other term that often comes up is stateless, and just as there are servers in serverless, there is state with stateless data stores. Some people are very amused by this: how can you have a stateless database? Are you writing your data to /dev/null so it just disappears? Maybe that's a stateless database, but that's not the point. The point is that you want a data store where, again, somebody else manages the state for you.

Then the question is: what is the storage standard of our industry today, and has been for a couple of years? Any guesses? ... Sorry? ... I would say it's S3. That is where everybody puts their data, and even though it's a proprietary standard, everybody has implemented an API that is similar to it. I don't want to be too vendor-specific here, because you have plenty of choice: Google has something similar, Azure as well, DigitalOcean too, you can run your own with MinIO, and there are tons of compatible APIs. So I think S3 is the storage standard of our industry, even though we might not want to admit it. Especially when you come from an open-source world, it's kind of painful that S3, this proprietary Amazon thing, has kind of won. But it seems to be what we as an industry have more or less standardized on. And this is also true for lots of players who want to say "we are stateless."
This is where your state often goes.

One small side note, because we have had a lot of pain with "S3 compatible": what it means can differ a lot, and people have very different opinions on it. Within Elasticsearch we have built a repository analysis tool that basically checks whether your S3 implementation is compatible enough for us to work with, because we have used S3 for backups for a long time. Even then, we want to make sure that you support all the API calls, that the network is fast enough, and that you can store large enough files, what you would expect. Depending on the vendor-specific implementation, "S3 compatible" is one of the bigger lies in our industry: everybody says "yeah, we support some of the APIs," but the final details are then very different.

Looking at the options (I'll pick some GCP examples, but again this translates to pretty much any cloud provider or environment), in most clouds you have the choice between different storage types. There is the object store, the S3-compatible thing where you can store lots of blob data. There is block storage, where you basically attach some storage over the network and use it. And there is file storage, which is normally something NFS-compatible.

For most cases the classic approach (how many people have been running data stores in the cloud like this?) is that you have some network-attached or maybe even local storage on the instance. The local disks are more ephemeral, and then you have persistent disks that you can mount over the network. But that has trade-offs too: for the most part you can only attach one instance for writing; maybe you can attach the disk to multiple instances for reading, but only one can write. And with NFS, we have had pretty bad experiences around performance; when we tried to even just run benchmarks to see whether it is compatible or works at all, we kind of gave up on it. So
for most data stores, I think NFS is not a great choice. Locally attached storage is the classic choice, but where everybody wants to go is the object store, just because you get this outsourcing of state. That is what most people mean when they say "we have a stateless data store": it doesn't mean there is no state, but you outsource the state to a blob store, mostly because management and usually also cost are very good.

About the trade-offs: durability is normally very good. You put the data there and, depending on your cloud provider, you get four or five or however many nines of durability, so your data will still be there. That might also mean you don't need to do replication anymore, because durability is guaranteed; Amazon, for example, replicates the data internally, so you don't need to care about replication yourself. Latency is often a bit worse; that's a trade-off, and we can discuss how much sense it makes. Cost depends: if you have a good access pattern, write large enough chunks of data, and don't constantly transfer them, it is potentially very cheap to run. If you have a bad access pattern, especially around network traffic across zones, it can get quite expensive. But if you do it well, it can be very cheap in comparison.

On latency, there was for example a very nice blog post a while ago where Datadog describes one of their homegrown data stores, called Husky. It's not publicly available, but you can read about what Datadog is doing. They basically said that with S3 the trade-off was worth it: both the cost and the management were good, and the tail latency was even better, although the average latency was a bit higher. For what they wanted to do, it was more than good enough to accept that slightly higher average latency, and it made a lot more sense for running their data store. And I feel
like a lot of people trying to run a system in the cloud come to a similar conclusion: while there is a trade-off and the average latency is a bit higher, if you have the right access patterns, a blob store can perform very well and, depending on the task, be a good solution.

The other thing you might want to do, and from the classical perspective this was normally a separate concern, is to split up storage and compute. You have some instances that can scale up if more ingestion is happening, so you get more processing of the data coming in, but you just write it out to the blob store. When there is no incoming traffic anymore, you can basically shut down that indexing or ingestion layer, because you don't need to run it anymore: you just put all the data on S3, and that's where it sits. So you can scale ingestion independently, but you can also scale the retrieving, reading, searching layer (whatever you call it) independently. If, for example, you have almost no reads on the weekend, you can mostly shut down your reading engines, whereas when Monday morning hits, you can scale up automatically based on requests and save costs again. Because your data, the state, lives on S3, you can very easily scale both the writing of the data and the reading afterwards up and down independently, to go faster but also to bring the costs down.

The other important thing here: for a truly serverless or stateless system, you normally want scale to zero, so you can even say that nothing happens over the weekend and just shut all the compute down.
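As a toy illustration of sizing the read tier purely from demand, which is what makes scale-to-zero possible once the state lives on the blob store. The per-node capacity number is an assumption for the example, not a real figure:

```python
import math

QUERIES_PER_NODE = 50  # assumed capacity of one search node (queries/s)

def desired_search_nodes(queries_per_sec: float) -> int:
    """Size the read tier from demand alone; zero demand means zero nodes,
    because the data itself is safe on the object store."""
    if queries_per_sec <= 0:
        return 0  # scale to zero: no reads, no read nodes
    return math.ceil(queries_per_sec / QUERIES_PER_NODE)

print(desired_search_nodes(0))    # weekend: 0 nodes
print(desired_search_nodes(120))  # Monday morning: 3 nodes
```

The same function could size the ingest tier from write throughput; the point is that the two tiers are sized by two independent signals instead of one shared cluster size.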
We potentially save a lot of money and energy. You can still do local caching, so with the different disk formats and access patterns you can still exploit the locality of data. If you have a hot reading path and always read the same data, you cache it locally, so you get the advantage of local disks for that, while most of the data can just sit on the blob store and you don't have to move it around all the time. Then you have much better scale and potentially also much better cost.

The other thing, and I don't want to say it's the holy grail, but it's what everybody wants to have, is pay-per-execution: knowing what one query of a user actually costs you. I also feel that with ChatGPT recently this has gotten a lot more attention, because there you can see "this request cost you x cents". Pay-per-execution is an interesting model that I think a lot of people want, but it has historically been very hard with any other data store. Say you have a Postgres instance: you have no idea what one query that somebody ran in the past really cost you, and with pretty much any other data store it was very hard to figure out the cost per request. You could attribute the cost to a specific client or a department, but it was always very much a ballpark figure. With serverless, building on these primitives, it might be much easier to actually figure out how much cost one specific query, or a given ingestion volume, is adding to your system.
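A sketch of what per-query cost attribution could look like once a query's blob-store requests and compute time are metered. All prices below are made-up placeholders for illustration, not any provider's actual rates:

```python
# Illustrative pay-per-execution cost model; prices are assumptions.
PRICE_PER_GET = 0.0000004       # $ per GET request (assumed)
PRICE_PER_GB_SCANNED = 0.002    # $ per GB read from the blob store (assumed)
PRICE_PER_CPU_SECOND = 0.00005  # $ per CPU-second of query compute (assumed)

def query_cost(gets: int, gb_scanned: float, cpu_seconds: float) -> float:
    """Attribute one query's blob-store and compute usage to a dollar cost."""
    return (gets * PRICE_PER_GET
            + gb_scanned * PRICE_PER_GB_SCANNED
            + cpu_seconds * PRICE_PER_CPU_SECOND)

# One query that issued 200 GETs, scanned 1.5 GB, used 0.8 CPU-seconds:
print(f"${query_cost(200, 1.5, 0.8):.6f}")
```

Once every query runs against metered primitives like this, attributing cost to a single request or customer becomes bookkeeping rather than guesswork, which is the contrast with the ballpark numbers of the classic setup.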
One thing that is also important to note here: with object stores, just like the API compatibility, the performance characteristics vary widely and depend very much on the provider. And again, we always love abstraction, but abstraction is not magic. You still need to know what happens under the hood, otherwise you will find out the hard way at some point, and object stores are no different. Every cloud provider has different limits. For example, in GCP, if you have fewer than a thousand writes or five thousand reads per second, you will be fine; if you need more, you will need to ramp up over time so that the service adjusts for you. This is very provider-specific: I don't think Amazon's S3 has something like that, and I haven't seen anything on Azure either, but GCP is pretty explicit about it. You basically start around those numbers and can double them approximately every 20 minutes. So if you know that a lot of load is coming, you will need to warm up your system over the right amount of time to actually scale to approximately that load, so that you end up in a good state and don't just get server errors back.

The other thing, and this is more or less equal across cloud providers (with small variations it applies to Amazon S3 and Azure just as to GCP), is that you want to avoid hot spotting. As we said, there are still servers behind the scenes, so somehow they need to distribute the data and handle all the requests. If you have a hotspot in your data, for example a naming convention where you create a bucket and just increment the bucket name by one, or the name just has the date in it, then you will always have a lot of locality. For example, if you write all of today's data into a single bucket but your reads go over the last year,
there will always be one bucket that gets 99% or more of all the write requests. You always have this hotspot, and that's a pattern you should generally avoid, because there is still hardware down there, and it will not magically solve the problem. There is some distribution mechanism, but normally, if you have a naming convention that always goes to the same place, you will be in trouble.

Other systems apply a workaround for this. Does anybody know how other systems work around sequential naming? In Elasticsearch, for example, we would take the ID that is the distribution key and hash it, so even if you just increment the value by one, the hash function distributes it evenly over the entire key space, and you avoid the problem. From what I've read, that is not how most blob stores work: if you have a prefix that is always the same, followed by some consistent pattern that increments over time, you will have a hot-spotting problem, because the data always ends up in the same place. Those are things you need to figure out and understand once you have enough load and data, because otherwise you will suffer from the shortcomings of the implementation even without seeing it directly; at some point you will feel the hot spotting under the hood.

The other thing that is still a thing: round trips are still expensive. If you put your blob storage in the US and try to retrieve the data in Europe, the physics of round trips and network latency doesn't change. Locality matters for a blob store just as for an instance store; you cannot work around that.

And depending on the cloud provider, you will often have different access classes, where infrequent access is cheaper but might have more latency, or the data might only become available after some time. Those are normally
obviously not a very good fit for data stores where you want the data available very quickly; those classes offer very high durability at low cost, which is probably not what you want to optimize for here.

A lot of cloud providers and data stores support some kind of serverless, stateless implementation in their cloud offerings. For Postgres, for example, there is Neon, which I think is an open-source implementation, and it goes exactly this way with a blob store in the background. There are also YugabyteDB, CockroachDB, PlanetScale, Firebolt Cloud, and many more, because the general pattern of outsourcing storage and state to S3 is just too appealing. And we are no exception.

Just to give you an idea of how this looks for Elasticsearch: this is the old way of doing things, and this is how we see blob stores and how we want to use them. This already looks slightly messy, but you can see it has almost all the problems of a classic distributed setup. Just to explain the diagram: the blue arrow here is a write, and the orange arrow is a read.
So when the data comes in from your client, you write the data, and then you have the so-called hot tier. This is where you do most of your writes and reads, and you can see we have a primary and replicas. We need to write all the data at least twice for high availability, so that if this instance here dies, we still have the other copy; but it means you always write all the data twice. At some point you might say: I move this to the cold tier, where the data is read-only, so I'm not doing any writes and I do fewer reads, and I move it to an instance with more density, larger disks, and less CPU. Maybe you have already backed up the data, so you only need a single copy. Then you can move it to the frozen tier, which might be backed by an object store. But it is still data moving through different stages, you need replication, and writes are going all over the place. This is the classic approach to running data stores.

The different tiers (hot, cold, frozen) are highly optional, but they are a performance optimization: here, where you do all the writing and most of the reading, you want more CPU; there you want to add more disks, since the data is infrequently accessed. I always call that last one the compliance tier: yeah, we need to store data for six or twelve months or whatever, but nobody is going to search it, or at least not on a frequent basis, so in theory it can be quite slow.
You can do some optimizations there. The idea of serverless, of stateless, is then: the write part is all on this side, the read part is all on the other side, and the only thing in the middle is basically the blob store, the S3-compatible storage. You can split up the writing or indexing here and the searching there. All the writes go through this side, so if more writes are happening, you can scale up this layer independently of the other one; the same for searches, where you can independently scale up the search side.

You don't even need replication, because the assumption is that you write both the index state and the transaction log (or binlog, or whatever it's called in your data store) to the blob store. So you really have no state that exists only on an instance, and you don't need replication anymore; your replication layer basically relies on the object store. As I said before, state is still a thing, but it's somebody else's problem, and you're relying on them doing a good enough job. This basically allows you to scale the different tiers independently, and you don't need the replication we had before, so you potentially only need to index the data once instead of twice. That should make the system much easier to operate and ideally also much cheaper, because you write the data only once and scale reading and writing independently. Up here, if I have more reads, I need to scale the same components up, and the same for ingestion; down here I can take a much more fine-grained approach.

There are still a couple of challenges around this, some of them quite specific. For example, we have these master nodes that manage the state; we want to make those stateless as well, because otherwise scaling down to zero, for example, is not really
a thing, because you would always need that one component running. (And we have 10 minutes left, yes.)

You also have the transaction log (or binlog, or whatever it's called in your data store) that you need to keep somewhere. Our approach, again, is that you can put even that on S3, and it will perform well enough to actually outsource all of that state. There are details like real-time GETs and how to implement those, but they are very specific implementation topics, so I'll skip over them.

We've recently done some benchmarks: ingestion is much faster, partly because we only need to index once and not twice anymore, and it also needs a lot less CPU, because you don't need the coordination between a primary and a replica. You just write it once to the blob store, and then it's the blob store's problem. If you have optimized how much data you batch together and write in one operation (since you normally pay for API requests and data transfer and everything), it can be a lot cheaper to run such a system; if your access patterns are bad, it might still be very expensive, or even more expensive, because of the extra overhead from S3.

So, to wrap up: stateless and serverless are a thing. I know everybody wants to call everything cloud native nowadays, so maybe this is cloud-native data storage. Some providers are very keen on claiming "we are cloud native" and others are not, and it's always a question of what that really means.
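The write path described a moment ago (batch operations into a transaction log for durability, acknowledge, then flush larger segments to the object store) can be sketched like this. Both "stores" here are in-memory stand-ins for illustration, not a real blob-store client, and the flush threshold is an arbitrary number:

```python
class IngestNode:
    def __init__(self, flush_bytes: int = 1024):
        self.translog = []        # frequent, small durable writes
        self.buffer = []          # pending docs for the next segment
        self.buffer_size = 0
        self.segments = []        # large batched objects ("S3 PUTs")
        self.flush_bytes = flush_bytes

    def write(self, doc: bytes) -> bool:
        self.translog.append(doc)  # durability first, then acknowledge
        self.buffer.append(doc)
        self.buffer_size += len(doc)
        if self.buffer_size >= self.flush_bytes:
            self._flush()
        return True                # ack: the doc is now recoverable

    def _flush(self):
        # One big PUT instead of many small ones keeps request costs low.
        self.segments.append(b"".join(self.buffer))
        self.buffer, self.buffer_size = [], 0

node = IngestNode(flush_bytes=10)
for i in range(5):
    node.write(b"doc%d" % i)       # 4 bytes each
print(len(node.translog), len(node.segments))  # prints: 5 1
```

With 4-byte documents and a 10-byte flush threshold, five acknowledged writes produce five small translog entries but only one larger segment "PUT"; that batching is what keeps the per-request S3 costs down, as described above.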
And I think, as our industry progresses, this stateless and serverless approach of outsourcing your work to somebody else is very appealing, and it will work well.

One thing that sometimes comes up: isn't all of this just auto-scaling? I hope we've made it clear that this is more than auto-scaling. Auto-scaling just dynamically adds more instances, but you can never go down to zero with it, and you still manage everything in terms of state yourself. Serverless and stateless are more than auto-scaling, even though it solves part of the same problem; it's probably less than half of the solution, because you need to think about state very differently in a serverless and stateless environment.

And with that, that's it. Do you have any questions? I'll try to repeat the questions for the recording. Yes, please?

[Audience question]

Yes, so the question was basically about the right access pattern for writing data, and the trade-off between having large enough data chunks versus writing the data frequently enough. Part of the solution here is the transaction log. The basic assumption, at least for us, and I think it's quite similar for most others, is that the write comes in, you batch a couple of operations together, write them to the transaction log, and only then acknowledge back. The data is then still kept on the indexing node before being written to the object store, but the guarantee basically lives in the transaction log: if the node that indexes your data dies, you can always recover from the transaction log. For the transaction log you might have the trade-off that you write smaller chunks of data, but before you write the actual data, you create bigger chunks
that are then also easier to retrieve. So you don't have a gazillion very small files; you have larger chunks that you ideally prepare depending on your access patterns. For example, if you have anything close to a time series (logs, anything with a timestamp), you can sort the data by time, so when you retrieve a span of time, related data is always stored together and retrieval is much simpler. In the transaction log you might have the trade-off that you do smaller and more frequent requests, just for durability; but the writing of the actual data, which then feeds the searching, can be optimized for the fact that in many scenarios you search way more often than you write. So there is a bit of a trade-off in the access patterns. Our assumption, and I think the general assumption, is that data is read more than it's written, so you want to batch it together in some format that optimizes for retrieval: not a huge number of tiny chunks, but fewer and larger ones, because yes, the access pattern for a local SSD is totally different than for these blob stores. That's kind of the way to go there. I hope that answered the question. Okay, great, any other questions? Yes, please?

[Audience question]

So the question was about hot spotting with a year-month-day pattern for writing, so your writes always go to the same place. It's not historic data that you're processing; it's basically today's data, so all your writes go to today's bucket. Yes, I mean, I don't think it's great in terms of distribution, because you have this hot spotting, and depending on your cloud provider you will find that in the documentation. I only picked the GCP example, which was here, but there is similar documentation for S3 and Azure as well. They all have it. Or maybe you need to talk to support.
We've also spoken to support. Maybe they don't make this super public, because in many scenarios you don't feel it, but if you try to push the limit, you might. Hot spotting is generally a thing. I think S3 also has a caching layer that, for reading, can offset the problem if you exploit locality, so if you always read the same bucket, it will cache the data; but that's very S3-specific and only for reading. Beyond that, it just depends on the amount of data and how you name things. Our fear is that if we have a very large customer and all their writes go to the same bucket (because we have incremental bucket names or whatever), we might create a hotspot, or maybe we put three noisy customers close together. Well, I think as long as it's working, it's probably not a problem, but hot spotting is still a thing. It's a classic problem, and even though it's abstracted behind five layers at this point, it's not really going away. On the negative side you call it hot spotting; on the positive side you say you exploit locality. So it's a bit of a trade-off again: what makes sense and how it combines.

Anything else? Otherwise, thanks a lot for joining! If you want stickers, take some so I don't have to carry them home. And thanks a lot for joining!