Yeah, I think we're live. We'll see the participants. Yeah, about 10. There are a couple of competing meetups today, so it's going to be a mix. Okay, we'll just wait a couple of minutes for people to join. The session is also broadcast on YouTube, so there will be some people from there; we'll pass on their questions if there are any.

Yeah, let's get started. People will keep joining. Welcome to Scaling Databases, the fifth episode of the Scaling from First Principles series. For this session we have Nivedita, Srihari and Swanand with us to share their experience of working with databases at scale. Nivedita is a partner at nilenso; she has worked with Staples, Gojek and other clients at scale, designing databases and different software systems. Srihari, also a partner at nilenso, has worked on setting up databases at scale, dealing with high throughput loads in systems, failure strategies, and so on. And Swanand is a senior engineering leader with deep expertise in Postgres; he conducts popular SQL masterclasses with Hasgeek. I think we have really great expertise on working with databases at scale here, and I'd like to go through the journey and understand how to approach it. Welcome, everyone. I'd like to start by asking each of you to tell us a little more about yourself and the contexts in which you've worked with databases. We can start with Nivedita.

Okay, so yeah, hi, thanks for having me on this session. I have worked with various database systems over the last eight years. The kinds of databases I've worked with have mostly been Postgres and Postgres-related OLAP databases, somewhat MySQL, a little bit of Datomic, and most recently a little bit of TimescaleDB, working with monitoring systems and so on. For example, at one of our clients, Staples, I worked on a highly distributed system which had Postgres as the persistence layer, at a scale of about 300 to 400 requests per second, with similar and at times higher QPS, and a 99.9th-percentile latency of 10 milliseconds. I, along with the team, was responsible for not just writing the application but also managing the database: managing failover, writing code for handling failover. We had an SLA of five seconds for failover, and at that point in time some of these libraries and frameworks didn't exist, so it was all handwritten. Yeah, so that's the kind of experience I bring. Srihari, do you want to go next?

Thanks, Anand, I can go next. Hey, everyone. Am I audible? (Yes.) It's nice to be here, and thanks for having me. My experience is similar. I'd classify myself as an application developer whose expertise is in everything around databases, though databases do happen to be a particular interest of mine. I've worked recently on a very highly concurrent system where correctness has been very important, something like 15k requests per minute, with a very tiny database. But everything has to be perfectly correct.
And transactions were super important. My previous experiences include similar things: reporting systems on OLAP databases, time series databases. I've also worked a little on temporal databases, like Datomic and Datascript, and the other generic bunch: Redis, BigQuery, et cetera. High availability and DBA-ish work, application maintenance and product support work around databases, have also enriched a fair amount of my experience here.

Thanks, Srihari. Swanand? Yeah, hey, thanks, Anand, and thanks, Nivedita and Srihari, for the intro. I come from a pretty heavy OLTP, transaction-processing background. I have worked in OLAP and analytics processing as well, but I come from a traditional SQL kind of background. I've been evangelizing Postgres for seven, eight years now and have worked with databases for 15 years. Quite recently, we're in the process of building a very elastic system that does not process more than a couple of hundred requests per second during the day, but there are two times of day where it goes as high as 15,000 requests per second. We're in the middle of designing that kind of system, and Postgres features quite strongly in it. And on the specific aspect of scaling: I actually love teaching Postgres as well. Like Anand mentioned, I've had hundreds and hundreds of students; I've taught SQL, schema design, indexes, performance measurement, performance improvement, et cetera. So it's a topic that is pretty close to my heart. On the specific topic of scaling, I love to bust the myths around how these databases supposedly do not scale. It's a constant topic of discussion in tech circles, right? Why don't we scale them as much as we can? So I'm happy to chat, share a couple of opinions, and maybe also dunk on a few other databases.

Yeah, I'm looking forward to more of the rants in the coming minutes. Thanks, Swanand. So I want to start with a really simple question. How do you pick a database when starting a new project? You look at Hacker News and there's a new database coming up every month, right? Or maybe every week. One scales really well, another promises really low latency; there's always something being sold for the kind of problems we're talking about. So how do you make that decision? There are so many options, and most of the time it's very confusing. How do you approach it?

I have a one-line answer, then I can give back the mic. Unless you can demonstrably prove that Postgres is not going to be sufficient, you cannot use anything else. Just use Postgres. If you can prove to me that it's not going to be sufficient at the read-write ratios you want to achieve, at the latency, at the throughput, at the data size, any of those angles, then sure, maybe we can think of other choices. Otherwise, it's not even a choice, honestly. And I'm sure Nivedita and Srihari would say the same thing; I want to hear them as well.

Yeah, so my rule of thumb is also: go with Postgres.
But you have to understand the kind of data you're saving. Does your data actually have relations in it? If it's relational data: yes, go with Postgres. But if your data itself is not relational, if you're just saving, say, very simple streaming data coming in, which doesn't require the complexity of an RDBMS, then you go with something simpler, something like a key-value store. Why introduce the complexity of a complete RDBMS? And then there are other things to keep in mind. Are you not sure of the schema of your data yet? Are you storing unstructured data? Then don't go with Postgres, because you won't be utilizing all of its capabilities at that point. But I agree with Swanand that 80 to 90% of the time, what you want to go with is just Postgres.

Yeah, thanks. I have some points to come back on, but I want to hear Srihari before I jump in. Sure. Given that there's enough voice for Postgres, let me give voice to something else. Although I generally agree with the idea, it's a safe choice as a general-purpose database when you're not sure exactly what you need, the one place where I find Postgres somewhat lacking, in one word, is availability. Even a highly available Postgres setup is not available the way the newer NewSQL databases, or databases that are distributed by design, are available. And unavailability puts a massive dent in people's uptimes, especially if they have SLOs to their clients: if your service goes down and people can't order their food or whatever, that's a bad experience, and you don't want to be there. So at that point you'll be asking, okay, how do I do high availability? If you want to do it with Postgres, you need people who understand what that is. You need to train people in your team to do it, and then be available to fix issues that happen with it, all of that. So if you have a small team, or you want to go an easier route, not necessarily the simplest route or the one that makes sense long term, but something that works, you might want to pick something else. It depends on what the team knows; you have to pick something practical. Sometimes you go with Postgres but keep it managed. Sometimes you say: you know what, I just need a key-value store, I can put everything in Redis. I don't have transactions, and in the weird 0.1% case where I need a lock, I can do a Redis lock, I can do some distributed coordination around Redis, I can manage it. It's fundamentally what my team knows, and that's fine. Scaling writes with Postgres has been slightly difficult at times, right? It's a single node. So if you don't know exactly how to do it (keep your connection count really small, keep your transactions super fast, maybe use stored procs and things like that), that's hard. So for multi-master setups, for example, a distributed database might be a good place.
Or even a time series database might be a good choice, if you really want a lot of data going into your DB really, really quickly. Another place where you might not want to use Postgres is with immutable data. It's not really built for that; updates are a fundamental part of any relational database, and if you really don't have updates in your system, if all you have are inserts and deletes, then you might want to consider something else. But the general sentiment holds: you can do JSONB for everything in Postgres, you can have five billion rows in a single table in Postgres, you can manage it, it's okay.

For me, the reason I choose Postgres is: sure, high availability and the other operational pieces are challenging, and you could have windows where you can't write, et cetera. But the data integrity that you get with a reasonably well-designed schema is kind of unmatched. And often, for high-availability setups, I've relied on system architecture to provide that: add a queue in the front, manage it there. The database is a source of truth, and if you're treating it as a source of truth, data integrity and consistency become your critical requirements from the DB. So in the past, whenever I needed a very high-availability setup at higher throughput or lower latency, I relied on queue managers to smooth out the load, the write load specifically, because like Srihari said, writes are actually hard at higher scale. I've used queue managers in front to smooth out that load and taken the HA load on the application side, just so that I can continue to rely on the data integrity of Postgres. And Postgres in particular because of its MVCC model, which really shines when you're doing concurrent writes.

Thanks. I want to pick up on what Nivedita said: she'd pick a key-value store or something for streaming data, and if it's unstructured, we might pick something other than Postgres. I want to think about what those options could be. What would they be?

I have not worked much with unstructured data, to be really honest. But your options would be: you can go with Cassandra, you can go with MongoDB (sorry, had to say that out loud), you can go with MongoDB at your own peril. And there are options like what Srihari said: put it in Redis, where you have a key and your value is a JSON, that's it. And actually, with the JSON support and everything, even Postgres will do: most people write mostly structured data plus some unstructured data as part of their schema, even in Postgres. But it won't shine if all your data is unstructured.

Yeah. I can't help but ask, because like you said, for ease of managing, I feel like running Postgres and dumping JSON in there is probably easier than managing, say, Cassandra, unless you have that expertise in the team. Okay, so that brings me to the next question, on what Srihari also mentioned: you try to pick what the team has expertise in. That's probably the best option.
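To make the "just dump JSON in Postgres" option above concrete, here is a minimal sketch using Python and psycopg2. Everything in it (the events table, the payload column, the DSN) is illustrative rather than from the discussion; the point is that a JSONB column plus a GIN index gets you indexed lookups inside otherwise unstructured blobs.

```python
# Sketch: semi-structured data in Postgres via JSONB (all names illustrative).
# Assumes a reachable Postgres and `pip install psycopg2-binary`.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id      bigserial PRIMARY KEY,
            payload jsonb NOT NULL
        );
        -- A GIN index makes @> (containment) lookups on the blob fast.
        CREATE INDEX IF NOT EXISTS events_payload_gin
            ON events USING gin (payload);
    """)
    cur.execute("INSERT INTO events (payload) VALUES (%s)",
                [Json({"type": "order_placed", "sku": "A-42"})])
    # Find events by a property buried inside the JSON.
    cur.execute("SELECT id, payload FROM events WHERE payload @> %s",
                [Json({"type": "order_placed"})])
    print(cur.fetchall())
conn.close()
```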
For most cases, Postgres is not too hard to set up and it's battle-tested, so that's probably the one you pick. But one thing I want to ask here: when you're starting out, there may not be enough capacity in the organization to work with the internals of databases. So the tendency is to find managed solutions. My question is, does that really work? How managed is a managed solution, really? Do we need to build the expertise in-house, and isn't that inevitable? If you're really operating at scale, you'll end up creating that expertise, either the hard way or by plan. So I want to hear what you think: is there really such a thing as a managed solution? There's RDS, right?

See, with scale, you really have to define what scale means in this context. If you're talking about RDS, say a db.r6-class instance with something like 128 gigabytes of memory and multiple terabytes of disk, it works reliably. There's automatic failover, point-in-time recovery, et cetera. These things are pretty hard to build; take any open source system and some of these things are pretty hard to build yourself, like you said. And if you know you're going to run into those upper bounds, of memory for instance, at that point you can think of going to something else. But my overall assertion is that these bounds are very high. I've personally worked with lower-single-digit instance counts for Postgres, a single instance at a couple of thousand requests per second, at reasonably favorable read-write ratios, and we've not run into significant issues without any serious DBA chops in our team. So when it comes to managed services, my default choice is RDS, because of the maturity they bring. The instance I just spoke about is a couple of thousand dollars a month, and it's a massive machine; it's very hard to hit the bounds of that machine at ordinary scale. And then you can quickly upgrade to Aurora (loosely in the same family as systems like Spanner, which we might talk about), which is another level of scalability and availability. So in my opinion, you never really need to build out database availability engineering in-house until either your team size becomes too big and you're running into different kinds of limits, or you really do hit that scale, in which case you have other problems to solve as well. The golden rule there is the same as with code: with code rewrites and architecture rewrites, you live with bad code, but you always rewrite bad architecture. A similar thing applies to DBs.

That's an interesting perspective. Okay. Before we move on, Nivedita, do you want to add anything to what Swanand said? Do you want to go first? Yes.
I mean, ditto on RDS. There are certain things RDS provides that, unless, as Swanand said, you have a very specific team managing all of this, you really want out of the box: backups, point-in-time recovery, automatic failovers built in. Because in most cases the database is your single point of failure: if the database is not available, your app doesn't work, unless you're keeping the database as a secondary store. In most transactional cases, if the database is not available, your application doesn't work. Now, with Postgres it's not extremely hard to build these things yourself; there's a very good community and there are libraries around it that let you do it. Backups are built in, replication is built in, and you can set up automatic failover with Stolon or, what's the other one, Bucardo, I think. So it's not that you can't build them; you can, and with fairly minimal expertise. I'm not a Postgres expert, I don't work for 2ndQuadrant, but I was able to build a failover setup with a backup setup. So it's not extremely hard if you really want to. But if you have money to throw at the problem, just go with RDS.

Yeah, I think that's a pretty balanced view. Let me add some nuances I've seen from experience. One thing is: your DB will go down. One day it'll go down, for sure, right? So plan for that. Do you have a plan? Who will take care of it, and how? Try actually taking your DB down and make sure you can bring it back up, if it's self-managed. And with managed, it's often actually unclear how much is managed. With RDS or Cloud SQL, you see that backups are managed, and okay, there's more than one level to that. RDS does volume backups: AWS backs up at the disk or filesystem level, whatever it is, it's proprietary. It's not a logical backup; it's not doing a pg_dump every night. They don't tell you exactly what's going on there, but they give you a certain amount of guarantee. The whole managed thing also brings in the notion of trust. How much do you really trust AWS? Will they hold to their SLA? Are you okay with just being paid back if they don't? And how much do you trust them with your data? Is encryption at rest enough for you? That level of question. But leaving those aside: doing backups yourself, and testing them often, is very difficult. That's something you genuinely get in the money trade-off with a managed service. High availability, again: with AWS you get multi-AZ; with others I don't know the details, but I'm pretty sure it's there. One other thing you get for free is good defaults. AWS picks the amount of shared memory, the amount of memory used for computations and so on, based on the instance size. That's something you'd have to write recipes for on your own boxes; here you get it for free. Their default autovacuum is much better too. So those kinds of things you don't have to think about as much.
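As a concrete aside on those defaults: autovacuum can also be tuned per table when the global settings don't fit one hot table. A hedged sketch; the table name and threshold values are illustrative, not recommendations from the panel.

```python
# Sketch: per-table autovacuum overrides (table name and values illustrative).
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # The default scale factor (0.2) lets a 100M-row table accumulate ~20M
    # dead tuples before autovacuum runs; a hot table usually wants far less.
    cur.execute("""
        ALTER TABLE orders SET (
            autovacuum_vacuum_scale_factor  = 0.01,
            autovacuum_analyze_scale_factor = 0.005
        );
    """)
    # Inspect which overrides are in effect for the table.
    cur.execute("SELECT reloptions FROM pg_class WHERE relname = 'orders'")
    print(cur.fetchone())
conn.close()
```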
Postgres' default autovacuum settings date from some 5-10 years ago, and they haven't changed, for whatever reason. So if you don't want to depend on those, if you don't want to go understand them, the managed defaults are pretty good. Encryption at rest for your data, if you really want that, is also something they provide. Where you wouldn't use managed is when, say, you want to squeeze out performance. For example, one time we were using one database as a reporting database and a different one as the transaction database, and we changed the block size and kernel cache settings so we could squeeze out performance. On a managed box you don't have access to any of that, so if that's your tipping point, then...

So are you saying this was a replica of the master node, and you ran different settings on it? Yeah, exactly. That's exactly what we did, actually. We had a reporting DB which was a replica, but we wanted massive reads, and a transaction database is not designed for that. So instead of the default block size, which is 8 KB or so, we used many times that, 128 KB or something like that, and suddenly we got a massive uptick in speed, because you can read from it very quickly.

Yeah, that's a great point. It's like good libraries or frameworks: they have to cater to the general audience, the 90 or 99% of use cases, the P99. It's the same with managed services: they cater to the P99, and if you're beyond that P99, you're on your own even with a managed service. Some of that really does require forward thinking when setting things up, and the example you gave is a good nudge in that direction. You really need to know your query patterns, the shape of your data, and your usage patterns, and then somewhere you're going to hit that corner, and then what? You know, then what?

So actually there are some interesting questions from the audience. I'll take one now, since it's relevant to what you just mentioned. What DB choices should you make when designing for cases with one write and many reads, something like a stock ticker? Can InfluxDB fare well here?

Actually, I've never used InfluxDB. Like I said, I would still lean on Postgres here, because there's not a lot to a stock ticker. See, for high read-to-write ratios, where reads far outnumber writes, a relational database is practically built for that use case. When your writes start getting heavier, that's when you go to other systems. So if you have low writes and high reads, most databases are going to serve you well. But for stock tickers and the like, it depends how fast you want the data. There are multiple design choices; the Java Chronicle framework, for example, has a very interesting write-up on how they handle that. And it depends on the kinds of reads you want to do. If you're doing aggregations, if you're doing groupings, then Influx actually might work really well. The kind of use I've made of Influx was monitoring systems: basically a write-mostly setup where you keep sending monitoring stats to InfluxDB, and on the read side of things you're doing group-bys over that.
So yeah, if that's the kind of scenario you want to work with, Influx might work well. To add to this: one write and many reads is actually a different problem from stock tickers; it's a larger set of problems. For one write and many reads, Postgres is very good at scaling reads: you have read replicas, you can put up multiple standbys and just keep reading off of them, and that scales really well. Writes are hard to scale, because it's a single writer on a single node. But a stock ticker is different, right? It's not reads so much as: there's a flurry of information coming in and I just want it to go into the database as soon as possible. So TimescaleDB, built on top of Postgres, is actually a very good choice for that, I would say. It's specifically designed to scale writes, and to get rid of data too. Like Nivedita was saying with monitoring systems: they optimize for retention, so you hold X days of data. With Postgres' vacuuming, you need to work hard to get it to behave that way, and it needs a certain level of expertise. Time series databases, on the other hand, are designed for this specifically: they partition the data by time, because time is the constant in your workload, and then every night they delete a certain window of data. All the time series databases have things like rollups and aggregates as first-class features. Those are available to you in Postgres too, but they might not be as performant as in systems designed specifically for it.

Yeah, batching is a great example. If you always need to read data in large batches, then a relational database may not be the best choice for you; you're better off with something like Cassandra, where you can have wide rows and just keep reading those. They're very optimized for that kind of thing. A stock ticker could be a good example: you always need to read all the values. It may not even be a time series, just a large collection. In Postgres you'd have to set that up in a very specific way, which needs a lot of schema design maturity, which may or may not be present.

Yeah, I've not worked with InfluxDB, but I can add something on the stock ticker. We had Kailash Nadh from Zerodha on the second episode of Scaling from First Principles, and they have a very similar setup: they have to broadcast the ticker to all their users. They use Postgres plus Redis, a massive Redis cluster sitting in front of Postgres that keeps broadcasting to all the users over sockets or something. They have a blog about how they do all this; it's a good place to check out. Which is a testament to "hell yes, Postgres": a very clear answer to "oh, a stock ticker?" It's been done.
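Since TimescaleDB came up: the partition-by-time and retention behavior described above are close to one-liners there. A hedged sketch, assuming the timescaledb extension is installed; the table and the 30-day window are illustrative, and the policy functions are per Timescale's documented 2.x API.

```python
# Sketch: TimescaleDB for a write-heavy time series (names illustrative).
import psycopg2

conn = psycopg2.connect("dbname=metrics user=app")  # hypothetical DSN
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS ticks (
            ts     timestamptz NOT NULL,
            symbol text        NOT NULL,
            price  numeric     NOT NULL
        );
    """)
    # Partitions ("chunks") by time under the hood: inserts land in small,
    # recent chunks instead of one ever-growing heap.
    cur.execute("SELECT create_hypertable('ticks', 'ts', if_not_exists => TRUE);")
    # Retention as a policy instead of hand-rolled nightly DELETEs.
    cur.execute("SELECT add_retention_policy('ticks', INTERVAL '30 days');")
conn.close()
```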
So there's an interesting question from Grace: scaling databases comes with scaling migrations. Running migrations, adding columns, backfilling columns can bring down the database. What are some good strategies for doing migrations, with and without sharding, where the end goal is to avoid downtime and prevent the app from throwing exceptions? Who wants to take it? Go ahead, Swanand. I have a bad story about this, so I'll go at the end.

Cool, cool, cool. All right. So again, this is basically becoming a Postgres discussion, but: for migrations, the absolute minimum is DDL transactions. Honestly, after years of working with databases, I do not know how to do zero-downtime migrations without DDL transactions; without them you need a lot of different setup. There are a lot of standard patterns out there, specifically for risky migrations: adding default values to large tables, et cetera. Shopify has a really, really mature library for detecting some of those. But there is no silver bullet. The real answer is that your development process has to be mature enough to handle it; I have not seen any automatic way of just doing what you want without taking some downtime somewhere. You have to manage it in your dev process. I also have a couple of horror stories, you know, databases going down because we added a default value, bigint issues, et cetera. I've been burned a couple of times by migrations. But I want to hear your story.

Yeah, I've learned this the hard way too: it's about setting your developer practices properly. Like, how to deprecate a column, that's important. Never drop a column as your first step: deprecate the column, make sure there's no code using it, and then, as part of a second deployment, drop it. And int IDs are the most common issue all growing companies face. When the schema was first designed, no one thought the ID would actually cross the integer range: "oh, it will never cross integer". And then three years later you still have an int ID in your system, sequentially increasing, and if you haven't noticed it in time, you have two weeks before it overflows. Now what do you do when your database is one TB and you have a 50-gig table? How will you change this primary key from integer to bigint? So the one thing you have to keep in mind when designing the schema is: how will this look if things grow without bound? If the table has to grow indefinitely, bigint is the very obvious answer, but even bigint has its limits, so just use UUIDs, use string IDs. And yeah, my horror story was a one-TB database (not my team, an adjacent team) with two weeks left before the ID reached its overflow limit, after which the entire app would come crashing down. That had to be migrated.
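For that int-overflow situation, the widely used zero-downtime shape is: add a bigint column, backfill it in small batches, then swap it in. Below is a hedged sketch of the batch-backfill part, not the actual migration from the story; all names are illustrative, and a real run also needs a trigger or dual writes so rows changed during the backfill stay in sync.

```python
# Sketch: widening an int PK to bigint without long locks (names illustrative).
# A real migration also needs a trigger/dual-write for rows changed mid-backfill.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Adding a nullable column is a metadata-only change: effectively instant.
    cur.execute("ALTER TABLE orders ADD COLUMN IF NOT EXISTS id_big bigint;")

BATCH = 10_000
low = 0
while True:
    # Short transactions per batch keep lock times and replication lag small.
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            UPDATE orders SET id_big = id
            WHERE id > %s AND id <= %s AND id_big IS NULL
            """,
            (low, low + BATCH),
        )
        cur.execute("SELECT max(id) FROM orders")
        max_id = cur.fetchone()[0]
    low += BATCH
    if max_id is None or low > max_id:
        break

# Remaining steps (sketched): build a unique index on id_big with
# CREATE INDEX CONCURRENTLY, then in one short transaction swap the
# primary-key constraint and rename the columns.
conn.close()
```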
Anand, I actually first thought you were talking about schema migrations; data migrations are a different story. If it's data migrations, yeah, this gets a lot more interesting, right? The regular stories, the default values and the bigint overflows, those are covered. No, actually, the question was schema migrations. Yeah, the question was schema migrations. The one thing I would say about schema migrations is: try it out. Like, how hard is it? Get your prod DB locally and try it out, or put it in a mirroring environment first. See it working, ensure your application doesn't go down: put a health check on it, keep running the health check while you perform the migration, see if it works, and tweak your process until it does.

I think the key thing you mentioned is having a copy of the production database with the same traffic mirrored onto it, right? Either way, actually; there's a whole gamut of things you can do in between. If you can't have a prod copy, for example because you have sensitive data you're not allowed to copy, then create something somewhat similar. It's harder to do, obviously: you need something that generates that kind of data and pipes it into your database, all of that. But if it's possible to get prod data, nothing like it. If it's possible to mirror your prod traffic, that's the best thing: actually trying it out on a prod-like system is best. If you don't have that, try to simulate some of the requests. One of the things we do: any transactional service generally has a message bus with events coming out of it, which usually mirror the database. So consume that and pipe it into a request stream against your test service. That's sort of like mirroring prod traffic, with you building a small bit of machinery yourself rather than relying on actual prod traffic. And if nothing else, you can set up Vegeta or something like that to throw load at your system and instrument it a little. These options take different amounts of time to build, and it's worth having each of them. If you perform data migrations in production a lot, it's probably good to set these up. And if you're performing schema or data migrations in production a lot, it's probably also a good idea to reflect on your schema design. Other things: I would add indexes concurrently. And for very big migrations I would use ETL systems: actually create an entirely new database with an entirely new schema design, move the data over slowly, and when the new system is ready, just switch to the new database. An easier way to do this is to use logical replication, or pglogical, or one of those tools that give you a change stream you can apply into a different schema. And blue-green style migrations are getting very popular, especially for zero-downtime migrations: you bring up that replica, then switch over, all of those.
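A minimal sketch of that built-in logical-replication route: publish tables on the old database, subscribe from the new one (whose physical layout can differ), and cut over once lag reaches zero. Connection strings and names are illustrative, and this assumes wal_level=logical on the source.

```python
# Sketch: native logical replication for migrate-and-switch (names illustrative).
# Assumes wal_level=logical on the source and a replication-capable role.
import psycopg2

# On the source: publish the tables being moved.
src = psycopg2.connect("dbname=app_old user=app")  # hypothetical DSN
src.autocommit = True
with src.cursor() as cur:
    cur.execute("CREATE PUBLICATION app_move FOR TABLE orders, customers;")

# On the target (schema already created there; indexes/layout can differ).
dst = psycopg2.connect("dbname=app_new user=app")  # hypothetical DSN
dst.autocommit = True  # CREATE SUBSCRIPTION cannot run inside a transaction
with dst.cursor() as cur:
    cur.execute("""
        CREATE SUBSCRIPTION app_move_sub
        CONNECTION 'host=old-db dbname=app_old user=replicator'
        PUBLICATION app_move;
    """)
# Initial sync plus continuous streaming happen in the background; once lag
# is ~zero, pause writes briefly and point the application at app_new.
```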
I have two principles to share, Anand, on the same topic of schema migrations; slightly abstract tips, but they've served me quite well. One is the bounded-queue principle: be extremely aware of every single bound in your schema migration. It could be value limits, connection limits, pool sizes, read-write ratios, all kinds of bounds. The probability of a bounded queue filling up, or a value going out of the bound you thought it had, is very high specifically during schema migrations. So again, it goes back to developer practices. And the second thing is forward and backward compatibility. Especially in the container world, it's quite possible that your code will be ahead of the DB, or your DB will be ahead of the code, and you have to account for both possibilities. So all the code you write at the application layer must be forward compatible and backward compatible, which means things like renaming columns are completely out of the question, because you never want to get into that kind of soup. These principles have served me well over and over again whenever I'm doing live migrations on high-availability systems. Doing a select star, for example. Right, or a DELETE FROM without qualifiers. I'm just kidding.

So there are a couple of other questions; I think they're very specific to Postgres. Is PG good for text search? Actually, is there anything non-Postgres we should... I was thinking maybe we can hold off on Postgres for a bit, since all of us are clearly fans. No, don't worry, MySQL is also coming up. Sorry, go on. What was the question? Yeah, is PG good for text search?

Yeah, it is. And when you say "good": good is always a bit dicey. See, Postgres has soundex, trigram and tsvector support built in. There are four or five different tools built into Postgres, with indexing support and everything. If the quality of text search you're looking for is matched by those tools, specifically tsvector, trigrams and soundex, go with Postgres, because the operational nightmare you avoid compared to something like Elasticsearch is a couple of orders of magnitude. And I think by now both MariaDB and MySQL have amazing support for languages as well; if you're working with multiple different languages, they do that really, really well, because it's fundamentally built into those systems. So my fundamental rule is: these are the three tools I try, and if the need fits them, I go with Postgres, or whatever database I'm already working with, and revisit only when the need outgrows them. Otherwise you're back to custom wrappers and custom indexers and all those things. That's my short take on text search. Srihari, do you have anything to add?

It's very good; I would say Postgres is really good for text search. There are a lot of indexes built specifically for text search, there's an entire manual chapter on how to do full text search in Postgres, and all of it performs well and is being worked on constantly. That said, I mostly share Swanand's opinion: at a particular scale, if this doesn't work for you, Elasticsearch, Solr or something like that, where the entire database is custom built for search, is going to perform better. And there's a difference, right? Postgres works very well on disk, but if you're confident you'll have everything in memory all the time, there are a bunch of optimizations that the likes of Lucene, Solr and Elasticsearch will provide that you won't get here.
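To ground those built-in tools, a minimal sketch of tsvector-based search with a trigram index alongside it for fuzzy matching. All names are illustrative; websearch_to_tsquery needs Postgres 11 or later.

```python
# Sketch: Postgres built-in text search, tsvector + trigram (names illustrative).
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS articles (
            id    bigserial PRIMARY KEY,
            title text NOT NULL,
            body  text NOT NULL
        );
        -- Expression index keeps full-text queries off sequential scans.
        CREATE INDEX IF NOT EXISTS articles_fts ON articles
            USING gin (to_tsvector('english', title || ' ' || body));
        -- Trigram index covers fuzzy / LIKE-style matching on the title.
        CREATE INDEX IF NOT EXISTS articles_title_trgm ON articles
            USING gin (title gin_trgm_ops);
    """)
    q = "postgres scaling"
    cur.execute("""
        SELECT id, title,
               ts_rank(to_tsvector('english', title || ' ' || body),
                       websearch_to_tsquery('english', %s)) AS rank
        FROM articles
        WHERE to_tsvector('english', title || ' ' || body)
              @@ websearch_to_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT 10
    """, (q, q))
    print(cur.fetchall())
conn.close()
```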
There's one more thing I want to add on this specific topic. The decision to go with Postgres or Elasticsearch truly comes down to two things. One is search quality: figure out whether the tools Postgres offers are enough. The second is, again, read-write ratios, and how much freshness you want in your search: with a separate search system there's always going to be some lag, you run into sync issues, et cetera. And then: do you have the application and operations maturity to handle a different system? Those are the general decision points I use. And like I said, most P99 cases will be served by these tools; if you're in that P99-plus territory, then I'm sure you have a lot of other options.

Cool. So there's one question on MySQL: from a SQL programmer's perspective, is there any difference between MySQL and MariaDB? Yes and no. I don't know a lot about it, but I would pick Maria over MySQL any day. There are fundamental things about the design that were changed in Maria; that's my understanding at least. With MySQL, one of the fundamental design compromises was that the storage engine, InnoDB, is not built in; it's pluggable. With Maria that was fixed firsthand, and a lot of the tenets of Maria are very similar to Postgres by design. Of course, my knowledge on this is sparse; that's just one view. I'm actually in the MariaDB camp too when it comes to MySQL versus MariaDB. It has matured and evolved over time quite a bit; they even added DDL transactions, which was my biggest pain. So yeah, there's not much of a choice there: MariaDB appears to be the winner. It also has commercial support and managed offerings, so if that's your worry, you have commercial support available if you truly want to go with MariaDB. So when you use RDS, do you get MySQL or MariaDB? It's MySQL. And again, because I'm so much of a Postgres person, I'd have to go look at the documentation.

Okay, cool. The next one is pretty interesting: Abhinav is asking, can you talk about how to design tables and schemas for high scale? I think that's an important topic. How do you approach designing schemas, what's your first-principles approach, and what should you do or avoid? Nivedita, do you want to take this?

Yeah, I can give it a start, potentially. On the spectrum from heavily normalized to heavily denormalized, you'll have to start somewhere according to your application; see where you fit. If you have low traffic and want to optimize for domain modeling, then maybe pick a normalized schema to begin with, and as and when you need performance, slowly move towards denormalized ones. If you know it's going to be a performance-critical system from the beginning, maybe start moving towards something denormalized right from the beginning. It really depends on the domain and the application; I don't think there's a one-size-fits-all approach to schema design. It has to be based on the problem statement, and within that, the database, the tables and the columns get designed. Whether I'd pick multiple columns for one thing, or one JSON field holding all three values, also depends on the domain.
I'm finding it hard to answer; schema design is data modeling and data integrity. Sorry, Nivedita, go on. Yeah, I was going in that direction as well: schema design mostly depends on your data model. And then there are things you can do, right? You don't always have to go for joins; sometimes doing N+1 queries instead of joins is better. Even with a normalized data set, you can avoid joins if you want to. But to be really honest, I have done two-table, three-table joins at scale on Postgres, and it's pretty optimized for that. You won't take that much of a performance hit even doing joins at scale, on Postgres at least.

Actually, there is a hit, right? I've found joins to be costly when done over very large data sets. With small ones, of course, it's fine: if it's the equivalent of doing two index lookups and then joining, that's pretty simple, and if that's what you're doing in your transactional high-throughput system, that's completely fine. But in that same system, if you're trying to do massive joins to get data out, that might not be right. Yeah, it depends on the kind of joins you're doing; that's why I said, if it's two tables joined on indexed fields, it's not that different from doing two queries over them.

Even two or ten tables doesn't matter so much, in my opinion, as much as how much data you want to read. If you're joining across multiple tables and want to read thousands, hundreds of thousands of records, that's when the joins will probably kill you, because the disk reads aren't optimized for it. If you're looking up isolated records, smaller numbers of records, no number of joins is going to stop you; they'll perform optimally as long as you have adequate indexes set up. But if you're reading large paginated data, the batch reads I mentioned, then there are clear design patterns where you might not want joins; you might want to run straight table scans and return the data fast, in memory. But going back to the specific question of high scale and schema design: we really, really need to stop thinking about schema design as a scaling tool. Indexes are not a design tool; indexes are a performance tool. You can add whatever index you want, you can create views; there are so many native, built-in tools for bringing performance to a correct schema: materialized views, regular views, foreign data wrappers. Schema design must always be about correctness and integrity of data. Get that right first, from first principles, and you can build a lot on top: application-layer things, Kafka, Elasticsearch, caches, et cetera, on top of your Postgres as the source of truth, which can get you whatever kind of performance you want. Worst case, you can have a beefy application machine which just reads all the data and keeps it in memory. You can do those kinds of patterns if you're truly worried about performance. But schema design is not the thing to optimize for performance. What say you?
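A quick sketch to ground the "materialized views as a performance tool" point: a reporting rollup precomputed over the correct, normalized schema and refreshed out of band. All names are illustrative.

```python
# Sketch: a materialized view as a cheap reporting layer (names illustrative).
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS daily_sales AS
        SELECT date_trunc('day', created_at) AS day,
               count(*)                      AS orders,
               sum(amount)                   AS revenue
        FROM orders
        GROUP BY 1;
    """)
    # A unique index is what allows REFRESH ... CONCURRENTLY below.
    cur.execute("""
        CREATE UNIQUE INDEX IF NOT EXISTS daily_sales_day
            ON daily_sales (day);
    """)

# Run periodically (cron/scheduler); readers are not blocked during refresh.
with conn, conn.cursor() as cur:
    cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;")
conn.close()
```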
There is something to be said about transactional schemas versus reporting schemas. The schema your transactional DB needs will probably not be the same schema your reporting DB needs. So you want some sort of ETL in place: transform from your regular schema into a star schema or a snowflake schema, save that in the reporting DB, and do your reporting on top of that. So in that way schema can be related to scale in some sense, but yes, in most cases you want to think of schema from a data modeling perspective.

To add to that, I'd actually prefer a slightly different shape for reporting if I have a lot of reporting needs. If the problem is writing the same long queries repeatedly, a view solves it. If the problem is creating a reporting schema without building a reporting table and an ETL, a materialized view goes a long way as a midway solution. So it's not just table schemas; if you consider views and materialized views, they're also a kind of denormalized...

Is it just me, or... can you hear me all right? It was breaking up a bit in the middle; you went off for about 20 seconds. Would you try turning off your video, see if that helps? Yeah, sure. Is this better? Yeah, quite good, it's clear now. I think we lost him again. Maybe we can come back to that later.

Yeah. So Swanand, I have a question for you on the topic you mentioned just before, when you were answering the question about joins: you said the performance depends on how much data you're reading. I want to check whether my understanding is correct. What I think is: if you're joining a lot of tables, the time actually goes into computing which data to return. Even if you're only reading the first 10 rows, computing those first 10 rows can take a really long time, right?

That depends. Are you doing index lookups, or index-only lookups? Like I said, if all your joins are indexed and all the data you want to read is in the index, for example a composite index where the first column is, say, customer ID and the second is a timestamp, and you're looking at someone's orders (a very stereotypical example) with a sort on top and a complex join, those kinds of reads are fairly fast, no matter what joins you have. But like I said, the problem starts when your data is fragmented across various places on the disk and you want to read all of it. Postgres stores and caches data per table, so it may or may not work out in your favor when you're joining across different tables. Okay, my point is... sorry, go on. No, that's it. The other time joins have been known to perform poorly is when the tables are disproportionately different: a very large table joined with a very small table, et cetera.
Again, because Postgres and the memory hierarchy do a lot of heavy optimization based on read patterns (whatever is read frequently stays fresh in cache, et cetera), disproportionate data can also affect your join performance specifically. So my point is: a plain join may work, but if you add a GROUP BY or something on top of it, that can actually get compute-heavy, right? True, true. Group-bys, aggregations: obviously, fundamentally, they can always be slow, because aggregations work at a different level. You have to assemble the data first and then aggregate it. So unless you have very specific indexes, unless you get lucky, let's say, aggregation is always going to be slow compared to your standard lookups. And again, the thing I keep going back to: if you're doing all of these things, that probably speaks to needing separate primary and analytics/reporting schemas. You may want to take a different approach to your source of truth versus your analytics store, and that needs some schema maturity, some application maturity.

Good. So there's a question from Abhishek: in the age of Big Data, ML and AI, are there any database design patterns that should be focused on? I didn't get the question; can you please repeat? In the age of Big Data, ML and AI, are there any database design patterns to focus on? I don't know how to answer that without sounding very cynical: no.

So, assuming you have enough data to do AI and other things over, you're looking at an analytical database, right? So possibly something like BigQuery or Redshift or one of those is generally what I'd use, especially because you can write extremely large queries over larger sets of data, where you're potentially even joining across multiple databases and things like that. Yeah, I wouldn't use plain old Postgres for analytics where I'm clearly going into hundreds of TBs of data. The thing is, AI and ML are compute elements, and databases are storage elements. There's a lot of impedance mismatch in thinking they're directly comparable. That's also one of the reasons Hadoop has its own file system: the traditional file system did not serve the purpose. So I don't think AI/ML and Postgres, or data stores generally, should be thought of in the same frame, because one is quite heavily a compute problem. You can just get 120 gigabytes of RAM, put everything in memory, and run pandas or CUDA or something, and then you don't really need any database to do your work. Databases are about storing your data and retrieving it truthfully, and everything else is layers on top of that. That's how I think about these two topics.

So I haven't worked with very large data in the ML space, but I've worked with a lot of ML stuff, and one thing I found is that the complexity is actually in keeping the data in the right shape and accessing it when needed. The pattern that worked for me was: make S3 your primary source of truth. All the data gets dumped into S3, and from there it goes into the Postgres database for all your queries, et cetera.
And since there's a lot of ML load (you want to repeatedly train your models, so you have bulk reads), S3 works a lot better for that than hitting the database. So I batch the data by day or by week and dump it into S3, and whenever the data is modified, I rewrite that entire object in S3. That worked really well for me, because it's a very easy model to think about and to explain to the data scientists I work with: if the data changes, it gets updated in S3, and then it gets pushed to the database. The world becomes very simple to reason about, and that's what works for me.

Yeah, I was under the impression the question was: assuming you have an ML system that has to read from a database, what kind of database would that likely be, and what kind of data would it read? Yeah, that makes sense; if you're fine doing in-memory computation, you don't really need a database. No, I think there are two parts to an ML system. When you're training, you want bulk reads, probably into memory. But when you're doing inference, you want to hit some database and get specific data. So you probably need both.

Yeah, sure. So there are many more questions pouring in, and I think we'll wrap up in about 10 minutes, so let's quickly see what else we have. There's an interesting question from Vishal Melmati: one use case for a database other than an RDBMS could be MapReduce-style work, where we need to do the computation near the data instead of fetching huge amounts of data from the database into the application. What are your thoughts? Yes. When you need to do MapReduce, go with a file system, like the Hadoop world, or more of a streaming setup you can run MapReduce over. In that case an RDBMS won't do. Actually, on that specific topic, the reliable queueing systems are quite a nice data source: Kinesis, Kafka. They hold your data reliably and you can read large batches. And if you have a very heavy compute element, then like I said, S3 is probably the best. Honestly, flat files, or Parquet or something; you can pick your format, but you want it to be very, very simple. You don't want to take on the ACID overhead. And in my opinion, even queue managers work really well. Yeah, with Kafka Streams especially, you can do a lot of computation on top of the data coming in from Kafka.
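A hedged sketch of that S3-as-source-of-truth batching pattern: day-partitioned objects that are rewritten wholesale when the data changes, so training jobs always read full, simple batches. Bucket, key layout and record shape are all illustrative.

```python
# Sketch: day-partitioned batches in S3 as the primary store (names illustrative).
# Assumes `pip install boto3` and AWS credentials in the environment.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "ml-training-data"  # hypothetical bucket

def write_day_batch(day: str, records: list) -> None:
    """Rewrite the whole object for a day; readers always see a full batch."""
    body = "\n".join(json.dumps(r) for r in records)  # JSON Lines
    s3.put_object(Bucket=BUCKET,
                  Key=f"events/day={day}/part-0.jsonl",
                  Body=body.encode("utf-8"))

def read_day_batch(day: str) -> list:
    obj = s3.get_object(Bucket=BUCKET, Key=f"events/day={day}/part-0.jsonl")
    return [json.loads(line) for line in obj["Body"].read().splitlines()]

# If any record for a day changes, regenerate and rewrite that whole day,
# then re-sync the serving store (e.g. Postgres) from S3.
write_day_batch("2024-01-15", [{"user": 1, "label": "churned"}])
```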
So there are many questions pouring in; let me see which to pick, given the short time. There's a question from Pratik: in what scenarios is a NoSQL DB like Mongo or DynamoDB an obvious choice over just using JSON columns in an RDBMS? Srihari, do you want to take that? Yeah, I'll give it a shot. Basically, if you don't have transactions, if you're just holding data and looking it up, not writing into it much, there can be that kind of data. A catalog lookup, for example, where your data has many different shapes and it's not easy to put into a traditional table. If it's just a lookup, I wouldn't mind using a key-value store. Dynamo would work very well, as would any document-oriented key-value store, or even just Redis. That's when I wouldn't use Postgres: I don't need, as Swanand was saying earlier, the cost of ACID.

Yeah, in fact I have the perfect use case for that in the current system we're working on. Data is only written; it's practically never read from the primary store. We have vanilla MongoDB Atlas, and it works fabulously: the writes are insanely fast, thousands, 10,000 writes per second. And then we need the analytical side, where obviously we ran into Mongo not being great for analytical queries, so we're building a stream out of it, and the secondary DB is going to be Postgres, where you can run the heavy queries. But it goes back to the same point: if you have two distinct sets of requirements, use two distinct stores, and the source of truth can be the one closest to your domain model.

I'd slightly disagree: I don't think I would use Mongo for writes, because of consistency. I still need consistency, right? Despite all the journaling and everything, if it doesn't flush to disk on every write, and I'm writing really fast, and at some point the power goes off and comes back, my box suddenly swaps and Mongo loses my information; if I'm not okay with that, I won't use it. I'd use something that does promise durability. Which is why I said reads: if it's reads you're optimizing for, you don't need something transactional. Yeah, and for those looking at Mongo: DocumentDB is an API-compatible replacement built on the Aurora storage layer, a distributed SQL engine, so you get a lot of the benefits at only a few of the costs. A pretty good alternative. As you can see, I'm big on AWS; most of my career has been on AWS, so I'm probably biased.

The follow-up question on that is: why didn't you use a queueing system instead? Why go with Mongo? Actually, I did not design this system; I'm redesigning it right now, using PostgreSQL with a domain-driven design model, because there's clear separation. We were also running into product issues: we weren't able to effectively make product changes because of some of those past design decisions, so we're rewriting it. We're also doing a case study with AWS on this, which we'll be publishing pretty soon, maybe in a month or two.

So I'll take one last question, and then I have one closing question for all of you. This one is from Dakswar Mahal: what's the best storage system for storing images? What are we missing: GridFS, Minio, S3? I'll just say it: really, S3. Minio is a pretty viable choice if you're willing to take on the ops; the idea is the same. S3 or Minio. Should we talk about why? We can move on to other questions, but basically: you're not operating on the images, you're just storing files, so put them in object storage like S3. I would use S3, but also use a database to index all of them, so you can query when required. Make sure S3 is the primary source of truth: store all the tags and metadata, everything except the image bytes themselves, so that you can reconstruct things from S3 if required. That's what I'd recommend. Yeah, that's my go-to approach as well: put the image in S3 and put a link in the DB.
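That last pattern in miniature: the object lives in S3, the database holds only the key plus queryable metadata. A hedged sketch; the bucket, table and tag names are illustrative.

```python
# Sketch: S3 holds the image bytes, Postgres holds the pointer and metadata
# (all names illustrative).
import boto3
import psycopg2

s3 = boto3.client("s3")
conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS images (
            id     text PRIMARY KEY,
            s3_key text   NOT NULL,
            tags   text[] NOT NULL DEFAULT '{}'
        );
    """)

def store_image(image_id: str, data: bytes, tags: list) -> None:
    key = f"images/{image_id}.jpg"
    s3.put_object(Bucket="media-store", Key=key, Body=data)  # source of truth
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO images (id, s3_key, tags) VALUES (%s, %s, %s)
            ON CONFLICT (id) DO UPDATE
                SET s3_key = excluded.s3_key, tags = excluded.tags
            """,
            (image_id, key, tags),
        )

# Query by tag in SQL; fetch the bytes from S3 only when actually needed.
with conn, conn.cursor() as cur:
    cur.execute("SELECT id, s3_key FROM images WHERE %s = ANY(tags)", ("cat",))
    print(cur.fetchall())
```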
So there's a question, and I'm not sure if we can take it, but it's an interesting one, from Chris again: when do you decide to shard the database, and what data points do you collect before sharding and after sharding to measure the effectiveness? I think we haven't actually touched upon sharding and partitioning; that's something I wanted to bring up, so can you quickly talk about that? Are you talking about node sharding, like a distributed setup, or are you talking about data sharding? I think it's data sharding. Chris, can you confirm? Yeah, let's assume it's data sharding.

Right, so my approach with any transactional database has been that I actually use partitioning or sharding as a way of archival in my transactional DB. You don't want your transactional DB to grow beyond a certain point, and at any given point of time you're mostly working with only the most recent part of your data: bookings, orders, that kind of thing. So time-based partitioning is the most common kind you can do, and the latest versions of Postgres come with native partitioning support. Before that, what we used to do was use an extension called pg_partman to do trigger-based partitioning; now Postgres has it built in. What partitioning helps you with is very easy archival: you keep X period of data, one week, one month, one year, however much you want, and the rest you archive. And with logical replication, archival for the transactional DB becomes even simpler: you only delete data from the main primary database and don't replicate those deletes to the replicas, so the replicas still have the whole set of data you need. So my approach to partitioning has always been: the moment you hit production, the moment you know you're going to hit scale, think of partitioning and build the scheme in from the beginning. Yeah, Swannam, do you have anything to add? Yeah, I generally tend to prefer logical sharding, which I can control from the application, because I come from the heavy-schema side. But lately I've been leaning towards Vitess-style automatic sharding, where you use uniformly distributed numbers or uniformly distributed data types to shard, and it's done transparently. It works great; Slack uses it to great effect. It has some shortcomings. But truly, from a first-principles approach, sharding is what you need when one machine is not enough, when even that big 16xlarge machine is not enough and you're running into its limits; that's when you must do it. The second side to it is partitioning, and I'm now intentionally sidestepping sharding and going to partitioning, a minor difference there. And if you can do that preemptively, then I like to do that as well.
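A minimal sketch of the time-based scheme Nivedita described above, using Postgres's native declarative partitioning (available since version 10). The table and partition names are hypothetical, and in production something like pg_partman or a cron job would create partitions ahead of time.

```python
# Hypothetical sketch of time-based partitioning with cheap archival.
# Requires: pip install psycopg2-binary; Postgres 11+ for the PK below.
# One-shot illustration: rerunning after the DETACH would raise an error.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS bookings (
            id         bigint GENERATED ALWAYS AS IDENTITY,
            created_at timestamptz NOT NULL,
            details    jsonb,
            PRIMARY KEY (id, created_at)  -- PK must include the partition key
        ) PARTITION BY RANGE (created_at)
    """)
    # One partition per month, created ahead of time.
    for start, end, name in [
        ("2024-01-01", "2024-02-01", "bookings_2024_01"),
        ("2024-02-01", "2024-03-01", "bookings_2024_02"),
    ]:
        cur.execute(
            f"""CREATE TABLE IF NOT EXISTS {name}
                PARTITION OF bookings
                FOR VALUES FROM ('{start}') TO ('{end}')"""
        )
    # Archival: detaching the old month is near-instant, with no bulk
    # DELETE; the detached table can then be dumped or dropped at leisure.
    cur.execute("ALTER TABLE bookings DETACH PARTITION bookings_2024_01")
```

The archival point in the answer above is exactly this: dropping or detaching a whole partition replaces an expensive DELETE over millions of rows.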
I keep coming back to the Universal Scalability Law in this context: what is your contention, and what is your coherence? How much chatter do you need between the different shards? Do you need any chatter or coherence between the two shards at all, or is it a clean partition? Some partitioning styles are going to need some coherence, and I approach these decisions from those angles. Like I said, I have historically preferred application-level logical sharding over automatic sharding. Sorry, I think I might have misunderstood the question. No, I think both were part of the question, partitioning and sharding, and you each answered one part. Yeah, cool. So, do you have anything to add to that? Mostly no, but the other part of the question, I'm looking at it, asks what data points to collect before sharding and after sharding to measure effectiveness. I'd say run the query, or whatever workload you wanted to run, before and after sharding; that's your measure. What's the ultimate test? Your application should run fast, or your data access should be fast in a certain way. Localizing your data into one particular place, whether it's a partition or a shard, is what you're ultimately achieving: read less, and get exactly the information you want. If that's hard to do from one big table, you split it up into multiple parts, either a partition or a shard on a different machine. So you're optimizing for resources; think about it like that. For example, geo-sharding is a very simple, common-sense kind of concept: if I have all my customers in India and all my customers in Singapore, their data will never intersect, and I want each of them to be fast, it's a fairly straightforward idea to put them in different places if one place isn't performing well enough. Yeah, cool. Thanks a lot.
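To put that geo-sharding example into code: a minimal, hypothetical routing sketch in the application-level style Swannam mentioned, with one Postgres per region and no cross-shard chatter. The DSNs, table, and region tags are all made up.

```python
# Hypothetical application-level geo-sharding: each region's customers
# live on their own Postgres, and the app routes by region.
# Requires: pip install psycopg2-binary
import psycopg2

SHARD_DSNS = {
    "IN": "dbname=app_in host=db-in.internal",  # India customers
    "SG": "dbname=app_sg host=db-sg.internal",  # Singapore customers
}
_conns = {}

def shard_for(region: str):
    # One lazily opened connection per shard; the shards never talk to
    # each other, so there is no coherence cost between them.
    if region not in SHARD_DSNS:
        raise ValueError(f"unknown region {region!r}")
    if region not in _conns:
        _conns[region] = psycopg2.connect(SHARD_DSNS[region])
    return _conns[region]

def load_customer(region: str, customer_id: int):
    with shard_for(region).cursor() as cur:
        cur.execute("SELECT * FROM customers WHERE id = %s", (customer_id,))
        return cur.fetchone()

print(load_customer("IN", 42))
```

Measuring effectiveness stays as simple as the answer suggests: time the same lookup on the unsharded setup and on this one, and compare.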
So I think I'll close with one small question: what is your advice for someone starting now? Should they worry about all of this, and what should they keep in mind? Oh my God, I feel like preaching so much. Shameless plug: come to my workshop, I'm conducting a SQL masterclass. But honestly, it's first-principles thinking again. There's a lot of chatter about first principles, but with databases, this is really where first-principles thinking shines: you want to be true to your domain, you want to represent your data as truly as possible and store it reliably, with high integrity, et cetera. I like to approach it from that angle. Is that the question? Am I answering the right question? Yeah, yeah, my question is basically what to keep in mind when you're starting out, and you're saying integrity is the first thing to keep in mind. Yeah. Okay, domain modeling and data integrity; those two are my favorites. Cool. Anything on that, Nivedita, do you want to take it? Right. Yeah, starting out and scaling up are two completely different problems, and starting out is much simpler. Starting out is: follow the conventions put forth by the many people who have done this before. When you're starting out, you don't need to forge your own path; follow the conventions others have put forward, and then, when you hit the particular, unique sort of problems that only a few people have hit, you go and forge your own path.

And just to add to that, one of the reasons we've been harping on Postgres a little is the community. The Postgres community is the most active one I have seen for any open source software I've used. I've asked questions on the Postgres IRC and gotten a response within seconds, and an amazing response at that, exactly the explanation I wanted to hear. The people who hang around on IRC are experts at what they do. So yeah, there's a community to help you out when you're starting out. Cool. Shihari, do you want to say something on that? Sure. I'm a little bit with Swannam on this, in that I'm not sure what I'm supposed to answer, so I'll give advice, if that's the question. I would say: try to write down your problem statement very clearly, to the extent that you can. If you're starting out, write down what you want from your database. Write down exactly what you will ask your database, the questions you're going to ask it; those are your queries. Then write down all the information you're going to write into your database. Both of these together will inform your schema design. In terms of taking a first-principles approach, I'm just trying to break it down: write down your problem statement, write down what you're going to ask, write down what you're going to put in. Then, based on how quickly you want an answer and how correct you need it to be, you can look at the different options. If you're starting out and know very little about databases, just pick Postgres; it's a very, very safe default, and it will accommodate many different kinds of requirements as well. Otherwise, the database world is extremely large and wide. It's a huge industry: key value stores, graph databases, columnar stores, temporal stores, embedded databases, distributed databases. It's a big world out there, so you have to do your research to figure out which database you want to use. Get your schema design right, get your domain design right, and everything else can be dealt with a little later.

Thanks a lot. With that, I would like to close the session. Thanks, Nivedita, Shihari, and Swannam for sharing your wisdom; it was an amazing discussion. Thanks, everyone, for joining. I hope you all enjoyed it as much as I did. See you for the next episode of Scaling from First Principles. Thanks, everyone. You're welcome. I hope we contributed something useful. This was a nice discussion. I enjoyed it.