Carnegie Mellon Vaccination Database Tech Talks are made possible by OtterTune. Learn how to automatically optimize your MySQL and Postgres configurations at ottertune.com. And by the Steven Moy Foundation for Keeping It Real; find out how best to keep it real at stevenmoyfoundation.org. Thanks for coming. Today we're excited to have Kyle Bernhardy. He is the CTO and co-founder of HarperDB, which, as I said earlier, is my wife's favorite database startup because they are dog-themed. Kyle has been involved in product development in the IT sector for several years. And I think HarperDB started, what, in 2017 or earlier? Yeah, four years ago. Okay, four years. Okay, perfect. So as always, if you have any questions for Kyle as he's giving the talk, please unmute yourself, say who you are and where you're coming from, and feel free to interrupt at any time. We want this to be a conversation, not just Kyle talking into the void the whole time. So Kyle, with that, go for it. Awesome. Thanks, everyone. Like I said to Andy, I'm originally from just outside Philly. I know not everyone's from Pennsylvania, but it's nice to at least virtually connect with people back in my home state. It's been a while since I've been to Pittsburgh, but I've had Primanti Brothers, I've been on the Incline, all the touristy stuff. I've seen the Monongahela, all that. But I went to school at Penn State, so please don't hold that against me too much. So anyway, let's jump into it. Today, what I'm going to talk about is really the journey that HarperDB has been on over the last four years regarding our underlying data storage engine. What were we trying to accomplish? How did we first take a stab at it? What did we try next, and where did we go from there? Really talking about how, even when something is perceived as a failure, it's not always a failure — there are always nuggets that can be taken out of it. How did we hold true to our design values while things still felt very uncertain, while we were trying to get a startup off the ground with marketing and fundraising and all those other things? The main focus of our organization is the technology, and so making sure that we're creating something performant and creating a good experience from a database perspective for our end users. So just a quick outline. Like I said, I'll talk real quickly about what HarperDB is and why we built it, and then the three phases that we've gone through in our four-year lifespan. First, data storage on the file system: we built our own data storage engine right on the file system. I'll talk a bit about that, why we chose to do it, how we did it, and a little bit about our patent, pros and cons. After that, we moved to a key-value store — and again, that same pattern: why did we do it, what did we learn, how did we move on. Finally, we landed on LMDB, those same steps, and then where do we go from here? And then really kind of wrap it up with all the lessons that we learned. So that's about the same thing I just said. So what is HarperDB? At a high level, HarperDB is a structured object store with SQL capabilities. We have a storage algorithm that can take in SQL and NoSQL operations, and that all distills down into one data model. So we're not doing a table store over here and a document store over here with a job in between that synchronizes the data. There are some products that do that.
We did not want to do that. We wanted to try and keep it simple, and that's what you'll hear me say multiple times throughout this talk. Also, dynamic schema, meaning that as you're ingesting data, HarperDB automatically adds those new attributes from your object or SQL insert/update, and also automatically adds the indices while you're doing that data ingestion. It's also natively RESTful. And — we're not going to cover this — we also have data replication through what you can essentially think of as a pub/sub model via WebSockets. So you can do deterministic data replication between tables; it goes in both directions, and you just choose the tables that you want to sync where. But like I said, the purpose of this talk is to focus on our data storage algorithm. Yeah, so I don't want to spend too much time on the SQL part, but what is your SQL dialect? Is that homegrown? Are you based on, like, Postgres or Calcite? What are you guys using? Yeah, so obviously we're not supporting full ANSI SQL-99. It skews closest to SQLite's dialect. But we're not doing triggers at this point, we're not doing stored procedures, so it's not fully ANSI SQL compliant. But are you using the Calcite parser? What is the dialect? Or is it your own parser for SQL? No, so we're actually using an open source library called AlaSQL. What we do is we put the SQL through AlaSQL, and it gives us back an abstract syntax tree. We get that syntax tree back and we parse it so that we know which indices to search and start getting data from. And we do multiple passes. Once we start building out the table structure, we then pass it into AlaSQL for it to do the more complex things like grouping. So we're doing just the raw index searching, fetching, things like that, and then we pass it into this open source SQL engine that actually does the more complex analysis. Andy, like you and I had talked, we're a pretty small team, and so taking on a data storage engine and a SQL parser is a lot, on top of all the other things that we do. So leveraging the Node.js community, which is what HarperDB is built on, has really allowed us to jumpstart big parts of our project. You answered my second question: you picked AlaSQL because it was JavaScript. Yes, that's correct. It was part of the npm community. And that was part of the iterative process of the very early stage of HarperDB: there are a lot of SQL parsers out there, but which ones are robust? Some of them only give you back an AST. Some of them don't support, say, group bys for some reason. So we had to work through what these libraries can and can't do, where the edges are. We tried one that was more like a — I think it was called something like "SQL-like parser" — and that's the reason I say some only give back partial support, because that's what we first landed on. I actually very early on tried to build my own SQL parser, but things kept falling through the gaps. There are just so many holes; it was really complicated. So, through exploring and doing more analysis, we landed on AlaSQL, and it has good community support.
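To make that two-pass flow a little more concrete, here is a rough sketch using AlaSQL's public API. The table, columns, and rows are made up, and this is only an illustration of the pattern Kyle describes, not HarperDB's actual code.

```js
// Pass 1: let AlaSQL parse the statement, then inspect the AST to decide
// which indices to read ourselves.
const alasql = require('alasql');

const sql = "SELECT breed, COUNT(*) AS n FROM dev.dog WHERE age > 3 GROUP BY breed";
const ast = alasql.parse(sql);
console.log(JSON.stringify(ast, null, 2)); // SELECT columns, FROM, WHERE, GROUP BY, ...

// ...imagine the rows below were fetched here by walking an "age" index directly...
const rows = [
  { id: 1, breed: 'Whippet', age: 5 },
  { id: 2, breed: 'Whippet', age: 7 },
  { id: 3, breed: 'Husky',   age: 4 }
];

// Pass 2: hand the already-fetched rows back to AlaSQL for the complex part
// (grouping/aggregation) via its array-parameter form.
const result = alasql("SELECT breed, COUNT(*) AS n FROM ? GROUP BY breed", [rows]);
console.log(result); // [ { breed: 'Whippet', n: 2 }, { breed: 'Husky', n: 1 } ]
```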
We actually did an event a while ago with the guys that maintain it, and they're super good guys. It's just nice — when we get a chance to connect with people from the open source community, I get to say thank you to all these people that have done so much hard work to support us and so many other people. Yeah, that's awesome. Yeah, so why did we start HarperDB? Our genesis before HarperDB: all the people that you see in this picture here in the lower right-hand corner, we all came from a marketing automation company. We did data analysis for sports teams and entertainment. One of the biggest events that we did was data analysis on every single tweet around the World Cup back in — I don't know, it was a while ago — but it was about 250,000 tweets per second that we did natural language processing and sentiment analysis on. We were doing these near-real-time videos based off of that — basically, it's like a stadium filling up with soccer balls based on whether people like Germany more or England more. And so we were using SAP HANA, and at the time we were only using HANA. Doing that level of individual transactions, myself and Zach Fowler, one of my other co-founders, were just sitting there watching our logs, monitoring the database, and watching it crash every time someone scored a goal and Twitter's freaking out. So we realized pretty quickly that we can't just use a relational database for this kind of transactional analysis. So we put a NoSQL data store in front of it. We were using DynamoDB because we were on AWS. But because we were doing things for sports and entertainment, sometimes for live TV, we needed to do things in real time, and at the time AWS didn't have a data sync tool that could sync things fast enough for us. So we built our own ETL tool that pushed data from Dynamo to SAP HANA. And really quickly — just because of the scale of the data and because people were self-servicing; sometimes our clients would just start monitoring hashtag "yes", and all of a sudden a million tweets a second were coming in because of hashtag "yes", and we'd have to ask them to please not do that — it became this really expensive, really complicated, like I say here, Rube Goldberg machine of technologies. The cost was extremely high and we were just a startup. A majority of our costs were going to our data infrastructure, a majority of our technology resources were going just to maintaining that data infrastructure, and it was limiting our ability to build on top of it and add new features to our platform. We went to AWS and a number of other partners and we were like, we're obviously doing this wrong. And they were like, nope, you're doing it exactly right — this is exactly what Amazon recommends you do: this sort of NoSQL umbrella, pushing data over to your relational store so you can do deeper analytics. But again, it was so cumbersome and so expensive, and we were like, there's got to be an easier way. So myself, Stephen Goldberg, and Zachary Fowler, one night we were so frustrated that we thought up what essentially became HarperDB: really simplifying the stack, creating a database that can do the transactional rates of a NoSQL database but execute SQL on that same data set. That was the dream for us. We had come across products that said they could do it, but ultimately, at least for us, things didn't work out with those products.
You know, we sat on this idea for about a year, because we were like, well, we use databases a lot, we know a lot about how to use databases, but we're not database people — we've never built a database. So we sat on it for a year because we thought, if we have this idea, someone else will build it. And no one did. So, Andy, like you said, in 2017 we decided to take the plunge. We got some initial investment, we had about three months of runway, and we just dove in. We had only ideated on HarperDB; we didn't actually start any development until March 1st of 2017, so that's really when the work started. So, our overall design vision — I've touched on some of this — but our high-level goal was to make a database that was resilient, powerful, needed minimal administration, and was fun to use. What we really liked was the flexibility of NoSQL, like I said, with the power of SQL analytics. From NoSQL, a reflexive data model that just responds to the data that you're ingesting: if I add a new attribute, I don't need to do a CREATE or ALTER TABLE ADD COLUMN — it'll just do it for me, and it'll automatically add the index for me in real time, because, going back to what HarperDB is, it's fully indexed; as I'm adding data in real time, it's adding those indices for you. No deadlocks. We ran into deadlocks a ton with our relational database, so we didn't want multiple transactions playing a game of chicken with each other. Also, we wanted the database to be stateless: if it isn't being asked to do anything, it just rests. So no write-ahead logs, no merge trees, no background processes. One of the use cases we were aiming towards was IoT — putting HarperDB on something like a Raspberry Pi — so we wanted to be cognizant of memory utilization, battery utilization, things like that. Also inside that: no fixed data typing per attribute, minimal constraints, and really letting the developer manage the data ingestion — putting less of the onus on a DBA and more on the developers themselves, giving them more power in their own hands. The other vision that we had was fail fast, right? Let's start developing, figure out what's gonna work, and when things don't work, let's reassess, but hold true to our mission, vision, and values. That was always key to us as a company; when we started, one of the very first things we did as a team was define those, and you can see them on our site. And to be perfectly honest, I don't think we've ever really talked about how we've failed in the past, but one of our core values is transparency. I think it's valuable — in the startup world you hear about unicorns, and I'm not saying we're a unicorn, but you hear about success, and companies sort of seem to emerge whole cloth, and that's not the truth. So really what I wanna do is peel back the curtain a little bit and talk about how we have failed and how we moved through quote-unquote failure to success. So, HarperDB architecture: you can really see this as a series of layers. The layers that we're really gonna focus on are the bottom three: our core services, our storage algorithm, and — where we live right now — our LMDB implementation; that bottom layer is really just LMDB. The first thing we're gonna talk about is our file system data store implementation.
But because I had only ever worked with databases, I always saw a database a little bit more like a monolith, and when we first built HarperDB, we kind of built it like a monolith too: our data storage algorithm was deeply tied into our app logic, and it was all interconnected, because that was really our understanding of databases, or at least mine. As we've gone through this process, we've been peeling back those layers and really understanding that this is not a monolith but a series of tiers, and the data storage layer is the foundation — everything bubbles up from there, good and bad. Just a real quick example of how HarperDB is easy to work with: these are a NoSQL insert in Node and a SQL query in Python. You can look at our docs if you're ever interested in getting to know more about HarperDB, but again, simplicity is our main goal, so we make sure our product is really easy to work with, and because it's API-first, you can interact with it in any language. So, real quick, data storage — obviously that's really the key part of a database: how is the data organized and structured? You wanna make sure that data comes in and out in a structured way. There are many ways to do that, and we've explored quite a few that I'll talk about. But once a database has picked an underlying storage engine, it can be difficult to migrate to another — though I think a lot of products out there have architected in a way that makes it easy. MySQL has InnoDB, MyISAM, Memory — they've got like ten different ones — and I know MongoDB has a number of different underlying data stores. It's pretty common. Very early in our company, we had talked to an investor, and we were so coupled to our idea around our data storage algorithm — he brought this idea up to us, like, why don't you just swap it out? Have you guys ever thought about using another store like Redis or something like that? At the time, we were so tied to our concept of our data storage algorithm that it was anathema to us; it was almost like getting slapped in the face. Looking back now, I'm like, man, I wish I had listened to that guy sooner, but it was also part of our learning experience. So, real quick — everyone here knows what a file system is; this is really just setting the ground for where we started with HarperDB. Obviously folders and files, and the underpinnings of a file system too: there are inodes and blocks, there's a lot of metadata in those inodes tied to the block allocations, and you can do hard links to have a representation of a file that sits in another place so that you're not duplicating data. I'm going through this real quick — I know this is a very light touch on file systems — but it's just setting the ground for what we did. When we first started, our thinking around file systems was: in a way, a file system is a little bit of a database. Some of them use trees underneath — EXT4 uses an HTree, so it can avoid hash collisions, and it does some auto-balancing of the contents of a directory — and you can only have one writer per file. So we were like, what if we organized the file system in a way that was logical and just stored our data directly in the file system under folders and files, and accessed it that way? You can essentially human-read it: you can look at it, see what it is, and go from there.
Also, part of what we were thinking about when we started HarperDB was that SQL/NoSQL idea: how do we ingest documents but be able to do relational queries between two separate tables? Our thinking was, let's blow it all apart, right? If everything is blown apart, then when we go to put a record back together, we can also put multiple records together across tables. What we were trying to avoid was doing large-scale scans of a primary index — we didn't wanna just iterate every document to find an ID that matches another document over in a completely separate collection. So we were pretty naive when we approached this, but we also felt like we had something really novel. We got a patent on it, so the patent office agreed with us on that, and we felt like we had something really innovative. It also helped us get off the ground, but we ran into some technical difficulties along the way. I'm sure many people hearing me say this are probably cringing and anticipating the troubles that we ran into. Yeah, just to be very clear here: you're taking a single document and then you split it up into its keys and store those keys in separate files? Yes, that's what we started with. It's not a column store — like, take an attribute that everyone has and store all the values together — you're literally just riddling your inode tree with all these individual files? Oh yeah. All right. Okay, all right. I can tell you're cringing — it's like, okay, keep going, keep going. We really did this, Andy. Okay. I'm sorry, but this is gonna be the example — whatever you're about to say, I'm gonna use this as the example in my class of what not to do. Perfect. Keep going. Okay, keep going. So, obviously, some of the challenges around this: there's no MVCC, and ACID was a huge problem with something like this, because we weren't implementing any type of file locking outside of what the file system gives you. We were optimistically and naively relying on the file system and not really thinking it through. At that point in time we were really focused on just doing, because we had limited funds. We were also trying to, like I said, fail fast, get out to market, see what stuck, what was important to people, what connected — but there were also some gaps in all of this. At the time, we recommended EXT4. We were — and still are — Linux-based only; we now have the capability to go to Windows, but we've not made that effort yet. But with EXT4 we started realizing more and more of these inode limitations, and so we just kept piling on more and more config, which really started going against our design values — hey, let's do minimal config — but we ended up in a place where, for your file system, you had to do a lot of config to handle data at scale. So, real quick on our patent. Our operational model, which is still true today, is that you have a JSON operation or a SQL operation that comes in, and we have two different interpreters. The SQL interpreter, once we get the AST, really just ultimately forks over to the JSON interpreter; the JSON operation is the true operation. And so from SQL, we just do some interpolation for simple SQL operations. For more complex ones,
we have a whole series of functions that we execute, because SQL SELECTs especially can get very, very complicated — but we still use the core underlying search functions that our NoSQL operations use, just potentially many of them in tandem to get the result set that you want. But, like I said, it all resolves down to the same data model. So this is again from our patent. And I think one lesson for everyone is that patents are exhausting and a little boring, but — And isn't it, as a startup, as an early funded startup — this is what, six figures to do this, right? With the lawyers? No, it was not six figures. Our COO could tell you directly, but it was not six figures. We have a fantastic patent attorney here in Boulder, and he really helped guide us through this. We also did a lot of the work ourselves — we were more napkin-scratching, but we did a lot of the work, and then he cleaned it up and formatted it for the patent office. So the left-hand diagram here is still true even now. It's the right-hand diagram that's no longer true, because the right-hand diagram really starts delving into the explosion of data — but it doesn't really give the full picture until we get here. And, Andy, this is where I think you get the visualization for that future example you're gonna give to your students. So, HarperDB has the notion of schemas, and that's really just a namespace. Underneath the schema is a table — also really just a namespace that you then start collecting attributes under. So on the file system, dev and dog are folders. The HDB hash, id, name, breed — those are all folders too. And under the HDB hash folder, we then created more folders. The HDB hash held the true representation of the data; that's where the raw data lived, under whatever you define as your primary key, in this case obviously id. I'm trying to remember back to what we did — oh, right: because we never, at the time, held a full representation of the row or object, under the hash we actually did essentially a transaction log, but the ID is implied through the name of the file. And then under everything else, like name and breed, these would all be files like 1.hdb, and inside there it would be Penny, or Cato, and for breed it could be Whippet. So if you're doing a search by ID — a get by ID — you could just go straight in here; everything logically is there. I'm gonna get by ID 2, so I go into name, get that .hdb, into breed, get that .hdb. Even if you're doing multiple IDs, you could do that pretty quickly. Searches started getting more problematic, especially as the number of elements started increasing under a folder. If we're looking at the other top-level attributes, like id, name, and breed, a value like Cato could have a million entries in it, and then you're doing a readdir on that — readdir is what Node.js uses, fs.readdir. The problem with that is it doesn't just call getdents, it also stats every single file, so the memory overhead is huge. Sometimes you'd make a request and we didn't know where it went; it would just take forever, putting all the pieces back together, with so many operations happening all at once.
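To make that layout concrete, here is a loose sketch of how a single insert could explode into folders, tiny files, and hard links under this kind of scheme. The exact folder and file naming is simplified and partly guessed from the description above; it's an illustration of the idea, not HarperDB's actual code.

```js
// Illustrative only: one inserted record becomes many tiny files and hard links.
const fs = require('fs');
const path = require('path');

function insertRecord(root, schema, table, record, primaryKey = 'id') {
  const tableDir = path.join(root, schema, table);
  const id = String(record[primaryKey]);

  // Primary side: a transaction-log style entry under __hdb_hash, keyed by the ID.
  const hashDir = path.join(tableDir, '__hdb_hash', id);
  fs.mkdirSync(hashDir, { recursive: true });
  fs.writeFileSync(path.join(hashDir, `${Date.now()}.hdb`), JSON.stringify(record));

  for (const [attr, value] of Object.entries(record)) {
    if (attr === primaryKey) continue;
    // Get-by-ID side: dog/name/1.hdb contains "Penny".
    const attrDir = path.join(tableDir, attr);
    fs.mkdirSync(attrDir, { recursive: true });
    const attrFile = path.join(attrDir, `${id}.hdb`);
    fs.writeFileSync(attrFile, String(value));
    // Search-by-value side: dog/name/Penny/1.hdb is a hard link to the same data,
    // so "everyone named Penny" is a readdir() of that folder.
    const valueDir = path.join(attrDir, String(value));
    fs.mkdirSync(valueDir, { recursive: true });
    fs.linkSync(attrFile, path.join(valueDir, `${id}.hdb`));
  }
}

insertRecord('/data/hdb', 'dev', 'dog', { id: 1, name: 'Penny', breed: 'Whippet' });
// A million dogs named Cato => a million entries in .../dog/name/Cato/ => readdir() + stat() pain.
```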
Just at scale, this all started falling in on itself. Wait — plus all the locks you're taking on the inode tree inside EXT4, right? You have no control over that. No — and yeah, right, great point, Andy. While we thought we were in control, we realized more and more that we had less and less control here. File systems were not built for what we were doing, because these files, as you can tell, are super small, right? This isn't large blobs — we're obviously not dealing with video files — these are really, really small. So, harkening back to what I said before, the config had to get more and more intense on the file system, but there was only so far we could go. This is a bit of a higher-level view, less detailed than the previous one, but still trying to demonstrate at a higher level what we were trying to achieve from an indexing perspective. So anyway, after a period of time we realized this was not working. The pros were: it enabled us to get off the ground, and it got us away from standard paradigms — we were trying to do something new. We had minimized data duplication because we were using hard links — not symbolic links, hard links. The storage algorithm was patented and unique, and we thought that was what made us special. But when we stepped back, the people that were using HarperDB — what they really liked was the ease of use, the SQL and NoSQL. They didn't really care about the way we stored the data, except for the inode limitations and the fact that we were taking up way too much space: 10 megabytes of data could be taking up over 10x that on disk because of how we were using block space. And to touch on configuring this: when you were formatting your drive, you had to pre-define how many inodes you were gonna set up, and once that's set, you can't change it. So you need to know ahead of time — do some really complex data estimation off of what you're gonna do. We also wanted people to do further config around shrinking down the block size so we could get down to 1024. We also did some other things where you could store data on the inode — you could define how much data was stored on the inode. We got deep into some threads with the creator of EXT4, where there was some esoteric setting in EXT4 — oh, but we also had to run a special kernel in order to enable that. But you can store data on an inode, and you can say how much you want to store, so that minimized the blocks, but you're still ultimately heavily reliant on inodes. So, you know, this did not work. This is the craziest thing I've ever heard. To be honest, I thought the crazy thing was the fact that you had to roll your own kernel just to be able to do this inode store. People were trying to build their own file systems for databases in the 80s and they gave up on it, right? Nobody does it anymore. Yep. As you can see, we were at this impasse. What do we do, right? We had invested all this time and money and energy into this data storage algorithm; it was really the linchpin of our database. So we were like, do we write our own file system, to your point, Andy? Do we just give up? Or do we try to find plan C? We weren't going to give up.
And so around that same time, we had a previous relationship with a company. I'm not gonna say their name; things didn't work out — not because of their product, not because of who they are — we just were not a good fit together. But it was an important part of our path. So we were approached by this company that we had a previous relationship with, and they're like, hey, we have this proprietary, high-end key-value store. It's used in fintech, it saturates the disk, it's awesome, it's got great performance — and they showed us all these benchmarks: it scales linearly with cores on reads and writes. And we were like, awesome. We were also, at the time, kind of looking for someone to throw us a life vest: how do we get out of where we are, still hold on to our vision and values, but deliver performance to people the way we say we're gonna give it to them? So, I think everyone here knows what a key-value store is. It's pretty simple and basic, with basic functionality: you can do gets, you can search by range, you can store strings, numbers, arrays, a lot of different data types. But if you wanna do anything more complex, there has to be a higher-level implementation on top of that basic key-value store. Pros: it's fast, it's not very complex, it's pretty flexible, and typically they scale well. They provide iterator functions, so you can do ranges, you can do direct seeks, sometimes you can do reversals, things like that. And the way you implement it allows you to do more complex things on top of that. But like I said, the functionality on its own is limited, and if you want relational structures, you have to figure that out on your own. We wanted to do SQL, so that would also be our implementation on top of it — and that's true regardless of store; that's always a given for key-value stores. Our first pass with this was that we were able to ideate how to migrate our data model over to a key-value store while still retaining the fully indexed model. It was a good thought experiment, and even though this key-value store didn't work out for us, we still worked out how to model our data and how to do what we wanted to do in a different way. We also realized: we need to get off of our existing data storage, but we had also linked our data storage logic into our application logic, and we needed to break that out. So we created what we called the HarperDB Bridge. What that allows us to do is deterministically state: I wanna use the file system, I wanna use a key-value store, I wanna use some other thing — and it will dynamically load the correct modules for you based off of what your storage engine is. And the key thing is making sure that there's parity in functions: from an engineering perspective, we have to make sure that we have function parity across all of the different data store technologies that are implemented. So while this key-value store didn't work out, we did get the Bridge out of it.
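A minimal sketch of what a bridge like that could look like; the module paths, environment variable, and required function list here are hypothetical, just to show the load-and-verify-parity idea rather than HarperDB's real bridge.

```js
// Hypothetical bridge: pick a storage backend at startup and verify it exposes
// the same function surface as every other backend, so the upper layers never
// care which engine is underneath.
const REQUIRED_FUNCTIONS = ['createTable', 'insert', 'update', 'delete', 'searchByHash', 'searchByValue'];

const BACKENDS = {
  fs: './data_stores/file_system',
  kv: './data_stores/key_value',
  lmdb: './data_stores/lmdb'
};

function loadDataStore(engine) {
  const store = require(BACKENDS[engine]);
  for (const fn of REQUIRED_FUNCTIONS) {
    if (typeof store[fn] !== 'function') {
      throw new Error(`storage backend '${engine}' is missing ${fn}()`);
    }
  }
  return store;
}

// The rest of the application only ever talks to this common interface.
module.exports = loadDataStore(process.env.HDB_STORAGE_ENGINE || 'lmdb');
```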
Some other things that we learned through this: because we're on Node.js, in order to increase concurrency and parallelization, we are multi-process, not multi-threaded. Node.js now supports threading with worker threads, but when we started four years ago, that wasn't even in beta. So we're embarrassingly parallel in how we operate. One of the challenges we had was that this key-value store supported multi-threading but not multi-process. To get around that, they ran their product inside an HTTP server as a sidecar with our database. So now you have two HTTP servers running on the same node, and we're just making localhost calls to that data storage HTTP server. One of the really big issues was iterating: the way that worked was, if you're going through a cursor, it was a separate HTTP call to get each item through that cursor. So in a lot of ways, most workloads were worse than our file system performance. Again, going back to what are we and what do we need to do — nothing wrong with this product, but it just didn't work for us and our paradigm. So, what we learned: key-value stores on their own are fast and simple. All the inode concerns go away, because typically you would have one table be one — I'm gonna use an LMDB term — environment; you can have a table be self-contained in a binary file. We also learned that using a key-value store was the right path, but the technology that we tried was not the right one. And while we lost a significant amount of money through this endeavor, we did get some development benefits, namely the Bridge. We also learned that we were a little desperate: we dove in too quickly, we didn't trust but verify — we just trusted. I did some evaluation, but I didn't dig in deep enough to say, okay, this is gonna work, this is not gonna work, let's pass and find something along these lines, just not this. One of the other things is that this was not an open source project we were partnering with; they were also a startup, so in order to implement this, we were gonna have to do a rev share as well. The plan at the time was we would still sell our file system — that would be sort of HarperDB Basic — and then if someone wanted to use this key-value store, there would be a HarperDB Plus that would be sold at a markup. The assumption was we'd get better performance, so people would want to pay more, but we couldn't justify that once we actually started doing benchmarks and realized there was no way to overcome the challenges that we had. So we moved on. So now we're at about a year ago — maybe a little over a year and a half ago. We learned we needed to do a lot of testing, and so we did a big bake-off. We spent a couple months evaluating multiple key-value stores. A lot of them fell out really quickly because, again, we're built on Node.js, so if it didn't have a distro for Node, we couldn't use it. A lot of them fell out that way — RocksDB was a good example. I was really interested in Rocks, and there was really only one implementation, but it was kind of half-baked and the guy hadn't worked on it in about a year and a half. So that was really a no-go.
So when you say distro, you mean it has to be natively in JavaScript for Node.js, or there are bindings for it in Node.js? Yeah, a binding for it — sorry, yeah, a binding, so essentially a wrapper. With Node, you can call what are called native Node modules, and those are wrappers around a C or C++ library — because everything under Node.js is C and C++; they just put these native Node bindings around it, but everything at the core underneath is a C or C++ implementation. So like I said, we did a big bake-off with these. I had actually been researching key-value stores for a long time, and obviously LevelDB came up a lot, RocksDB came up a lot, OrientDB came up. LevelDB wasn't gonna work because of multi-process concurrency — if I remember properly, you could do it, but I think it was only for multiple readers; there was some restriction where you couldn't have multiple writers and/or multiple readers, I can't really remember right now. And it's single-threaded with a single writer. Okay. Yeah, okay. But it just felt very limited. Then, towards the end of the bake-off — I must have just done a different Google search or something — all of a sudden LMDB popped up for me. At first I kind of discounted it, but the more I dug into it, the more intrigued I was, and the more I realized that this technology really fits with what we're trying to do: being stateless, being flexible and dynamic, being multi-process, and having a concern for resources. So what I did for a POC was — I didn't wire it into HarperDB — I actually built sort of a model of what HarperDB would look like, and then I did side-by-side workload comparisons. Some of these tests were obviously optimistic, because there are layers in HarperDB, like security and role-level access, that wouldn't be in this POC, but even still, doing bulk loads we were getting 600x the performance, and queries were on average 60x faster. It was pretty much a no-brainer, and we were not seeing this in some of the other key-value stores that we were evaluating. So this was the obvious winner. Also, because I was building a POC that emulated HarperDB, it allowed me to start finding the path for the true implementation. How am I gonna implement the low-level environment and sub-database management? How am I gonna do this? I needed to do it to some degree while I was creating the POC. I still wanted to do dynamic attribute creation, create dynamic indices, while doing this POC. What are the impacts of that? What are the impacts at scale? How many environments can I create? How many sub-databases can I put inside an environment? How can I break this thing? That was really part of the process: not just how can I succeed, but also how can I blow this thing apart? And what I found was that LMDB is super resilient. It is natively ACID compliant — I can get into some of the pros here. Also, if anyone's really interested in what LMDB is, Howard Chu, who is the CTO of Symas and created LMDB, did a talk on October 8th, 2015 with Andy as part of his embedded database series. So listen to his talk — you can also hear some intro violin music — and really get into the guts of what LMDB is, what the dark side of LMDB is, how it works.
And also him getting into why he did what he did, the genesis of that project, as well. It's a really lightweight key-value store — it's 32 kilobytes in size. For us, because it's an embedded database, we can call the API as functions to do our data operations. I can initialize a cursor, iterate the cursor, close the cursor, and I don't have to call a separate server to execute these operations. It's not a black box; everything is fully exposed, so I know what's going on. I can also completely trip myself up, but it's well documented, and there are a lot of resources out there to figure these things out. It uses memory mapping, so it's using byte-addressable space: when you read a record, those blocks are directly accessed and addressable, which allows it to be multi-process, not just multi-threaded. It's natively ACID compliant and supports sub-databases that can be added on the fly, so we don't have to do a restart after adding a new attribute. It's built on an append-only B+ tree, so it's highly optimized for reads — it was built for the OpenLDAP project, which is read-heavy — but it's also very performant and has transactional support. LMDB has one writer but multiple readers, and the writer does not block the readers and vice versa, so it has full MVCC. It's been around since 2011, so it's been pretty heavily vetted, by Howard Chu and company but also by the community at large. One of the other key things for us was dupsort — sorted duplicate keys. Can you go back to the previous slide? You said this thing runs stateless, but that seems, I think, oxymoronic — a database is the state; can it ever be stateless? So, what I mean by stateless — sorry — yeah, I'm not talking about persistence, I'm talking more about compute running in the background. That's what I mean, so it's probably more of a misnomer on my part. It's embedded — it's an embedded database. Yeah, it's embedded. From a persistence standpoint, it absolutely has state, and it has to have state. We're not doing an in-memory database where, when you restart HarperDB, everything's flushed — because that's part of a transaction: it is durable to disk on commit. So I apologize, I need to change that stateless part. Thank you for pointing that out. You're welcome, I was just curious. Yeah. Yeah, and dupsorted keys were also really critical for us. If I have multiple entries in a secondary index — I want to know everyone that's named Kyle — I can have multiple Kyle values, because of the way we're indexing here, and I'll get to this in a little bit: we're doing it more like row-level indexing. So for each index entry, the attribute value is the key, and the value is the primary key. For Kyle, ID 1, in my name index, Kyle is the key and 1 is the value. And because it does dupsorted keys, I can have multiple entries for Kyle — I can say, give me everyone whose name is Kyle, and get 1,000 entries very easily. That's not something every key-value store has, but it was really critical for our data modeling in LMDB. So, some gotchas of LMDB. When you define your environment — the memory-mapped file — you have to define the max size. When you define the max size, it's not going to take up all that space; it's just how big that map can get. I think the default's like a meg.
I think we default ours to something like 100 gigabytes. It obviously is not going to take up that much space; it's just something you need to be aware of — you can hit that limit, and you can also call a resize command in real time, but it's a gotcha. The names of your sub-databases are not internally stored. So as I'm adding extra indices, the environment itself is not tracking the names of those extra sub-databases, or DBIs. Typically what people do is create their own internal DBI in the environment to track the other DBIs — so it's essentially self-referential. Long-lived or never-closed readers: because this is an append-only B+ tree, if you don't close a reader, or if you have a reader that's open for a long time, the database footprint can grow dramatically as you transact more and more data, because you have another view of the data that's still open. The B+ tree just keeps adding on to accommodate that open reader, making sure its view is not corrupted by new data being appended to the tree. We ran into that a while back with a customer doing high-rate transactions: we had forgotten to close one reader, and they reached out to us like, our environment is just growing and growing and blowing up. We had to trace that down — it was one little spot where we weren't closing a reader out. Also, as you delete entries, LMDB won't compact. The assumption is that if you delete a record, you're gonna add entries back, so once it has pruned an entry, that space is just marked for later reuse. The only way to do a compaction is to do a backup of an environment — I think you can pass it a compaction flag — but that's not usable for real-time transactions. Our node-lmdb implementation only supported 32-bit integers, so we had to get around that by storing numbers as binaries. Also, because there's only one writer, LMDB maintains a mutex to manage all the other pending writers, so for high-scale transactions you wanna keep your write transactions open for as small an amount of time as possible — just something to be aware of. When you create an environment, you also predefine the max number of sub-databases and readers, so that's something to be aware of when you're setting things up. And the settings that you use to create a DBI — you also need to use them when you reopen it. If you define that the key of a DBI is a binary and you try to open it again as a string, you can get some really unexpected results. So, as we learned from our previous experience, we wanted to implement a more modular design using the HarperDB Bridge methodology. What that meant was creating low-level utilities to manage the environments, our DBIs, and transactions, which then bubbles up into our schema management. When we create a table, we call those low-level utilities to create an environment and then create the sub-databases inside it — and also utilities for managing the transactions and cursors, just to make our lives a little easier for code reuse. The way we implemented LMDB is that each table is one LMDB environment. You can create as many environments as you want; each environment is a separate memory map.
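To give a feel for those gotchas and the table-per-environment layout, here's a rough sketch using node-lmdb as I understand its API. The path, names, and sizes are made up, and the self-tracking DBI is just one way to handle the "sub-database names aren't stored for you" issue — this is not HarperDB's actual code.

```js
// One table = one LMDB environment = one memory-mapped file, sized up front.
const lmdb = require('node-lmdb');

const env = new lmdb.Env();
env.open({
  path: '/data/hdb/schema/dev/table/dog',   // hypothetical per-table directory (must already exist)
  mapSize: 100 * 1024 * 1024 * 1024,        // max map size must be declared (resizable later)
  maxDbs: 256,                              // max number of sub-databases must be declared too
  maxReaders: 126
});

// LMDB won't list your sub-database names for you, so keep a self-referential DBI
// that records every index DBI and the flags it was created with.
const metaDbi = env.openDbi({ name: '__hdb_attributes', create: true });

function openIndex(attribute) {
  // dupSort lets one key (e.g. the name "Kyle") map to many primary keys.
  const dbi = env.openDbi({ name: attribute, create: true, dupSort: true });
  const txn = env.beginTxn();
  txn.putString(metaDbi, attribute, JSON.stringify({ dupSort: true }));
  txn.commit();  // keep write transactions open for as little time as possible
  return dbi;
}

const nameDbi = openIndex('name');
```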
Search was the most complicated part of this implementation, just because there are so many different ways of searching, but LMDB is super flexible. So, doing equals: when you call an equals command, it's not just a straight equals — you can do a partial search on the front of the key. I can do essentially an equals on "Ky" to find everyone like Kyle or Kylene or Kyler, or I can do an explicit "Kyle"; it's just the logic you put into the evaluation of the results you get back. So it's pretty easy to do an equals. Starts-with, ends-with, and contains — those, for us, had to be full index scans — but the range searching in LMDB is really robust, so it was easy to do greater-than-or-equal, less-than-or-equal, and between, and we could also do reverse traversal with a cursor. The really nice thing with LMDB supporting transactions is what happens when something fails — I called it rollback here, but it's really just not doing a commit. If something fails while I'm doing a put, or if I'm validating the data while I'm doing puts, I can just not commit the puts and close out the transaction. The readers are never impacted by anything I've done a put on until I actually execute the commit — and even then, the readers that are currently open will never see the data that has been put, because they have their own view of the data. One of the things that took the longest was building unit tests on top of this. As developers, no one likes to write unit tests, but we found a lot of edge cases and potential bugs by building really robust unit tests on top of these modular functions that we built. I know it's never fun, but I would say about 50% of the development effort for implementing LMDB was the actual unit tests — testing all these cases, doing asserts on what's actually happening inside the functions, finding things where what we expected to happen wasn't happening, then going back in, re-investigating the code, and fixing things. The other thing is that LMDB really scales well with parallel processes — HarperDB's parallel processes — and reads scale linearly. So I think everyone here has a good gist of what we gained out of this: a significant performance improvement, we're not abusing the disk anymore, and we have native ACID compliance. Really, we get to do the things that we're good at as a development team and follow our vision, and we chose a data storage engine that really fits with our methodology. The real key thing in all of this was understanding our vision, what we were trying to achieve, and who we were trying to serve, and sticking with that. That guided us through this process, even when things got hard and scary and we were like, maybe we just pull the big switch and give up, I don't know. We were able to hang in there because we believed we could find something that accommodated our needs, and like I said, LMDB fit really well with what we want to do. The cons are that we are no longer fully following our patent — we're not doing the exploded data model anymore — and our indexing is now more similar to what you would expect in a database. Andy, I think this is probably a bit more serene; it makes your head feel a little less crazy. Sorry, I'm just trying to get this thing to calm down. You don't need it. No worries. That's better, yeah. So this is what would look more familiar as an indexing model.
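Continuing the earlier sketch, this is roughly what that row-level indexing model looks like: the primary DBI holds the record keyed by ID, and each secondary DBI is a dupsorted value-to-ID mapping. It follows node-lmdb's cursor API as I understand it and is illustrative rather than HarperDB's actual implementation.

```js
// Primary DBI: ID -> full record. Secondary DBIs: attribute value -> ID (dupSort).
const primaryDbi = env.openDbi({ name: '__hdb_hash', create: true });

function insertDog(dog) {
  const txn = env.beginTxn();
  txn.putString(primaryDbi, String(dog.id), JSON.stringify(dog));
  txn.putString(nameDbi, dog.name, String(dog.id));   // "Kyle" -> "1", "Kyle" -> "7", ...
  txn.commit();
}

insertDog({ id: 1, name: 'Kyle' });
insertDog({ id: 7, name: 'Kyle' });

// "Give me everyone named Kyle": position a cursor on the key and walk the duplicates.
const readTxn = env.beginTxn({ readOnly: true });
const cursor = new lmdb.Cursor(readTxn, nameDbi);
const ids = [];
for (let k = cursor.goToRange('Kyle'); k === 'Kyle'; k = cursor.goToNext()) {
  cursor.getCurrentString((key, id) => ids.push(id));
}
cursor.close();
readTxn.abort();   // close the read snapshot promptly (see the long-lived reader gotcha)
console.log(ids);  // [ '1', '7' ]
```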
So for us, a schema is still really just a namespace. It's a folder — we still store these things in folders. A schema is a folder, and for a table we create a container folder for the environment. That's really just for structure on the file system; we could have thrown everything into the same folder, but we wanted a little structure around the way we store the LMDB environments. So, where do we go from here? We are in process. We released HarperDB with LMDB in April/May of last year, and we got great performance and really great feedback. Very shortly after we released HarperDB with LMDB, we entered into a really high-scale proof of concept with one of the largest gaming companies in the world, and that's been moving along very well. Being powered by LMDB has allowed us to push the envelope and achieve the things that this customer needs, which is super exciting. Other opportunities have started unlocking as well, because with early-stage HarperDB, people were really intrigued by our flexible model, but once they got into the nuts and bolts they were disappointed by the performance: we like what you're doing, but the performance isn't there; this just feels untenable. Now we're at a place where we're providing the performance people expect, or beyond. But the way we're indexing currently is a lexicographic storage of keys, which makes range searching problematic: if numbers are stored lexicographically, you have to do a full index scan to make sure you have all the numbers. So range searches have been problematic for our customers. With our upcoming release in the next month or so, we're implementing a higher-order library that's built on node-lmdb — it's actually built by one of the maintainers of node-lmdb — called lmdb-store. One of the things they've implemented is binary keys, which preserve the sort order of the data that you're ingesting, for both keys and values. So while we allow multi-data-type ingest on any key — we're not currently putting any constraints on attributes — the sort order will still be the order you would expect, and range searches become a lot more performant. Also, with this implementation, objects are stored natively rather than stringified, which allows for faster marshaling of results. We can now also start doing encryption and compression: with the most recent release you can supply custom encryption functions — there's not a predefined encryption, you just supply the encryption mechanism as a callback, and you get to do encryption at rest however you choose. We also plan on extending transaction support. Right now our transaction support is either an individual write or bulk writes, but really it's either all inserts to a single table, or all updates, or all deletes, or upserts. We want to enable reads plus writes plus deletes, and if any of those fail, bomb out of the whole transaction. This also allows us to do predefined data typing, going back to the binary keys.
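Here is a small sketch of what that ordered-key, native-object model looks like with lmdb-store, as I understand its API; the paths, the "age" index, and the dupSort option name are assumptions for illustration, not HarperDB's code.

```js
// lmdb-store keeps keys in their natural sort order (numbers are not compared as
// strings) and stores values as native objects, so range queries become simple.
const { open } = require('lmdb-store');

async function demo() {
  const db = open({ path: '/data/hdb/dev/dog', compression: true });
  const ageIndex = db.openDB('age', { dupSort: true });  // secondary index: age -> id

  await db.put(1, { id: 1, name: 'Penny', breed: 'Whippet', age: 5 });
  await db.put(2, { id: 2, name: 'Cato',  breed: 'Husky',   age: 9 });
  await ageIndex.put(5, 1);
  await ageIndex.put(9, 2);

  // Range search without a full index scan: every dog with 3 <= age < 8.
  for (const { key, value } of ageIndex.getRange({ start: 3, end: 8 })) {
    console.log(`age ${key} -> id ${value}`);  // age 5 -> id 1
  }
}

demo();
```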
One of the other things that we're actually releasing with our upcoming 3.0 is micro-batching, because you have that single writer. At scale, there ends up being a race for who gets the lock for the writes, and what we noticed when we were profiling our processes is that for high-scale writes, a lot of the time the CPUs spend is around accessing that mutex. So what we're doing is micro-batching and setting writes up to be asynchronous. Every couple of microseconds the writes batch up, and since you can do nested transactions in LMDB, they all execute as one transaction — batched writes execute significantly faster in LMDB than individual transactions. So at this microsecond or nanosecond level we're doing micro-batching, and since each individual nested transaction has a promise attached to it — because they're all separate — one of them can fail while the others succeed, and we can report back accordingly to the requester that was trying to execute the operation. That will also increase our write performance significantly. And one of the other really — oh, sorry. To be very clear here, very clear: these are non-blocking writes, and by asynchronous you don't mean the write shows up, you queue it up in the micro-batch, and before it's actually committed you tell the outside world? No, no, we wait. We wait for the results. Yeah. So we never — okay. Yeah, we don't do that. That's prone to — yeah, that sets up bad expectations for the user. So, like I said at the beginning of the talk, we're not doing transaction logs or anything like that. Well, we do have an audit log, but it's not the same thing; it's not used for the same purpose. That's just used for things like HIPAA compliance, not for doing something like an LSM tree or a write-ahead log. We're not doing any of that. One of the other very cool things we're doing in the near future is releasing what we're calling HarperDB Functions. All of these core functions that we've built around LMDB, we're actually gonna expose as an internal SDK to HarperDB, and people can build their own APIs inside the HarperDB framework. HarperDB is a general-purpose database, but if you put these separate operations together, you can create a much more performant API than just using the general-purpose search functions. Like, if you need to do multiple fetches — you need to find all of your friends, and then, am I friends with their friends, in order to return the right results — there are nested iterations you would wanna do there that would be a very complicated SQL query; or you can put our core functions together in your own custom function that then runs in HarperDB itself and is accessible as a RESTful API. We feel like that's gonna extend some really awesome power and functionality to our user base. Okay, we're a little over time. Yeah, I'm pretty much done — this is actually my last slide. I think everyone here gets what I'm getting at: what we learned is fail fast, fail often, don't be afraid of quote-unquote failure; there are always seeds of success inside it. What really helped us through this was defining our mission, vision, and values up front.
They were huge guides for us throughout this whole process and still are; when we encounter things that we don't know how to do, we can always go back to our mission, vision, and values, and they help us with all manner of decisions. Plan ahead: do a POC or prototype, figure out what you wanna do there, and trust but verify. We learned a lot about business relationships and partnerships, and, most importantly, use the right technology for the use case. Oh, I will applaud — thank you — on behalf of everyone else. And then, okay, hopefully you weren't watching my video — the baby likes loud punk music, it's the only way to calm him down, so that's what's going on behind me. Okay, to open up the floor, we have time for one question from the audience if you wanna ask Kyle anything before we call it a day. Okay, so my last question then. From your original stack architecture, it looked like HarperDB is a single-node system — like every database is running on a single node. Is that the right way to understand it? Each instance of HarperDB, yes, is running on an individual node; you can do data replication essentially by pub/sub. But that's different than sharding — it's not a single logical database across nodes, okay. Yeah, and so — oh, sorry, go ahead, Andy. I was gonna say, I realize you just switched to LMDB, and that was a big architecture change for you guys, but you're still coupling the storage with the compute, right? So you can't scale out the compute, or the storage, separately. Yeah, so your question is around horizontal scale? Yeah — so HarperDB scaling; it sounds like you're not doing that. Horizontal scaling would be scaling the compute and storage separately. Yeah, so that is the next thing that we're gonna be tackling. We have the data replication piece figured out, and building on top of that data replication, the next piece is distributed querying. Part of this is making sure we have good foundational elements to our product: if the storage was sort of rotten, doing distributed querying really wasn't gonna solve much. We could have implemented it as a stopgap — we could have stayed with the file system; that was a conversation we had — but ultimately there was still a core problem with our storage. So we really wanted to solve that storage problem before we started addressing how we distribute across the network. Querying across nodes, doing Raft consensus, and then ultimately doing sharding like you're talking about — that is all on our roadmap, but the big part of this was making sure we had a strong foundation before we started addressing any of those other pieces. Okay, okay, awesome. Kyle, we're gonna call it a day. Thank you so much for doing this. Really appreciate it. It was very interesting.