Okay. Yeah, thanks everyone for coming. My name is Grisha Kruglov. I work for Mozilla as an engineer, and I'm here as a poor conduit for Emily's slides; she couldn't make it at the last moment. I'm here to talk about Project Mentat, which is an embedded Rust store that we're using at Mozilla to solve some of our problems. I'll talk about what those problems are, how we're trying to solve them, and a bit about the team that's working on this. We're part of the browser architecture user data team: that's Emily Toop, Nick Alexander, myself, and in the past Richard Newman, who's with us in spirit and in the front row. The team's goal is to figure out what changes we need to make now to meet our three-to-five-year goals. For Mozilla and for Firefox, those are expanding Firefox, making Firefox better and creating new user experiences, and also going beyond Firefox, building new applications to solve particular user needs. There are a lot of hurdles involved, but one of the big ones is user data. That's our big moat, essentially: we have a lot of valuable data, people would like to access it in different kinds of applications, and our apps would like to use it. Currently that's problematic, for a variety of reasons that I'll go through. One of the problems is storing the data. You'd be surprised how many different things a browser stores. Some of it is about user behavior, and some of it is about the internals of the browser. There are a lot of different teams involved in storing this stuff, and the teams make their own decisions about data storage layers. Often those decisions are driven more by immediate needs: how do we make this thing work for this UI? How do we query it quickly? How do we ship it tomorrow? Et cetera.
And since a lot of different teams are making these decisions, the outcomes vary: you have a lot of people, and not everyone talks to each other. Some of it is shared expertise, and some of it is folk knowledge. Teams might not be thinking about needs beyond their immediate ones. If today, and a month from now, you don't need to share some data between different clients, you might not bake that into your initial data schema, and when you eventually do need it, it's going to be problematic to evolve. The decisions people end up making around data storage often start off as "let's just use something simple," maybe a JSON store, and that doesn't tend to scale all that well. It might solve your really immediate needs, but it doesn't necessarily solve future needs. Building fast, concurrent, safe access to a JSON store across many different components is going to be troublesome, and you end up solving a lot of hard problems many, many times, because you have many different data stores. And if you do pick a more comprehensive solution to begin with, you end up with a host of other problems. For example, if you pick a SQLite store as your storage model, you still need to handle a lot of migrations, and as your project and your data needs evolve, that tends to become quite problematic as well. People don't necessarily think about these problems in the right ways, because building data stores that will sync and evolve well over time is a fairly specialized skill. And that ties into the second point: sharing that data.
By sharing, we mean sharing across different instances of Firefox, or in the future between different products that use Firefox data. We have strong privacy guarantees, and a big part of that is end-to-end encryption, which means that clients have to do all of the conflict resolution work; the server is just a dumb blob store, more or less. We have multiple clients, and the set of clients is growing, and we'd like it to grow: part of our three-to-five-year goals is building more products that use this data. Each client has its own data storage implementation, written in different languages; we have JavaScript, Java, Swift, et cetera. The teams building these clients tend to build data models oriented around their immediate needs, like querying, and the schemas you'll pick for that are not necessarily the schemas you'd pick if you wanted to optimize for syncing. So you run into problems like not being able to do three-way merges of data, because you don't have a historical view into how some tables changed. So let's look at a concrete example of what that actually looks like in practice. Here's a really simplistic way to model a password storage layer. You have two clients, and a server in the middle that just acts as a dumb blob store. The schema is very basic: there's a URL, and for that URL we have a username, a password, and a timestamp. Some problems appear immediately as you start making changes here. Say client one makes a change to the URL: it observes that the URL changed, and it records that the change happened.
And client two makes a change to the password; you'll see the password is now different. (And there's an animation. There you go.) So client one syncs, and the data is now on the server. Client two will sync eventually, and you'll notice that we essentially get a JSON blob back, right? At this point we have to make a decision: we have two JSON blobs and we have to somehow smoosh them together. We see that the URL changed and the password changed, and this information is on the server already, so other clients might have seen it. We don't have a ton of choice here: we have to take what we see, which is usually what happens in simplistic syncing flows like this. So on client two, we've just lost the password. This is a lossy process: we lost the password change, we end up with a state that we've never really observed, and client two has no way to recover from it. There are ways to fix that problem in the existing world, but they tend to involve a lot of work, because while we could track more state on each client, we have many clients, and they all have different implementations of their stores. It involves writing a lot of complicated code many times in different languages, and coordinating across teams to do it. One of the lessons people tend to learn as they write this stuff is that it helps to separate the way you query data from the way you modify it. That's the idea behind CQRS, command query responsibility segregation: you separate the way you modify data from the way you query it. That lets you address the needs specific to each side of that equation in a more coherent way, and be very explicit about that separation.
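The lossy "take what we see" merge described above can be sketched in a few lines. This is an illustrative toy, not Firefox Sync's actual code; the names `LoginRecord` and `merge_lww` are made up for this example:

```rust
// Hypothetical sketch of whole-record last-write-wins merging.
#[derive(Clone, Debug, PartialEq)]
struct LoginRecord {
    url: String,
    username: String,
    password: String,
    modified: u64, // last-modified timestamp
}

// Whichever blob has the newer timestamp replaces the other entirely,
// even if each side changed a different field.
fn merge_lww(local: &LoginRecord, remote: &LoginRecord) -> LoginRecord {
    if remote.modified >= local.modified {
        remote.clone()
    } else {
        local.clone()
    }
}

fn main() {
    let base = LoginRecord {
        url: "http://example.com".into(),
        username: "alice".into(),
        password: "old-pass".into(),
        modified: 100,
    };

    // Client one changed only the URL...
    let mut client1 = base.clone();
    client1.url = "https://example.com".into();
    client1.modified = 200;

    // ...client two changed only the password, slightly earlier.
    let mut client2 = base.clone();
    client2.password = "new-pass".into();
    client2.modified = 150;

    // Client two merges the remote blob in: its password change is
    // silently lost, yielding a state nobody ever observed.
    let merged = merge_lww(&client2, &client1);
    assert_eq!(merged.url, "https://example.com");
    assert_eq!(merged.password, "old-pass"); // "new-pass" is gone
    println!("merged: {:?}", merged);
}
```

Because each record is an opaque blob with a single timestamp, there is no way to see that the two edits touched different fields; that information was never recorded.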
For example, we'd like fast querying, but we'd also like data that is syncable, and those two are somewhat at odds with each other. Making that tension explicit lets us actually make some progress on the problem. So we get to trying to build Mentat and fix the problems we've observed. The team went through the regular buy-or-build decision. We were looking for something that solves these problems: maybe shaped in the CQRS style, maybe event- or log-shaped, something that allows us to define strong schemas explicitly, because that really helps with syncing data. There's always a schema in your data, even if it's implicit; it's just a footgun waiting to fire if it's not explicit. So we went through the usual suspects, and the point is that we always ran into missing features: it's not embeddable, or there's no full-text indexing, or we can't use it on many different platforms with a single implementation. Eventually we came across Datomic, a database that exists in the Clojure world. It's transactional, and it has a few main ideas we care about: a strongly typed schema that we can evolve over time easily, a transaction log of everything that happens, and rich querying via Datalog, which is its query language. But it's only a good fit in spirit: it's actually a server-side system, and it's not open source. Then we get to DataScript, which exists in a similar mindshare; it's a ClojureScript implementation of the Datomic ideas. But again, it requires a JavaScript runtime.
So it's not really something we can deploy for real in the environments we care about, and it exists in memory only, while we really care about persistence of data. And so we get to "build your own," which is what the rest of this talk covers. The basic concept, for now anyway, is to use SQLite underneath to store the data. You get everything from SQLite: it's a solid project, a reliable relational store, FTS-capable, with a small memory footprint, et cetera, and a long history behind it. Then we start layering the ideas we care about on top of SQLite: we layer a transaction log on top, we have a mutable, strongly typed schema, and we have querying, which implies that we'll need to compile our query language, Datalog in this case, into SQL in the end. One of the first prototypes of this was written in ClojureScript, because at the time we were prototyping a browser written in JavaScript, and that was a good prototyping decision. We ran into a bunch of problems: buggy async channels, a slow and buggy transpiling process, and again, it required a JavaScript runtime, which is a blocker if you're trying to embed this across iOS and Android applications and also on desktop, et cetera. So Rust is the natural next choice; the only real alternative is C++. We get a lot of benefits from Rust: it's a modern, expressive language with really nice algebraic data types that help manage the complexity of something like this, we get predictable, correct results once we work through all of the problems, and it's performant. So it's not all roses.
The implementation took much longer to write than the initial ClojureScript prototype, and it's still an ongoing thing. But we get a lot out of it: the Rust trifecta that Mentat cares about is correctness, combined with portability (cross-platform), combined with performance. Those are the things that really make a difference here. If we peek under the hood of Mentat, the core is the transaction log. This isn't an exact representation of what you'd see if you opened up a Mentat SQLite database, but it's close enough. It's a tuple store. Everything is an entity in the system, including your schema definitions. Entities are described by attributes, and the two have a value: a login URL will be an attribute of some entity, and there will be a value like rustconf.com. Everything is grouped by transactions, because we want atomic application of data, and we want to track when things happened and how. And since it's an append-only log, we need to know whether we're adding or retracting a piece of data: changing a URL actually involves retracting the existing URL and then asserting a new one. That's what this looks like under the hood. The thing that's missing here is that the values are typed: we support a bunch of different types, we can enforce that at the schema level, and Mentat will complain if you try to write the wrong types into the data. So let's look at the transaction table and how data flows around, what it actually looks like for real. This example is also simplistic, but I think it illustrates the problem well enough.
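The tuple shape just described, and the retract-then-assert pattern for changing a value, can be sketched like this. The `Datom` struct and `change_url` helper are illustrative only (in Mentat, attributes are themselves entity ids, and values are typed; both are simplified to strings here):

```rust
// A rough sketch of one row in the transaction log:
// (entity, attribute, value, transaction, added).
#[derive(Clone, Debug, PartialEq)]
struct Datom {
    e: u64,          // entity id
    a: &'static str, // attribute (keyword here for readability)
    v: String,       // value (strings only in this sketch)
    tx: u64,         // transaction that asserted or retracted this datom
    added: bool,     // true = assertion, false = retraction
}

// In an append-only log, "changing" a URL means retracting the old value
// and asserting the new one, both within the same transaction.
fn change_url(log: &mut Vec<Datom>, e: u64, old: &str, new: &str, tx: u64) {
    log.push(Datom { e, a: ":login/url", v: old.to_string(), tx, added: false });
    log.push(Datom { e, a: ":login/url", v: new.to_string(), tx, added: true });
}

fn main() {
    let mut log = vec![Datom {
        e: 65536,
        a: ":login/url",
        v: "http://rustconf.com".into(),
        tx: 1,
        added: true,
    }];
    change_url(&mut log, 65536, "http://rustconf.com", "https://rustconf.com", 2);

    // Nothing is overwritten: the log now holds the original assertion,
    // a retraction of the old URL, and an assertion of the new one.
    assert_eq!(log.len(), 3);
    assert!(!log[1].added); // retraction
    assert!(log[2].added);  // new assertion
}
```

Because nothing is ever deleted in place, the full history of every value is preserved for later conflict resolution.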
We have the same situation as before: client one, client two, and a server that is still a dumb blob store; everything is still end-to-end encrypted. Except now we have those transaction logs. So let's start making changes. Changing a password on client two (we're missing some records here, but this is good enough) essentially involves appending a new password to the log, and you'll notice it's now in a different transaction. Let's make a similar change on client one: we'll append a new URL. And there you go, now we're syncing: the server essentially has an encrypted log, into which we've appended new records. When we sync those records back, you'll notice that we have conflicting transaction IDs. That's almost an implementation detail, since we chose to represent transactions as integers. But we perform something akin to a git rebase: you rewind your current state back to a shared parent, you play in the remote changes, and then you play your local changes on top. That's one way to do it; you could also represent this as a merge, and depending on your needs and your schema, different things might make sense. But if you just do a simple rebase, you end up with this nice linear log, where you have an exact record of everything that happened in the system: the old password, the new password, the URLs, everything. Then we sync the log back to the first client, and there you go. Now both clients agree on the log, they have a coherent view of the system, and they have a nice historical view of everything that happened. So your conflict resolution algorithms, specific to the domain and the data models, can make the right decisions, which is nice. That's the flow.
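The rebase-style merge can be sketched roughly like this. This is a toy under stated assumptions, not Mentat's actual sync algorithm: transactions are reduced to an id plus an opaque description, and "replaying" is just renumbering, with no conflict detection:

```rust
// Sketch of a "git rebase"-style log merge: keep the shared prefix,
// apply remote transactions, then replay local transactions on top
// with fresh transaction ids.
#[derive(Clone, Debug, PartialEq)]
struct Tx {
    id: u64,
    change: String, // stand-in for the datoms in this transaction
}

fn rebase(shared: &[Tx], remote_new: &[Tx], local_new: &[Tx]) -> Vec<Tx> {
    // Rewind to the shared parent, then apply what the server has.
    let mut log: Vec<Tx> = shared.to_vec();
    log.extend(remote_new.iter().cloned());

    // Replay local transactions on top, renumbering them past the remote
    // ones, since both sides may have handed out the same integer tx ids.
    let mut next = log.last().map(|t| t.id + 1).unwrap_or(1);
    for tx in local_new {
        log.push(Tx { id: next, change: tx.change.clone() });
        next += 1;
    }
    log
}

fn main() {
    let shared = vec![Tx { id: 1, change: "assert old password".into() }];
    let remote = vec![Tx { id: 2, change: "change url".into() }];
    let local = vec![Tx { id: 2, change: "change password".into() }]; // conflicting id

    let merged = rebase(&shared, &remote, &local);
    // One linear log: every change preserved, ids consistent again.
    assert_eq!(merged.iter().map(|t| t.id).collect::<Vec<_>>(), vec![1, 2, 3]);
    assert_eq!(merged[2].change, "change password");
}
```

The key property is that no change is discarded: both the remote URL change and the local password change survive in one linear history, so domain-specific conflict resolution can run over complete information.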
Then we get back to querying this data. We talked about CQRS and how it's important to think about the separation between how you store and modify data and how you read it back, and those two are quite different. If you had to read the transaction log all the way from the beginning every time, it would be a terrible experience performance-wise. So we do that for you. We call the result the datoms table, and it's essentially a materialized view into the transaction log: a snapshot of the log as of the last transaction. You'll notice it's missing the "added" field. Well, it's not missing; it's just a snapshot, this is what the data is now. Mentat internally keeps updating this materialization as the data changes. You'll also notice that the attribute column is now expanded. Attributes are part of the schema, and they're just another entity in the system: you have entities describing entities, which is kind of like an RDF approach. When we transact this stuff, we expand keywords like :login/url into an actual entity, and the datoms table contains that. But when you're querying this data, you don't have to worry about any of that: you describe the attributes you'd like. This is a simple example of a Datalog query you would use. You describe what you would like and you define your bindings: URL, username, and password will be bound to :login/url, et cetera, and you get results back. Internally, Mentat compiles this down into a bunch of SQL, and you'll notice it's self-joins on the datoms table: we're not querying the transaction log, we're querying this internal materialized view of it. It's quicker, though still not free.
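What those SQL self-joins on the datoms table are doing can be shown with an in-memory sketch: each Datalog clause becomes another lookup in the same table, joined on the entity id. This is an illustration of the compilation target's behavior, not Mentat's API; `find_logins` stands in for a query like `[:find ?url ?user ?pass :where [?e :login/url ?url] [?e :login/username ?user] [?e :login/password ?pass]]`:

```rust
// A minimal in-memory "datoms table": current values only, no tx/added.
#[derive(Clone, Debug)]
struct Datom {
    e: u64,
    a: &'static str,
    v: String,
}

// One "self-join": look up an attribute's value for a given entity.
fn value(datoms: &[Datom], e: u64, a: &str) -> Option<String> {
    datoms.iter().find(|d| d.e == e && d.a == a).map(|d| d.v.clone())
}

// For each entity carrying a :login/url, join back into the same table
// for its username and password, keyed on the entity id.
fn find_logins(datoms: &[Datom]) -> Vec<(String, String, String)> {
    let mut out = Vec::new();
    for d in datoms.iter().filter(|d| d.a == ":login/url") {
        if let (Some(user), Some(pass)) = (
            value(datoms, d.e, ":login/username"),
            value(datoms, d.e, ":login/password"),
        ) {
            out.push((d.v.clone(), user, pass));
        }
    }
    out
}

fn main() {
    let datoms = vec![
        Datom { e: 65536, a: ":login/url", v: "https://rustconf.com".into() },
        Datom { e: 65536, a: ":login/username", v: "grisha".into() },
        Datom { e: 65536, a: ":login/password", v: "hunter2".into() },
    ];
    let rows = find_logins(&datoms);
    assert_eq!(rows.len(), 1);
    println!("{:?}", rows);
}
```

In the real store this becomes SQL along the lines of `SELECT ... FROM datoms d0 JOIN datoms d1 ON d0.e = d1.e ...`, one join per clause, which SQLite's indexes make reasonably fast.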
That's how it would look internally: you go from this into this. You'll notice that we're not doing anything really tricky here: we've resolved the attributes into their entities, and we're joining on the entity id to ensure we get the correct view of the data for each entity. Then you get back a bunch of results (this slide is missing the entity id column, but you get data back from the system), and you can do this ad infinitum. The datoms table is just one materialized view in the system. But the CQRS lesson is that applications are usually well served by many different views, views that represent the data the UIs, or whatever else, would like to query from your data stores: you have a really normalized data store, and denormalized views into that data store. So what we have, or will have, anyway, is user-defined materialized views. Say you're in a browsing situation and you only care about the top 10 visited websites. Your history might have tens of thousands of records, but you only care about 10 of them, and you only want to query your view. Querying that view will be really quick: it will have just 10 records, and maybe 5 or 10 columns. Internally, Mentat uses transaction listeners to monitor changes to its log and update the views you've defined, which is nice; it's the same mechanism it uses internally for its own datoms view. You can define as many of those views as you like, as your application requires, and they will be very domain-specific to you. That's your way to query data. But when you're writing data, you're not touching your views, obviously: you're still writing into the transaction log. So that's a peek into Mentat's internals.
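The top-sites idea, a tiny denormalized view kept fresh by a transaction listener, can be sketched like this. The names `TopSitesView` and `on_transaction` are hypothetical; the point is only the shape: writes go to the log, a listener re-derives the small view, and reads touch nothing but the view:

```rust
use std::collections::HashMap;

// Sketch of a user-defined materialized view: it holds only the top-N
// visited sites, and a transaction listener updates it on every write.
struct TopSitesView {
    n: usize,
    counts: HashMap<String, u64>, // derived state, rebuilt from the log
    top: Vec<(String, u64)>,      // the tiny denormalized view reads hit
}

impl TopSitesView {
    fn new(n: usize) -> Self {
        TopSitesView { n, counts: HashMap::new(), top: Vec::new() }
    }

    // Called by the store after each transaction that records a visit.
    fn on_transaction(&mut self, visited_url: &str) {
        *self.counts.entry(visited_url.to_string()).or_insert(0) += 1;

        // Re-derive the view: sort by count (ties broken by URL), keep N.
        let mut all: Vec<_> = self.counts.iter().map(|(u, c)| (u.clone(), *c)).collect();
        all.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        all.truncate(self.n);
        self.top = all;
    }
}

fn main() {
    let mut view = TopSitesView::new(2);
    for url in ["a.com", "b.com", "a.com", "c.com", "a.com", "b.com"] {
        view.on_transaction(url);
    }
    // Reads hit the two-row view, never the full history log.
    assert_eq!(view.top, vec![("a.com".to_string(), 3), ("b.com".to_string(), 2)]);
}
```

However large the history grows, queries against the view stay cheap, because the expensive aggregation happens incrementally at write time rather than at read time.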
It was really important that whatever we build works in every environment we need, and those environments involve many different mobile platforms (well, two mobile platforms, which is one too many), as well as Firefox desktop itself and maybe other consumers in the future. So we have a public API, which is wrapped in an FFI layer that lets you access it from non-Rust code, and we've built an iOS SDK in Swift and an Android SDK in Java. They're shallow SDKs that map pretty closely to Mentat's internals, and they essentially let you use this stuff without having to worry about the FFI or any of that. Just as if you were writing a nice native Java application, you can query and write data, define materialized views, invoke syncing, and integrate with whatever is appropriate in your environment for scheduling things like that. Soon we'll have an SDK in Kotlin, because that's the hot new thing, apparently, and XPCOM bindings for integration into Firefox proper. Right now we're still solving some of the hard problems. Syncing is a work in progress: we're still working on the various merging algorithms. We're working on forgetting data, which is important both from a privacy perspective, the right to be forgotten and the ability for people to erase history they don't like from their log, and from an engineering perspective: an infinite log in a very finite universe is just not going to work forever. We're still evolving the API a little as we build the internal consumers that are shaping it, and we're writing the documentation. The next step is shipping this in Firefox Lockbox, a standalone password manager that Mozilla is building that gives you access to all of your Firefox account passwords. It will be backed by Mentat, and eventually it will sync via Mentat as well.
In the future, as part of an internal effort to build Android components that help us build browsers better, we'll be wrapping Mentat into an Android component so it's more easily accessible to our internal teams and to anyone who's building something that looks like a browser. We'll be using it in new products that Mozilla is building, to explore storing new data and interacting with the existing sync ecosystem. We'll be building new features inside the existing products, which was historically quite difficult, because evolving data is really difficult in the existing system; it will be much simpler in this new world. And eventually we'll start evolving the existing Firefox stores; we have a vague transition plan, and it's starting to come to fruition with some of the efforts internally. And next, well, perhaps some of you have Mentat-shaped problems; if so, we'd love to help you address them and get you involved in the development of this stuff. The links are all in the standard places: development happens on GitHub, the documentation is there, and there's a wiki page with a bunch of modeling examples and a lot of the theory behind syncing and data stores written down. So please peruse that, and then find me or other people from the team to ask additional questions. Thanks a lot.