Hi folks, welcome back to another Rust stream. This time it's gonna be probably a longer stream, and we're gonna tackle more of an implementation problem as opposed to the Crust of Rust style of teaching. There are a bunch of these on my channel too, so go look at some of the other ones if you're curious. Unlike many of the previous implementation streams, we're not gonna port anything this time. We're just gonna write a Rust thing from scratch. And in particular, I don't wanna say the problem we're gonna tackle, but the domain we're gonna be working in, is how cargo talks to registries. The most well-known registry is crates.io, but cargo does support alternative registries as well. We're gonna talk a little bit about what happens when you run cargo publish on the command line: what's the code path that takes the source you have locally and sends it to crates.io or to another registry? What happens at that registry? And when someone runs cargo build with a dependency on your thing, how does it get fetched from the registry? The reason we're gonna talk about all this is that we're gonna implement basically that loop. And when I say implement, it's not so much that we're gonna implement the low-level logic of how to issue the HTTP requests; rather, I want to look at the data structures that are involved along these steps and the conversions between them. The reason I wanna do this is that currently the definitions of those data structures and the conversions between them live in a bunch of different crates, and arguably not even crates, just different repositories, that don't really interact with each other. So they don't share that logic, and they don't share the definitions. That means there's potential for mismatches between them, but it also means that they don't get to take advantage of knowledge of how the other parts of the system work.
As an example: on crates.io, when a crate is published, there's metadata that cargo sends along with that publish, which includes things like the name of the crate and the version of the crate. That information is also contained in the file that cargo uploads, and at the moment crates.io doesn't actually check that the two are the same. That could end up with just weird things getting into the index; it doesn't do basic sanity checking. It would be nice if crates.io could just do these checks. One of the reasons it would be nice is: imagine that over time cargo starts sending more information along with cargo publish; then you may wanna backfill that information for all the things that were uploaded before cargo started including it. In that case crates.io basically wants to rerun the cargo logic, but currently that logic is entirely contained within cargo, and it's not in a place where crates.io can really get at it and rerun it. So we're gonna take a look at the components involved and try to see if we can construct one crate that can be used by cargo, that can be used by crates.io, and ideally can be used by other registries as well that may want to essentially implement an index themselves. Okay, so let's see where we start. We're gonna start by talking about the cargo side of things. When you run cargo publish, what happens? Well, really, two things happen. First, cargo runs cargo package, and then it uploads the output of cargo package to a particular endpoint at crates.io. And we can look at this if we look at the cargo book. That is the wrong part of the cargo book; we want to look at publishing on crates.io. And I don't wanna look at the user guide, I wanna look at the registry section: Registry Web API, publish. So we'll talk about publish in a second; let's talk about cargo package first, because it runs first. When you run cargo package, what happens is, effectively, cargo just takes your entire source directory and your Cargo.toml.
When I say source directory, I don't mean src; I mean the entirety of the thing that contains your Cargo.toml and the files next to it. More concretely, it's everything that's in your Cargo.toml include directive, not things that are in your exclude, and by default everything next to Cargo.toml except things that are in your .gitignore. The default rules are weird, but basically it creates a tarball, a compressed archive of all those files, and renames that to a .crate file. In fact, you can find these on your computer. So if we check out .cargo/registry/cache, actually no, .cargo/registry/source/github, that's the extracted ones. So I want cache, github, let's ls that. Okay, so at this path, and we'll talk a little bit about what this path means in a second, this is where cargo will download the .crate files for any dependencies that you take. These are hosted on crates.io, and they are the result of the cargo package that was run when cargo publish was run for the appropriate version. So let's look at some random one of these. Let's look at the zipf 7.0.0 crate. Okay, so if we run file on this, it's just gonna tell us this is gzip compressed data. It's a .crate file, but really it is a .tar.gz file. And we can look at this: if we run tar tzf on that file, you see it tells us these are the files that are inside of that archive. And the files here are not terribly surprising. You know, there's the files from src, files from benches, the Cargo.toml, the .gitignore, license and readme. This is a CI file that just isn't excluded and therefore is included. There are two files that are weird here. There's Cargo.toml.orig, which we'll talk about in a second. And there's the .cargo_vcs_info.json. And if we do this, let's see if I actually remember tar commands by hand. It's a little unclear. Is it x? Ooh. What's the flag for printing it to standard out? I think it's just -O.
I think I can just do -O -. Great. Oh, maybe I don't even need the dash, I can just do this. Sweet. So this prints out the contents of that file within that archive. And you see the stuff that's in here is just the sha1, the git hash of the commit, in this case, this is my crate, so the commit that I was on when I ran cargo publish to publish this version of zipf. This is essentially metadata about the context in which the publish happened. There's no guarantee that my directory wasn't dirty, or that I even pushed this commit anywhere, or that this is even accurate, but you know, it's there. And if we look at Cargo.toml.orig, this is the Cargo.toml that was present when I ran cargo publish for this package version. No modifications, hence the .orig in the file name: it is the original Cargo.toml. And that brings us to: what is this other file that's Cargo.toml but not the original? The answer is, and you'll see this at the top of any such file, that this file is automatically generated by cargo. It's essentially a normalized Cargo.toml that removes the use of a bunch of features that are available to cargo more broadly, in part so that the thing that you publish only uses a much smaller set of features, and so is more likely to be compatible with older versions of cargo for people who want to download it. It also has some other modifications: it removes workspaces, it makes sure that there are no path dependencies in there, no patch statements. Essentially it's a cleaned-up version of the Cargo.toml. This might be a thing that we want to stick into this crate that we build, the part that goes from a cargo package directory to a .crate file. I'm not sure, though, whether that's a part we want to include. We might want to say that our thing is limited to going from a .crate file to the registry. Okay, so that's all the stuff that happens when you run cargo package.
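To make the file-selection rules concrete, here's a deliberately simplified sketch. The real cargo package logic uses glob patterns and also consults .gitignore; this toy version uses exact path matches just to illustrate the precedence (include wins if present, otherwise everything minus exclude). The function name is made up for this sketch.

```rust
/// Toy model of `cargo package`'s file selection rules.
/// NOT cargo's real implementation: real include/exclude entries are
/// glob patterns, and .gitignore is consulted too. Here, an explicit
/// `include` list wins; otherwise everything not in `exclude` ships.
fn is_packaged(path: &str, include: &[&str], exclude: &[&str]) -> bool {
    if !include.is_empty() {
        // With an include directive, only listed paths are packaged.
        include.contains(&path)
    } else {
        // Without one, everything except excluded paths is packaged.
        !exclude.contains(&path)
    }
}
```

This is just enough to reason about why a forgotten CI file ends up in the archive: with no include and no matching exclude, it ships.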
It really just generates this file, which is the Cargo.toml modified slightly, it generates this VCS info file, and then it tars it all up and renames it to .crate. And you can test this yourself: if you're in any given crate directory and you run cargo package, it'll do this for you, and it'll tell you where it placed your .crate file, which is gonna be target/package/name-version.crate. And you can look at it and see what's inside. It can be interesting; sometimes, for example, you might look inside and discover that, oh, there's a bunch of files in here that I've just forgotten to exclude that are only used for CI or something. And you can actually make your .crate file much smaller, which is gonna make publish faster, and it's gonna make users on the other end happy as well. Okay, so I said that cargo publish is really cargo package plus some kind of curl, effectively, something that sends it to the actual registry that you're trying to publish to. And that's where we get back to this publish endpoint that registries have to implement. The endpoint is crates.io slash this path here, or whatever your registry might be. You do a PUT, you include a token that essentially authenticates you to crates.io. And the spec stipulates that the server should validate the crate, make it available for download, and add it to the index. We'll talk about the index in a second. Now, the body of the data sent by cargo is an integer with the length of the JSON data, the metadata of the package as a JSON object, then an integer with the length of the crate file, and then the crate file. So the body of the request is essentially a manually encoded multi-part HTTP message saying: the JSON is this long, then the JSON; the crate file is this long, then the crate file, all as part of the body. And the crate file is sort of well-defined, right? It is just a tarball.
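That body framing is simple enough to sketch directly. Per the registry web API docs, the length prefixes are 32-bit unsigned little-endian integers; the function name here is my own.

```rust
/// Frame a publish request body the way cargo does: a 32-bit
/// little-endian length prefix, the JSON metadata, another 32-bit
/// little-endian length prefix, then the raw .crate (tarball) bytes.
fn frame_publish_body(json: &[u8], krate: &[u8]) -> Vec<u8> {
    let mut body = Vec::with_capacity(8 + json.len() + krate.len());
    body.extend_from_slice(&(json.len() as u32).to_le_bytes());
    body.extend_from_slice(json);
    body.extend_from_slice(&(krate.len() as u32).to_le_bytes());
    body.extend_from_slice(krate);
    body
}
```

So the whole "multi-part" structure is really just two length-prefixed blobs concatenated.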
This metadata, though, is a type, a structure. And in fact, it's outlined below here what it includes. The following is a commented example of the JSON object. You'll see there's the name, there's the version, there's an array of direct dependencies and various information about those dependencies, features, authors, and a couple of other fields that are also included. And if you squint at this, you notice that this is really just the same stuff that's in the Cargo.toml in the crate file, which makes you wonder: why is it also included here? And this gets to part of the reason I want to make this crate, because much of this is redundant. It's totally possible to generate this JSON solely from a crate file, and that's one of the things that we're gonna implement. My guess is it was done this way so that the remote server doesn't have to extract the whole crate file and parse Cargo.toml and all of that stuff; instead it can just parse the JSON, which has all the info. But the crux of this gets back to "the server should validate the crate", right? Because realistically, what that means is you can't trust that the JSON matches what's in the crate file. Certainly not for the name or the version. So you have to do at least some parsing of the Cargo.toml, at which point maybe you should do the whole thing. And that's one of the things that we're making easier here. In fact, I don't think there's anything in here, at least at first glance, that you can't get from the Cargo.toml. And then there's information about the responses; we're not gonna care about the response. Our job here is just to be able to take a .crate file and produce this JSON, and also to have the definition of this JSON. Now, there is a crate that already provides this. I apologize for the brightness; I'll save you, twice in a row here. So there's a crate called crates.io, or crates-io, I guess, which has just the definitions of each of these API types.
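A pared-down mirror of that publish metadata, using the field names from the registry web API docs, might look like this. This is a sketch, not the full shape: the real payload has more fields (features, keywords, categories, badges, links, and so on), and in practice you'd derive serde's Serialize/Deserialize on these.

```rust
/// Pared-down mirror of the publish metadata JSON ("new crate" shape).
/// Field names follow the registry web API docs; many fields omitted.
#[derive(Debug, PartialEq)]
struct NewCrate {
    name: String,
    vers: String,
    deps: Vec<NewCrateDependency>,
    authors: Vec<String>,
    description: Option<String>,
    license: Option<String>,
}

/// One entry of the `deps` array: a direct dependency declaration.
#[derive(Debug, PartialEq)]
struct NewCrateDependency {
    name: String,
    version_req: String, // semver requirement, e.g. "^0.8"
    optional: bool,
    default_features: bool,
    features: Vec<String>,
}
```

Everything in these structs is derivable from the Cargo.toml inside the .crate file, which is exactly the redundancy being discussed.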
Now, this crate, as far as I can tell, isn't actually used by crates.io. It is intended for those who want to look like crates.io or want to talk to crates.io. Why is it not used by crates.io? I'm not entirely sure. It might be because it has a bunch of other stuff in here, like curl and url, because it also implements the talking-to part, which obviously crates.io doesn't need. Which is why I want the crate that we build to really be standalone: it's not intended to be doing network stuff for you, it is just the definitions of the data formats and the conversions between them. Okay, so the thing you're gonna see here, in particular, is the NewCrate type, which has all of these fields that we just looked at in the JSON here. So this type is one that we're gonna end up replicating in our crate. We could even re-export this one, all the fields are pub, so there's no reason for us to split the ecosystem unnecessarily here, but I would rather do it the other way around, where that crate takes a dependency on our crate, because I don't want our crate to take a dependency on this crate and then bring in all of this curl stuff, for example. That feels like unnecessarily expanding the dependency graph. I'm hoping that our crate will only really take a dependency on serde and maybe nothing else. But we'll see, maybe thiserror, I haven't decided yet. We'll see how that pans out. Okay, so you run cargo publish, which does a cargo package, which we talked about, and then it runs a sort of curl PUT, which is putting this JSON to the server. What does the server do in response? Well, that's entirely up to the implementation of the registry. On crates.io, it goes into a database. And in addition to going into a database, it also goes into a git index. The crates.io index, which is to say the thing that cargo actually talks to to discover which versions exist of which crates, is at the moment a git repository. And you can look at it on GitHub.
It's this git repository right here. So whenever you run cargo update or something like it, and you see "updating", what does this say, "fetching crates.io" or "updating the index" or something, the thing that gives you the progress bar that's annoying to wait for and sometimes takes a really long time, what it's really doing is a git pull from this git index. And if we look at the commit history, you'll see that there's just endless commits of updating crates. This basically means someone ran cargo publish of the pub crawl crate, version 0.1.0, and as a result that caused crates.io to make a commit to this repository. And if we look at it, the thing it actually does is it updates a file at a path that looks like this. We'll talk a little bit about the syntax of these paths. And the content of this file is multiple lines; each version is one line in this file. So if there were multiple versions, in fact, let's go look at a file right now. Instead of looking at pub crawl, let's go look at something like, oh, let's look at zipf again. zi. So the syntax here is: for any crate whose name is four letters or longer, the first directory is the first two letters, the second directory is the next two letters, and then the file name is the full name of the crate. For anything that's three letters, it's 3/ and then the name of the crate. So actually, let me go into this one first. zi. So you see here, there's zi, and then under there, there's gonna be a pf, and under there, we find zipf and also zipfs for some reason. But you'll also see here, if we try to go to /3, these are all of the crates that have only three letters in them. And this one has subdirectories for the first letter of the crate. So here, these are all the three-letter crates that start with a. For the ones that are two letters, it's just a flat directory of all of those crates.
And same thing for one-letter crates: that's just all of the ones whose name is one letter. And you'll see all the letters are taken. So like, this is gonna be the crate called z. That then gets us back to: what's actually in these files? Inside of the index files is one line per version, and each line is a JSON object. And if you sort of squint at this, you'll see that this looks an awful lot like the JSON that you're supposed to send to publish. Name, version, deps, which is that same thing we saw over here, deps. Now, it's not exactly the same, because, for example, over time the exact syntax for what cargo sends up to the registry has changed. Some fields have become optional, some fields have been added. So it's not exactly the same, but it's sort of the same. And certainly, going from what cargo sent, which is all this stuff, to what's in the index should be fairly straightforward. There's one field, though, that's here but that's not in the publish payload. You'll notice that over here there's no field for a checksum. There's name, there's version, there's dependencies, features, and a bunch of other metadata, but there's no checksum. But in here, there's a field called cksum, which is a hash of the .crate file. And so going from just the publish JSON to this isn't possible without also having the .crate file. Hence, it should be possible for us to build the whole conversion, assuming we have the original .crate file. Okay, so this should immediately raise some questions, like: okay, what if the format of this changes over time? Don't these files get really large? Don't we get a lot of commits in this? And we're not gonna dig too much into that, in part because with HTTP-based sparse registries, there's a blog post about this on the Rust blog, people won't be using the git index all that much anymore. The HTTP index, though, has roughly the same structure. So it's basically index.crates.io slash and then these paths. In fact, we should be able to try this.
It's index.crates.io slash this. Yeah, so now we got exactly that same index file, but we didn't have to do any git checkouts. And this is what cargo is transitioning to using. Currently you can opt into it, or I think you'll be able to opt into it on 1.68 stable. The default hasn't changed yet, but it will at some point. And then there'll be no more of this updating-the-index, resolving-deltas business. The other thing to be aware of with this git registry is that it actually gets squashed every so often. So there are way more than 23,000 versions on crates.io, but the history gets squashed whenever it gets particularly long, to keep that resolving-deltas step from taking so long. But in any case, the index is primarily responsible for hosting this list of versions, plus the .crate files whose checksums match the entries that are in here. And so when cargo goes to talk to a registry, what it mainly does is: if in your Cargo.toml you've declared "I have a dependency on zipf", then cargo will talk to the registry, either by git cloning it or by sending an HTTP request, look at this path, parse each line of that file, and construct the list of versions that are available. Then it runs the resolver to figure out which version among these it should choose, based on the dependency declaration in your Cargo.toml. And then it's gonna download the relevant .crate file, and then it's gonna do the build. Now, one thing that's worth noting is that you might wonder why the dependency lists are in here, because when I download a dependency, its Cargo.toml tells me what its dependencies are. The advantage of having the dependencies listed in the index is that you can do a full resolve of your dependencies by only talking to the index, without downloading or extracting any .crate files, which makes it a lot faster.
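The path scheme we just walked through (1/, 2/, 3/first-letter/, then two-letters/next-two-letters/) is small enough to write down as a function. This is a sketch of the layout as described; it assumes names are already lowercased ASCII.

```rust
/// Compute a crate's path within the index, following the layout the
/// crates.io index uses. Assumes a lowercased ASCII crate name.
fn index_path(name: &str) -> String {
    match name.len() {
        0 => unreachable!("crate names are non-empty"),
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        3 => format!("3/{}/{}", &name[..1], name),
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    }
}
```

The sparse HTTP index serves the same file at the same relative path, so the URL is just `https://index.crates.io/` followed by `index_path(name)`.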
So for example here, when cargo sees, oh, you have a dependency on zipf, it looks at the index. Let's say it picks, you know, this version of zipf. Then it looks at the deps, it sees that, oh, it has a dependency on rand; let me go fetch the index entry for rand and resolve this version requirement. And it keeps doing that until it's resolved your entire dependency tree. Then it goes and fetches all the .crate files, and then it can do the build at the end. Okay, so that is the whole path. Now, these definitions right here are also represented in a crate in the ecosystem called crates-index. Now crates-index, sort of similar to crates-io, is not just the data definitions. It actually knows how to talk to the crates.io index in particular. It'll do things like, it knows about the cargo home directory and knows how to look there, it knows how to clone the git index, and then it'll do lookups into it. So it has a lot more features than just the definitions. But the thing that we're looking for here is Crate. So a Crate is a sort of abstract concept. It maps directly to one of the files in the index, and it doesn't have any information in and of itself, except that it has a list of versions, right? Which is a parsed representation of that long list of lines. And for every Version here, also note the fields aren't public, which is also interesting, but you can see that the getters we have, name, version, dependencies, checksum, features, links, is_yanked, and download_url, are all the things from the index, with the exception of download_url, which is something you can programmatically generate. If we look at the source here, though, it uses a bunch of other things to reduce the size of the struct. And we'll talk about this in a second. But basically, this is trying to parse out all the stuff that's in those index entries. And we're gonna have basically this definition inside of our crate as well.
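That walk (look up a crate, read its deps from the index entry, look those up, repeat) is the core reason deps live in the index at all. Here's a toy sketch of that closure computation over a made-up in-memory index; real resolution also picks one version per crate against semver requirements, which this ignores.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Toy transitive-dependency walk: compute the full closure of crate
/// names reachable from `root` using only index lookups (no .crate
/// downloads). Real resolution also matches semver requirements.
fn dependency_closure(
    index: &HashMap<&str, Vec<&str>>,
    root: &str,
) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([root]);
    while let Some(name) = queue.pop_front() {
        // Only expand crates we haven't visited yet.
        if seen.insert(name.to_string()) {
            for &dep in index.get(name).into_iter().flatten() {
                queue.push_back(dep);
            }
        }
    }
    seen
}
```

Notice that the whole tree is known before a single tarball is fetched; downloading and building happen only after this loop terminates.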
Now, one thing that's worth talking about here is the fact that this crate does a lot of, and we'll look at cargo in a second too, it also does a lot of this, having special implementations or special types for some of these things, to avoid the overhead of, for example, allocating a String for every field in every dependency of every version that it parses. So for example, in cargo, it uses this thing called an interned string. Here it's using a small-string type for packing short strings directly into the space of the pointer. If we want our library to be used by cargo, for example, it would have to be generic over the string type. Potentially more than one string type, but let's say just one string type for now. We'd have it be generic over the string type so that cargo could choose to use its own optimized string types rather than being forced to use String the same way that we do. Okay, so now that we have a general idea of the whole life cycle here, from publish to consume, the next thing I want to do is dig in a little bit to the code on the cargo side and the code on the crates.io side, to see where this stuff lives, to explore the code a little before we start writing our own. But before I do that, let's do a quick "are there questions" about the life cycle as I've described it so far, about the various interactions that we've seen, or any of the data formats, or even just what we're building. Just to make sure we're all on the same page. "Isn't JSON not very suited for streaming dependency resolution? Like, you have to parse the whole JSON before even knowing the dependencies list." In practice, it doesn't really matter, because the entries in the registry are all very short. It is true that you could have a fairly long list of dependencies, but realistically, it's only your direct dependencies that are listed here.
So like, there aren't really projects that have thousands of direct dependencies, and so these JSON lines just aren't very long. And if you stream-deserialized, the amount of time you would save by being able to start resolving the next dependency immediately would be basically none. So having a format that's relatively easy to work with is probably worthwhile here. "Will there be redundancies if something like zipf has a dependency on rand, but rand is also in your toml?" So, this is generated by cargo for you; this is part of the JSON that it publishes. Let's say you are zipf 7.0.0, and in your Cargo.toml you say rand = "0.8". Great. So that is in your Cargo.toml. Cargo then packages that up into a .crate file, and then it PUTs that to the crates.io web API. And in doing so, it also includes the JSON, which says there's a dependency on rand 0.8. It gets that information from your Cargo.toml. So it's redundant, but it's also the same: cargo will derive one from the other. And then when crates.io receives this and creates this index entry, it takes the stuff that's in the JSON it got from cargo and then sticks that into the index again here. "Why build another crate instead of removing or feature-gating curl and such from the crates-io crate?" It's a good question. Part of it is because I want to experiment with this, and it's easier to experiment on a thing that I build myself. Part of it is because I want more than just what the crates-io crate gives, right? I don't want to just have the publish side; I also want the index side, and I want the .crate side, and I want the conversions between them. So it's a little bit outside of the scope of the crates-io crate, and it's a little bit outside the scope of the crates-index crate. This is sort of a thing that holds all the things in between.
What I would hope to see, actually, is that the crates-index crate and the crates-io crate take a dependency on this crate that we build for the definitions, and then they build, you know, convenience wrappers for accessing those things on top of that. "Is there any limit on dependency length? Like, A depends on B depends on C depends on, and so on; that's dependency depth." There's no limit on that. I don't think there's a limit on the dependency width either, as in the number of direct dependencies any given crate can have. Neither of them, I don't think, have limits. It's just that in practice, no one has a long list of dependencies. And by long, I mean like thousands. "Why was hosting the index on GitHub decided?" So, the reason why the index is hosted on GitHub was mostly because it's really straightforward, right? You just have a git index that you commit to whenever there's a new version, and it makes it really easy to go back and look at older versions of the index. You have an audit record of every change to the index. Looking at the deltas of the index between two different points in time is pretty easy. And, you know, it means that checking out the index locally is just a git clone, and you can get efficient delta updates by doing a git pull. So it has a lot of attractive properties, and I think it made a lot of sense when crates.io was much smaller. I think now it's getting to the point, and this is one of the reasons why sparse registries were developed in the first place, where the index being git is becoming a problem. And, you know, the more scalable solution here is to use an HTTP-based API, which is what sparse does. "If the index is a git repo that they periodically squash, is there a need or mechanism to clean the local clone?" You shouldn't need to; cargo will do this for you. Cargo manages its own clone of the index.
And so this gets back to one of the things I said I was gonna get back to earlier, which is this part. Nope. So this path right here, this is the sort of canonical path for the git index. The hash here is basically a hash of the URL of the crates.io index, so it's not gonna change. And if you use an alternate registry that's also a git registry, it'll end up with a different hash here. So that's how cargo differentiates them. And you can inspect this. And that's not the one I want. So cache holds the .crate files, source holds the extracted versions of every .crate file, and then index holds the actual index itself. And you see here, if we run git status, or if we run ls-files, huh, am I confused? Oh, index, right, .cache, right. So git -C this, ls-files. Oh, right. This is a bare checkout of the repository. I think if we do log, oh, weird. Oh, it's because, okay, so the reason why we can't do this is because it's not a standard checkout. They do some things to try to just fetch the head commit rather than fetch the whole history, because otherwise it would take very long. So there are a bunch of caveats to this, but this is effectively a checkout of the upstream git repository. And cargo manages it and cleans it. It tries to avoid checking out all the commits, all the history, that sort of stuff. And this subdirectory here, .cache, has the actual entries from the index. So you can see here, you know, zi, pf, zipf. And if we bat that file, you see it's JSON. Okay, so let's look at some code. Let's start out by looking at cargo. We're gonna end up looking a bunch at the cargo code base here. The majority of cargo's stuff lives in the src directory, and then there's the crates subdirectory, which holds sort of utility crates, which includes the crates-io crate. So that is actually one thing that's effectively owned by the cargo team already.
And I think it is also what cargo internally uses for interacting with crates.io; it's just not what crates.io uses in order to define its own API. So realistically, if the crate we build here ends up actually being useful to these teams, it might end up being adopted into cargo, and then either the crates-io crate goes away or it wraps the crate that we're about to build. So this is where the crates-io crate comes from and where these definitions are from. We'll end up copying a bunch of the stuff from here. If we go back then to src: what we're really after here is the logic around publish, and where those definitions live. Well, the definitions live in there, but where does the code for publish live? src/bin/cargo has the definitions for all of the various commands. So if we look at publish, for example, I don't want this symbols thing, I don't want this thing either. You see, this is just the definition of the publish subcommand of cargo. You see it uses clap; it's pretty straightforward. It's a slightly modified version where they have their own helpers, because a lot of these flags are shared among many of the commands. Now, look at the exec, and you'll see this for every one of the subcommands: they have a similar kind of structure. They have a cli function that defines the command, this is sort of a command builder, and that has exec, which is the actual entry point for executing that command. And you'll see that it doesn't actually do very much. It mostly just arranges arguments and then calls into the ops module, and publish. The ops module lives inside of src/cargo/ops, and this is where the definitions of all of these commands actually live. The reason for this split is sometimes a little weird, but it's mainly for reusability. It means that you can call these methods from potentially multiple binaries.
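The shape of that split can be sketched in miniature. All names here are made up for illustration, and cargo's real CLI layer is built with clap; the point is only the structure: a thin exec layer that arranges arguments, and an ops module holding the reusable logic.

```rust
/// Toy illustration of cargo's cli/ops split (hypothetical names).
/// The `ops` module holds reusable logic, callable from any binary.
mod ops {
    pub struct PublishOpts {
        pub registry: Option<String>,
        pub dry_run: bool,
    }

    /// The actual "operation"; knows nothing about argument parsing.
    pub fn publish(opts: &PublishOpts) -> String {
        let registry = opts.registry.as_deref().unwrap_or("crates-io");
        if opts.dry_run {
            format!("dry-run publish to {registry}")
        } else {
            format!("publish to {registry}")
        }
    }
}

/// The thin "exec" layer: translate CLI-ish inputs into an ops call.
fn exec_publish(registry: Option<&str>, dry_run: bool) -> String {
    ops::publish(&ops::PublishOpts {
        registry: registry.map(String::from),
        dry_run,
    })
}
```

Anything that links against cargo as a library can call into `ops` directly, which is exactly the reusability argument for the split.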
So you can have multiple binaries that share logic for some of the underlying stuff of the CLI command. And it also lets you cleanly separate out the things that have to do with the command-line interface, like the actual argument parsing and stuff, versus the actual logic, which might be usable by, say, other crates or other cargo commands, like the external subcommands where people take a library dependency on cargo. So we saw back in here that it calls ops::publish. So if we go to src/cargo/ops/mod: where does publish come from? Publish comes from registry. So we go back here, stick in registry, and we'll see a publish. So this is the definition of publish. This is the code that executes when you run a cargo publish, after parsing all the arguments and stuff. And you'll see it mainly parses your cargo config and your workspace manifest. It looks over the members; it looks for the active member, the package you're currently in. So if you're in a workspace, it tries to find the crate in that workspace that you're currently in, because that's the one it assumes you wanna publish. And then it checks which registry you actually wanted to publish to. So in this case, it defaults to the crates.io registry, but you can say in your Cargo.toml, like, publish = and then the name of a registry, to say this should be published there rather than to crates.io. And then you see it constructs a registry handle for that publish, and then it calls package_one. So this is where it sort of delegates to cargo package to generate the .crate file. And we can look at package_one in a second; what that gives you as a result is a tarball, and that is the .crate file that we talked about. And then if it's a dry run, it just doesn't really do anything else; if it's a dry run, it's sort of done at that point. But otherwise, it gets the sha of that .crate file that it just generated.
It creates an operation that it's going to send to the crates.io registry, and then transmit here is the thing that actually uploads to crates.io. So this is the thing that ends up sending both the generated JSON and the .crate file as the payload to the remote registry endpoint. And then this bit at the end is the messages you may have started to see when you run cargo publish, the "waiting for the crate to become available" part. When you run publish, there's a bunch of logic that has to happen on the crates.io side. Part of that is it has to do a git commit, it has to write to the database, it has to store the crate file to its backing store, like S3 or something. And at the end of all of that, your version is actually in the index and available to other people. This loop is just trying to make the command not terminate until it's actually available to other people. So if you're on the phone with someone saying "I ran cargo publish", they're not going to say "I can't use it yet, it's telling me that the version doesn't exist"; this loop that checks that it's available is going to save you from that. You'll see there's a verify_dependencies here, which does things like check that you're not trying to publish a crate that has git dependencies, for example; that's not permitted. And transmit is the implementation of sending the payload to crates.io, which, you see here, computes the list of dependencies and generates this NewCrateDependency type, which is part of the NewCrate JSON payload that describes each of the dependencies.
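Since we'll keep coming back to what transmit actually sends, here's a small sketch of the body framing the Cargo book's registry web API chapter describes for PUT /api/v1/crates/new: a 32-bit little-endian length, the metadata JSON, another 32-bit little-endian length, then the .crate tarball bytes. The function names are mine, not cargo's:

```rust
// Build the publish request body: [json len (u32 LE)][json][tar len (u32 LE)][tar].
fn encode_publish_body(json: &[u8], tarball: &[u8]) -> Vec<u8> {
    let mut body = Vec::with_capacity(8 + json.len() + tarball.len());
    body.extend_from_slice(&(json.len() as u32).to_le_bytes());
    body.extend_from_slice(json);
    body.extend_from_slice(&(tarball.len() as u32).to_le_bytes());
    body.extend_from_slice(tarball);
    body
}

// The inverse, roughly the split the crates.io handler performs when it
// separates the request body into the JSON part and the tarball part.
fn decode_publish_body(body: &[u8]) -> Option<(&[u8], &[u8])> {
    let json_len = u32::from_le_bytes(body.get(..4)?.try_into().ok()?) as usize;
    let json = body.get(4..4 + json_len)?;
    let rest = body.get(4 + json_len..)?;
    let tar_len = u32::from_le_bytes(rest.get(..4)?.try_into().ok()?) as usize;
    let tarball = rest.get(4..4 + tar_len)?;
    Some((json, tarball))
}
```

This framing is why crates.io can split the body without any multipart parsing: the lengths are right there in the stream.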
It parses out the manifest, parses out the readme, looks at the license file, and then ultimately constructs one of these NewCrate things, which has all of that information it just extracted from the manifest. And then somewhere down at the bottom here, bah, bah, bah, yeah, it calls registry.publish, and this is from the crates-io crate, which has a publish method that uses curl to actually send the payload. So a bunch of this stuff, like generating this intermediate state, are things that we could do in our crate. There's no need for that to be part of cargo itself, because it's a standardized process of going from "I have a package" to "I want the payload". And then on the receiving side, for crates.io, you'll see there's a bunch of code here too. It's also mostly written in Rust, except for the front end, of course. Prime is salty about me not following him on Twitter. Oh well, too bad, I'm sorry. It's interesting, actually: on Twitter, at least certainly in the early days, I was very cautious about who I follow, simply because otherwise I can't read my timeline. I don't really do this anymore, but I used to actually read every tweet on my timeline, which only scales if you follow a small number of people. I'm sorry, Primeagen. Very sad that your raid didn't really work. I only recently set up the ten-minute follow chat block because we kept getting spammers coming in and it was really annoying, but it does get sad if people want to chat. Okay, so let's look at the crates.io side of things, where they receive this JSON payload. From memory, this is src/controllers/krate/publish. So this is the thing that handles PUTs to /api/v1/crates/new, used by cargo publish to publish a new crate. And you'll see it takes the request and splits the body using the fact that it knows the length of the JSON and the length of the crate file.
So this is the part of the request that is the JSON and the part of the request that is the tarball. And then you see it decodes the JSON bytes using this EncodableCrateUpload. No, I don't want that EncodableCrateUpload. Okay, fine, I'll use the definition over here. So that's defined somewhere else. And you see, this is basically the same definition that's in the crates-io crate, just with slightly different types for some things. For example, you see they have newtypes around a bunch of these. I'm not entirely sure why; it might be so that they have more type safety, so that they ensure they don't accidentally pass in, you know, the name of a dependency instead of the name of the crate. Right, so this gives you compile-time guarantees that you didn't pass the wrong field in the wrong place. Whether this is something we want to adopt in our own crate, we can discuss. I mean, I think it is valuable, and you can always choose to not use it pretty easily. Yeah, I kind of want to keep this file open as well, because we're going to want to refer to it later. But you see it parses out the JSON and then it checks for any missing metadata, which is disallowed: you must have a description, you must have a license in order to upload. It connects to the database, constructs a new entry in the database based on the information that's in the JSON metadata, starts a transaction to actually store all the information in the database itself, checks that you have the rights to publish, checks that the name is actually one that you're allowed to publish to, checks rate limiting and whatnot, reads out the tarball, computes the checksum, and uploads the file to S3. Yeah, so this is all the standard stuff that you would expect. And at the end here, you see it registers this crate in their local git repo. This is the thing that actually generates the git commit that eventually ends up in the crates.io index.
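The newtype idea mentioned above is easy to sketch in a few lines. These exact type names are my invention for illustration, not crates.io's, but the mechanism is the same: wrap plain strings so the compiler keeps a crate name and a dependency name from being swapped.

```rust
// Two distinct wrappers around String. They carry the same data, but to the
// type checker they are unrelated types.
#[derive(Debug, Clone, PartialEq)]
struct CrateName(String);

#[derive(Debug, Clone, PartialEq)]
struct DependencyName(String);

// This function can only ever receive a crate name; passing a
// DependencyName here is a compile error rather than a silent bug.
fn describe(name: &CrateName) -> String {
    format!("index entry for {}", name.0)
}
```

The cost is a little ceremony at construction sites; the benefit is that "wrong field in the wrong place" bugs move from runtime to compile time.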
And you see it has basically the same fields that we've already talked about: the name, the version, the checksum, et cetera. But this then is using a different crate again for the definition of what goes in the index. Let's go and see if we can find that. So this is one of the reasons, again, why I wanted to build this: all of these parts of the ecosystem have different definitions for the same thing. So this is the Crate type inside cargo-registry-index. cargo-registry-index is a subcrate of crates.io, and in its lib.rs it has a definition of Crate, which is the stuff that ends up going in the index. So here's yet another definition of that. The Primeagen crew are very sad that they had to wait ten minutes to actually enforce the raid. What you should have done is raid and have no one say anything for ten minutes, and then all of you chat at once, because then it would be a double raid: first lots of people join, then lots of people start chatting. Okay, so now we've explored the space of things that all have this logic, these data definitions. So let's pause there again and see: is everyone following? Are there questions about the stuff we've discovered so far or about the plan going forward? "How's the support for using custom registries other than crates.io? I've never seen it used, unlike in other ecosystems like npm." I mean, they're totally supported. One of the things that's tricky is you're not generally allowed to have cross-registry dependencies. So you can't publish something to crates.io that has a dependency on something that's in a different registry, I think. It's also a little annoying to implement a registry, in part because of this requirement that your index is git. Most of the companies that provide registry implementations tend to have some centralized backing infrastructure for how they implement all of their registry support, whether that's npm or PyPI or cargo.
But cargo's registry is git, which means you would have to generate a git repository on the fly based on the stuff that's in your database, and this is very expensive and really cumbersome. And so I think some of them are probably holding out for the sparse registry stuff, where it becomes a lot easier to implement your own registry based on the infrastructure you already have, in a way that's scalable and manageable. The Primeagen crew's attention span is only four minutes, so they couldn't pull off the ten-minute delayed raid. Yeah, so there are a couple of alternate registry implementers. I forget the names of them now. Yeah, someone mentioned GitT. There's one that starts with a... I want to say anchor, but it's not anchor. There are a couple of them, but realistically they all sort of struggle because they're forced into using a git registry like this. And, I mean, crates.io is as well, so it's not like they're really at a disadvantage; it's just really cumbersome. The other thing that's annoying with registries today is there's not great support for authentication. There's a very basic setup, but it only really authenticates publishes, not things like reads. If you're running a private registry, you want to control who's able to access your registry in the first place, because you might be uploading commercial things to it, so you don't want just anyone to be able to clone it. And cargo doesn't have great authentication support for private registries. That's one of the things currently being worked on: an authentication mechanism for cargo requests that are reads as well as other operations. So that's some of the reason too. There's a bunch of hairiness here, but a lot of it is being worked on right now. I recommend that if you're curious about this stuff, go join the cargo Zulip and see if you can help out. I think there are a bunch of open issues too for things that might help speed this along.
Some of it is also just: test this out on your own. For sparse registries, for example, having people enable it in their config and see whether things generally work for them is very useful feedback. Artifactory, Meuse, Alexandrie, yeah. So there are a couple of alternate registries that people have implemented. Okay, great. Let's actually write some code now. cargo new --lib, and we're gonna call this... it's sort of like cargo-registry. One of the reasons I don't want to use cargo-registry is because I really want to, probably in a different stream, implement a cargo subcommand called cargo registry, and I don't want to reserve that name here. It is arguably cargo-index. cargo-index-types? cargo-index? But it doesn't let you implement a cargo index, right? That's one of the reasons I don't want to call it just cargo-index. It has the types for interacting with a cargo registry; it has the types that are in the index. cargo-registry-types? It's not really that either, because a registry also has to support other endpoints, like yanking, and would have to have the HTTP response types encoded. It's not cargo-types, because there's a bunch of things in cargo, like types for a workspace, which this is not gonna hold. cargo-schema isn't bad, although this isn't the schema for all of cargo, because that would include things like manifests, which we're not gonna encode. But maybe cargo-index-schema, or registry-schema, or index-schema. I'm torn between index and registry, because the index doesn't know anything about the publish JSON, really, right? That's an intermediate data format that never makes it into the index. But registry is a little bit too broad. Maybe it really is interface; maybe it is cargo-index-interface. It is true that cargo- is a naming convention for cargo subcommands, but at the same time it is the appropriate name here, right? Because it's not crates.io-index-interface; this applies to any cargo registry. cargo-space, that's funny.
I mean, we could also call it "interface for cargo indexes", but that's an awful name for a crate. Ifki, interface for cargo indexes. I think it has to be cargo-index-interface, cargo-index-schema... cargo-index-fray-types, that's funny. -defs? It could be cargo-index-types, actually, but it also has the conversions. Naming is hard. Cargy is pretty funny for cargo-index-interface. cargo-index-stuff; I do like cargo-index-stuff. Handoff is not bad. Let me, Merriam-Webster, come help me. I want the thesaurus entry for handoff. Why does it not have a thesaurus entry for handoff? It's like transit... transit! cargo-index-transit. Yeah, cargo-index-transit, because it's a transit point: it's all the transits you need in order to interact with the cargo index. Relay is not bad either, but I like transit. Also, transit and cargo are somewhat related, so I like it. Cargo UPS. Okay, so what do we have here? Well, let's start out by saying this is gonna depend on serde. Okay, let me be fancy here. See this part? I'm so excited for this to go away. In fact, here's what I'm gonna do: rustup override set beta. In fact, I thought I already had beta. And let's find the blog post; I think it's on the Inside Rust blog, the new index protocol. Add this. So normally you're supposed to add this to your home cargo config, but instead what I'm gonna do is just add it to the local directory config. And the reason for that is it's not on stable yet. So if I added it to my system-wide config, I'd start to get build failures in a bunch of packages, because stable doesn't have the feature. Whereas if I add it locally, I'll get it just for this one. So .cargo/config; I'm gonna have to make the .cargo directory first. Stick this in here. And now if I do cargo add, it's already a feature. Boom: updating the crates.io index took no time, because it didn't have to do all the git stuff. Nice.
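For reference, the per-project config being described is something like the following. This is my reconstruction of the Inside Rust sparse-registry instructions from that era, so treat the exact keys as an assumption; on current toolchains the sparse protocol is stable and the default, configured via `protocol = "sparse"` under `[registries.crates-io]` instead.

```toml
# .cargo/config.toml inside the project directory (not ~/.cargo/config),
# so the unstable flag applies only to this one project.
[unstable]
sparse-registry = true
```

Keeping it project-local is exactly the reasoning above: a system-wide unstable flag would break builds on stable toolchains.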
I guess that's one satisfied user report. So let's see what it actually added to my Cargo.toml. It added the current version with the derive feature. Beautiful. Okay, let's go back here. So I don't want these. What I want here is actually to split out the different phases, if you will, right? So the phases here are: there's the .crate file, there is the publish, and then there's the index. And let's go ahead and do this, and this, and this. Let's start with publish. So inside of publish, we have two primary definitions, right? There is the one in crates.io, which is all of this bit. Do, do, do, do. So let's go ahead and tidy this up a little bit. These things we obviously don't have, and this is all gonna complain about things. I don't want Deref, because that's not a thing that we have. We are gonna have to take a dependency on semver (ooh, actually I want the serde feature for that), which is the crate that has the definition of semantic-versioning versions, which appear in a bunch of these places. So, in publish, what else do we not have? Another Deref, and DependencyKind. Where does DependencyKind come from? It comes from models: models, dependency kind, and keyword. So here is Keyword, which I think they used under a different name; they used to just call it keyword. And this is probably a chrono thing, which is, yep, chrono::NaiveDateTime. Whether we actually use chrono here remains to be seen. And then what else have we got? We wanted the thing from here, DependencyKind. Just one of these. All right, what else is missing? EncodableCrateVersionReq. Serialize, Serialize, and probably also Deserialize for all of these, because we're gonna have to go both ways. The reason I'm copying all of these in is because I want to have all the source definitions in one file.
So I'm gonna do the same for the ones that are in cargo. Interesting: looks like they have manual Deserialize implementations here. So these are helpers for validating that features have the appropriate name and so on. We've gotta figure out what we do about those, whether we also do validation in here. The reason they do validation on deserialize here is almost certainly because that means that if you have a value of that type, you know it already conforms to the rules for that string. If you just have a String, you don't know whether it, for example, doesn't have spaces in it, for a crate name as an example. Whereas in their Deserialize (and I'm guessing this EncodableDependencyName is already checked; valid_dependency_name probably checks, among other things, whether there's a space in the name, which is disallowed) it won't even deserialize if it doesn't meet those criteria. So I think we'll probably want to do this too. Now, what's interesting here is that cargo might not want this. So one option is for us to be generic over the types for basically all of the string encodings here, so that cargo can choose to just use InternedString, crates.io can choose to use their encodable versions, and we don't implement the logic for sanity-checking that these values are well-formed. Unclear, actually, because one thing that's sad, right, is that we're gonna end up with like NameS or NameT, VersT, FeatureT, right? Because each of these is a different type parameter. So it's gonna be a lot of generics, which is a little sad. But let's not fight the compiler on all of this just yet. Why does it claim there are two implementations of Serialize? Oh, there we go. "EncodableCrateVersion cannot be dereferenced": that's because this has to be self.0 instead. What about two types, RawCrateUpload and ValidatedCrateUpload, with a From or Into instance?
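The validate-at-construction idea above can be sketched without serde at all. The exact rules here (starts with a letter, then ASCII alphanumerics, `-`, or `_`) are my guess at what something like valid_dependency_name enforces, not crates.io's actual code:

```rust
// A name that, by construction, satisfies the validation rules: if you hold
// a Name, you never need to re-check it.
#[derive(Debug, PartialEq)]
struct Name(String);

impl Name {
    fn new(s: &str) -> Result<Name, String> {
        let mut chars = s.chars();
        let first_ok = chars.next().map_or(false, |c| c.is_ascii_alphabetic());
        let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
        if first_ok && rest_ok {
            Ok(Name(s.to_string()))
        } else {
            Err(format!("invalid name: {s:?}"))
        }
    }
}
```

With serde in the picture, you can route deserialization through this constructor with `#[serde(try_from = "String")]`, so a malformed name fails at decode time rather than deep inside the handler.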
So I want to avoid getting trapped in a position where we try to do too much in this crate, and therefore no one ends up using it because it does too much. I want this to be a sort of foundational crate that the other two can build on top of. And so encoding too much stuff in here is probably the wrong way to go. I'd rather go the other way and say this has only the core bits, and then additional stuff can be built on top of it with whatever additional logic they might want. Okay, so that's the stuff from crates.io, and then we'll want to also pull in the stuff from cargo, which lives in cargo/crates/crates-io/lib.rs: NewCrate and NewCrateDependency. Right, so I guess we can do this and say "from crates.io" and "from cargo". And at least in theory, these should be equivalent. That's the hope, right? What's interesting here is that in cargo, for example, a dependency type is NewCrateDependency, and the kind of a NewCrateDependency is just a String, whereas in crates.io, DependencyKind is encoded as an enum of normal, build, dev. And it's interesting: we sort of have to decide whether we want to go the cargo way of saying "because we're only constructing this, never consuming it, we're just going to have a totally general type here; it's always going to be controlled by us, so there's no need for additional validation on it". Whereas on the crates.io side, they're consumers of it, so who knows what garbage JSON they're going to be sent. They don't know that it's going to come from a real cargo, so they have to verify everything. So where we fall on this is going to be a little interesting, and whether this is an enum or a string is going to be one example of that kind of choice that we're going to have to make. Okay, so that's all the stuff in publish. Let's also bring in the stuff from index while we're at it. So in cargo that lives in... right, I'll keep that open for now.
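The enum-versus-string trade-off above is concrete enough to sketch. The three kinds (normal, build, dev) are what the index format actually uses; the FromStr/as_str pair is my own minimal version, not cargo's or crates.io's code:

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DependencyKind {
    Normal,
    Build,
    Dev,
}

impl DependencyKind {
    // The cargo-style producer view: we control the value, so emitting a
    // string is trivially safe.
    fn as_str(self) -> &'static str {
        match self {
            DependencyKind::Normal => "normal",
            DependencyKind::Build => "build",
            DependencyKind::Dev => "dev",
        }
    }
}

impl FromStr for DependencyKind {
    type Err = String;
    // The crates.io-style consumer view: garbage input becomes an error
    // instead of flowing through as an arbitrary string.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "normal" => Ok(DependencyKind::Normal),
            "build" => Ok(DependencyKind::Build),
            "dev" => Ok(DependencyKind::Dev),
            other => Err(format!("unknown dependency kind: {other}")),
        }
    }
}
```

The enum costs nothing for the producer and buys validation for the consumer, which is a point in its favor for a shared crate.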
So the entries in an index, in cargo, live over in core::registry. I want to say it's Summary, which is the inner bit. So the question is: where is Summary::new called from? Summary::new, you see, takes a package ID, dependencies, features, and links, which is basically the stuff that is already in the index. But that means that by the time Summary::new is called, it has already been parsed. So let's go ahead and look for Summary::new and see where that might be called. Resolver, version prefs... this seems promising. Utilities, toml/mod.rs: where is this coming from? process_dependencies? This is one long function. What does this parse? to_real_manifest; that seems different. That seems like manifest conversion, which might even be the thing that takes a Cargo.toml and turns it into the Cargo.toml that ends up in your .crate file. So it takes the original and creates the non-original. registry/index.rs, this seems promising. JSON parse, RegistryPackage. "Please be extremely careful with returning errors from this, okay?" So RegistryPackage seems like a thing that we care about: "a single line in the index representing a single version of a package." Nice. So this seems like the definition that we want. So this, again, is from cargo. And I guess we can grab these things from over here. Now here you see it uses InternedString, which is the type cargo uses to essentially de-duplicate allocations of strings. InternedString is, more or less: when you create an InternedString, it first checks a giant hash map for whether this exact string has already been allocated on the heap somewhere. If so, it returns you a pointer to that instead. It's essentially a globally reference-counted string mechanism where the only way you garbage-collect is when the program terminates. So what's the Version type here? I guess it's probably semver. Yeah, semver::Version. So this is semver::Version.
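The interning mechanism just described can be sketched in a few lines of std. Cargo's real InternedString differs in details (and is more careful about performance); this is just the shape of the idea: a global set of leaked strings, so equal strings share one allocation for the life of the program.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

// The global table of every string interned so far. Entries are leaked on
// purpose: they live until the process exits, which is the "only garbage
// collection is program termination" property mentioned above.
static INTERNED: Mutex<Option<HashSet<&'static str>>> = Mutex::new(None);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct InternedString(&'static str);

impl InternedString {
    fn new(s: &str) -> InternedString {
        let mut guard = INTERNED.lock().unwrap();
        let set = guard.get_or_insert_with(HashSet::new);
        if let Some(&existing) = set.get(s) {
            // Already interned: hand back the shared allocation.
            return InternedString(existing);
        }
        // First occurrence: leak one copy so it is 'static forever after.
        let leaked: &'static str = Box::leak(s.to_string().into_boxed_str());
        set.insert(leaked);
        InternedString(leaked)
    }
}
```

The payoff is that the handle is `Copy` and comparisons can be pointer comparisons, which matters when the same crate names appear thousands of times during resolution.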
Now, InternedString we're gonna have to figure out what to do about, but let's just make those be String for now, because they're more or less equivalent. And again, this indicates that we're gonna have to be generic over these things; at least, if we want cargo to be able to use these definitions as well, they're gonna want to use InternedStrings, and we wanna give them that ability. RegistryDependency: so where is RegistryDependency defined? That's down here. That has String as well. Cow is just from the standard library. This we make a String; this we make a String. It's interesting here that for some of these, they are using Cows with a reference to the input. And this gets at another use case that we might wanna cater to, which is: if you're decoding JSON, then very often you can deserialize in such a way that the deserialized copy just references the input text. So you don't actually have to allocate a new String; you can just have a &str reference into the original input instead, which is what these end up being, and which saves you a bunch of allocations. So that might be something we wanna capture. And then, from crates.io, what do we get there? Oh, someone asked me about this. This is the new GitHub search stuff, which is sometimes nice, but it also hijacks things like your Ctrl-left, which makes me really sad. In general it's a little nice (search is much better with this), but I think it's still beta. I wanna say it's still beta. All right, so now we wanna find: where are the definitions in crates.io that are used to serve the index? So we've seen the code for this already, actually, because it was in publish, krate/publish, right? We saw somewhere down here the add_crate, in src/worker/git, an index add crate job. So this is a job thing; presumably they have a job that regularly does git commits and whatnot. But I really want to find what handles something over here.
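The Cow trick above is worth seeing in miniature. This helper is my own illustration, not code from cargo or serde: return a borrow of the input when nothing needs to change, and allocate only when it does. serde can apply the same idea to JSON string fields via `Cow<'a, str>` with `#[serde(borrow)]`, borrowing straight from the input buffer.

```rust
use std::borrow::Cow;

// Normalize underscores to dashes, allocating only if there is anything to
// change. The common case hands back a reference into the caller's data.
fn normalize_dash(name: &str) -> Cow<'_, str> {
    if name.contains('_') {
        Cow::Owned(name.replace('_', "-"))
    } else {
        Cow::Borrowed(name)
    }
}
```

Over thousands of index entries, where most fields pass through unchanged, this is the difference between thousands of allocations and almost none.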
perform_index_add_crate: where's that defined? It's defined over here. The definition search here is just entirely text-based as far as I can tell. Yeah, it says right here too that it just searches; it knows that this is one word, and when you click it, it shows you all the results for searching for that word in isolation, and it tries to guess which ones are definitions and which ones are uses, because I don't think they have semantic support for Rust at the moment. Okay, so this locks the index and computes which index file to use for the name. So this is the part that constructs the path, the like, you know, zi/pf/zip thing. That's gonna be that function. But what does it write out? serde_json::to_writer of the crate. Okay, so it just serializes this Crate type and then it writes a newline at the end. Okay, so what we really want is this Crate type, which comes from cargo-registry-index. So it's really just this bit. So this is the stuff that goes in there, and Dependency is probably defined just below. It's interesting: they've implemented Ord for Dependency. I wonder why that's implemented. Oh, interesting. All right, we'll grab those two. Seems reasonable. This is standard compare ordering. DependencyKind is probably the same DependencyKind that we already got from elsewhere, I would hope... or they have another definition of DependencyKind. Yep, there's a second DependencyKind definition right here. Great. And this one's not the same one that's used elsewhere in the crates.io code base. Fantastic. Okay, so now we have that definition. And I guess this isn't used; the Deserializer isn't used, the Serializer isn't used, and this isn't used, because we don't have any manual implementations. All right. And then I guess the last part is .crate files. And here we have to be, not careful, but we have to figure out what matters to us, because we could take a dependency on cargo. That's one option here.
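The path computation mentioned above follows the index layout in the Cargo book's registries chapter: a crate's entry lives at a path derived from its lowercased name, with short names special-cased. This is a from-scratch sketch of that rule, not the crates.io code:

```rust
// Index file path for a crate name (names are ASCII, so byte slicing is
// safe here): 1-char names under "1/", 2-char under "2/", 3-char under
// "3/<first letter>/", everything longer under "<first two>/<next two>/".
// Each file then holds one JSON object per line, one line per version.
fn index_path(name: &str) -> String {
    let name = name.to_lowercase();
    match name.len() {
        0 => panic!("empty crate name"),
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        3 => format!("3/{}/{}", &name[..1], name),
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    }
}
```

The two-level prefix keeps any one directory from holding tens of thousands of files, which matters for both git and plain HTTP serving.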
We take a dependency on cargo and use cargo to... so we extract the .crate file, and we use cargo to parse the Cargo.toml that's inside of it in order to generate the publish JSON. The downside of doing that is it means that cargo can't take a dependency on us, because that would be a circular dependency, right? So using cargo for this is probably not what we want, but at the same time, the Cargo.toml manifest format is entirely defined in cargo. But we only need the definition of the simpler Cargo.toml manifest, the one that appears in the .crate file. Remember how we have the original, which is a full-fledged Cargo.toml, and then we have the generated Cargo.toml file, which is supposedly a simplified version. So the question then becomes: where is that defined? That's gonna be something like... well, we saw to_real_manifest. Actually, let's go look at cargo/ops/package, because that has to have a call somewhere that does that rewrite, and I'm wondering whether there's a type definition just for the simplified manifest. build_ar_list: so this is the thing that decides what goes in the archive. It loops over all the source files, and for Cargo.toml, it pushes into the archive the original, under the path of the original manifest file, which is probably Cargo.toml.orig, with the contents of the old one, and the generated manifest file. GeneratedFile::Manifest, okay. So somewhere up here we have this. And so we wanna know: when it prints out the file contents of a generated manifest, what does that look like? That's done here. Okay, so this is when we're looping over the things that are supposed to go into the .crate file. If the file is on disk, we just write out the file that's on disk. If it's a generated file and it's a manifest, then use package.to_registry_toml. Okay, what's to_registry_toml? It's defined in src/cargo/core/package: to_registry_toml.
It takes the manifest of the workspace, the original manifest, and calls prepare_for_publish. All right, and then it calls toml::to_string_pretty. Okay, what does prepare_for_publish do? prepare_for_publish returns a TomlManifest. Okay, so TomlManifest seems like the type we want here. Now the question is: is TomlManifest the whole cargo manifest definition, or is it somewhat simplified? If I find TomlManifest here, it's gonna be a giant type, isn't it? TomlManifest, TomlManifest... you can't see what I typed; it's not very interesting. Oh, no. Yeah, so it's just here in the symbols: type TomlManifest. And now it won't take me to it. Take me to it! All right, manifest, there we go. "This type is used to deserialize Cargo.toml files." Yeah, that's what I was worried about. So what that means is there isn't a separate type for just the simplified manifest that cargo will write out, because in that manifest, for example, there is no workspace definition. prepare_for_publish only writes out some of the fields. So we could probably come up with a definition here that is only the bits that are actually generated. Like you see here, this prepare_for_publish method produces a TomlManifest, but there are a bunch of fields that we know are always None. project is always None. dev_dependencies2 is always None. build_dependencies2 is always None. replace is None, patch is None, workspace is None. Badges, honestly, we could probably skip, because they're effectively deprecated now anyway. cargo_features... Yeah, I'm torn here about what we do, because we could copy this whole thing and then strip out the things that we know are gonna be None. But some of these are non-trivial, like TomlLibTarget, for example. Oh, it's just a type alias for TomlTarget. So maybe these aren't too bad. And things like MaybeWorkspaceDependency, for us, are just TomlDependencies.
They're never gonna be a workspace dependency. So that might make things easier, right? So if we go back here: self.dependencies, as deps, but it calls map_deps. And map_deps, defined down here, will call filter... what does it do? Trying to see whether this actually filters out things that are workspace dependencies. Filter, MaybeWorkspace::Defined. Yeah, so the Defined variant is used when it's not part of the workspace. So the Workspace variant is never used, right? So what this map_deps thing is doing is removing anything that is a workspace dependency; those are filtered out. So that means we don't have to encode those. Okay, so let's try to see if we can construct a simplified version of TomlManifest that only includes the bits that are actually settable in the sort of normalized manifest. Okay, so we're gonna need a couple of other definitions here. We're gonna need TomlPackage; we're gonna need TomlProfiles. Actually, we might not need profiles, but we're gonna need TomlPackage. Is TomlPackage always set? package is set, project is not. Okay, so we need TomlPackage. Ooh, this feels like it's gonna be painful. Now, there's another option here, which is, instead of actually semantically parsing this, just parsing it as generic TOML. So all we get back is a giant BTreeMap, basically similar to a serde_json::Value, and then we just fish out the things that we want. But it is nice to have it be structured. There won't be a definition of this in crates.io unless they also parse the .crate file... we could check whether they parse the .crate file, but I have a feeling they don't, really. So let's see what this verify_tarball thing is. Okay, they decode the tar archive. They check that the contents of the archive are reasonable, but they only check the path. It doesn't look like they actually check the Cargo.toml.
I mean, it could be they do it somewhere else, but I don't immediately see anywhere else where they... actually, we could of course search for Cargo.toml and see what we find. render_readme, crate sidebar, krate/publish line 376. Aha, okay, they do. Because this is add_dependencies: this walks all the dependencies and checks that those dependencies are available in the registry. So this is different; this is just checking that you're only taking dependencies that are from the registry. Crate metadata, what's in here? EncodableCrate, this also seems separate. So it might actually not look at the Cargo.toml at all. But there is the check it does down... not here. Where was I just now? The thing we just looked at, this one, where it checks the subdirectory path: that the only subdirectory path you have is the name of the crate that you just uploaded plus its version. That basically ensures that that is the crate that's being used, because when cargo downloads the .crate file, that's the only path it will look at. So if some other file were there, it wouldn't work. But it never actually parses the Cargo.toml, it looks like, which is presumably because they don't want to have to encode all of this stuff in their crate. All right, so what's MaybeWorkspaceField? That's a MaybeWorkspace. What's MaybeWorkspace? That's a serde untagged enum that's either Defined or not, which means it doesn't matter to us. So anything that says MaybeWorkspaceField, we can just replace with the inner type: String, or bool, or StringOrVec (I love those), StringOrBool, that's fantastic. Right, we are gonna need the toml crate here. This is gonna be cargo-index-transit, cargo add toml. It's not... it's fine. Okay, so that's TomlPackage. What else do we have here? TomlLibTarget. Let's go back here to see what actually gets set. So package matters, project does not. dev_dependencies2 does not.
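The MaybeWorkspace shape being discussed can be sketched in pure std. In cargo it's a serde `#[serde(untagged)]` enum; the resolve helper here is my own addition for illustration. The key observation from the stream is that in a published manifest the Workspace variant can never appear, which is why the simplified type can collapse straight to the inner T:

```rust
// Either the field is spelled out in this Cargo.toml, or it says
// `{ workspace = true }` and must be inherited from the workspace root.
#[derive(Debug, PartialEq)]
enum MaybeWorkspace<T> {
    Defined(T),
    Workspace,
}

impl<T> MaybeWorkspace<T> {
    // Resolve against the workspace's value, as cargo must do before it
    // writes out the normalized manifest for packaging.
    fn resolve(self, from_workspace: impl FnOnce() -> T) -> T {
        match self {
            MaybeWorkspace::Defined(v) => v,
            MaybeWorkspace::Workspace => from_workspace(),
        }
    }
}
```

After prepare_for_publish has run, every field is Defined, so a crate like ours only ever needs the inner type.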
build_dependencies2... oops, did I delete the wrong thing? Yes, I did. Whoa, serde rename on dev-dependencies. That's fun. dev_dependencies2 does not get set. build_dependencies2 does not get set. replace does not get set. patch and workspace do not get set. Okay, so let's go back here to the manifest definition. Did not work for some reason. So I want these type defs, and we're gonna need TomlTarget. I think we're also gonna need TomlDependency, right? So MaybeWorkspaceDependency, I think, is really just... the workspace variant is never used, so this one is really just the type definition of TomlDependency. So anything that says MaybeWorkspaceDependency is really a TomlDependency. And TomlDependency is defined over here. And that of course has its own special Deserialize impl, because of course it does. It imports PhantomData, that seems fine. DetailedTomlDependency we're also gonna have to grab, which is fun. Okay, so we have this. So this is presumably why no one's done this before: because it's to some extent really annoying. And also it's unfortunate to have to replicate all of this data. But it is nice to have a sort of simple representation of what actually ever goes into the Cargo.toml that's in the .crate. So that's DetailedTomlDependency. But we know that for all of the detailed TOML dependencies, we call map_deps, which calls map_dependency. And it maps anything that's detailed and removes path. We know there's no path in there. It removes git, it removes branch, it removes tag and rev. It changes the registry index, which is fine. It leaves everything else alone, and simple dependencies are turned into detailed dependencies. Okay, so that's interesting. So there are no simple dependencies. There are only detailed dependencies. Okay, that's nice. So this is actually TomlDependency. That's nice. There are no InternedStrings. MaybeWorkspaceField is always just the first variant. The parameter P is never used.
Great, so P goes away. TomlPlatform we're gonna need. So where does TomlPlatform come from? TomlPlatform. Okay, so we have TomlPlatform, but TomlPlatform is also remapped, I think I saw. Where is the platform? Wait, which field even is platform? target, because target here gets mapped to TomlPlatform, where dev_dependencies2 and build_dependencies2 are emptied out. So if we grab this in here: build_dependencies2, dev_dependencies2 is nothing. MaybeWorkspaceDependency, we already said, is our TomlDependency. This can just use rename_all to get rid of these. Okay, and then the things that remain are these: string_or_vec, version_trim_whitespace, that's fun. That's a pain. Okay, so there's string_or_bool, string_or_vec. Okay, so these are just serde helpers that I guess we'll have to bring in, because we don't really have a choice. But we can bring them into a submodule, so we can do a mod deser. And we're gonna do this. We're gonna go to its definition — create it for me please, thank you — and grab in the serde stuff, and I guess use std::fmt, and this should all now compile. It does not: cannot find MaybeWorkspaceField. Well, it's not gonna be MaybeWorkspaceField. It's just gonna be one of these. Ah, and version_trim_whitespace is not gonna have to handle visit_map, because it's never gonna be a TomlWorkspaceField. So it's gonna be a string, so it's just gonna be parsed. And now it's complaining about... right, that's fine. Okay, we're slowly but surely getting there and cleaning this up. TomlProfiles is still missing. TomlTarget is still missing. I'm gonna go here and do use deser::*. That's gonna help a little bit. TomlTarget we still need. That doesn't work, because why would it? TomlTarget, of course, is also non-trivial, because it has to be. What do we use of TomlTarget? Because they also get mapped somewhere here. 1456. Ba, ba, ba. Readme... where's the target bit? Right. So targets get mapped.
That's curious actually. Where is TomlTarget used? Right, this is a different kind of target to this kind of target. Basically, target here is a platform, and these are targets as in "I want to build this binary", as opposed to target as in target platform, because that would be complicated. So this is not a target-the-platform, this is one of these, and those are cloned as-is, which means all the fields matter. PathValue, all right. What's PathValue? What, so it's just a string? Oh, it holds a PathBuf, but it's serialized and deserialized as a string. Interesting. Okay, that's fine, I guess. I guess it's not just deserialization either, because we just brought in a serializer, and this is gonna be std::path::PathBuf, and debug is fine. So now we have PathValue. These are all, of course, annoying. What's really sad about this, right, is that we're replicating a lot of the definitions without making changes. I was hoping we could trim it down sufficiently that the fact that there was some overlap didn't matter, because the subset that we're gonna use is so small. So I'm hesitant to continue down this path because, I mean, I guess we're almost done, but I don't know if I want to keep all these types in here. Arguably, what we should do is look at how many of these things actually make it into the published definition, for example. Because for instance, this doesn't have any information about things like targets. So we don't actually care about extracting that from the TomlManifest. It doesn't have anything about profiles. Neither does this stuff in the index, I think, if we look at it. It has name, version, dependencies, cksum, features, features2, yanked, links, and v. So I think actually, before we continue, we're gonna prune out all the things that don't go in the upload. Which includes all of the targets and the profiles. cargo_features I'm unsure about.
And the fact that the targets went away means that this goes away. And then TomlPlatform goes away. TomlDependency is still there, but we pruned that one down a little bit already. TomlPackage we still need, because that defines the actual cargo package, like the name and the version. Oh, it's private. What do you mean it's private? Did I not...? Oh, this has to be... What I really want is pub(super), because I don't actually want these to be... Oh, this does need to be pub because it's a type. Which I guess means PathValue also needs to be pub. Although... is PathValue even used here anymore? It is not, so we can get rid of that. That makes me happy. And in fact, even within TomlPackage, if we look again at what goes to publish: edition isn't here. authors is... build is not, at least at the moment, right? It's neither in the crate metadata nor is it in v. So actually let's just look at the ones that are actually complicated. badges is the main one. So build is not there, so it can go away. metabuild. What on earth is metabuild? metabuild. Do I even want to know what metabuild is? Like if I look at a manifest... metabuild. There's no metabuild field. See all references... package... What is metabuild? Where does this come from? This is very strange. I don't think we're going to need metabuild, whatever it is. What else goes in there? publish does not go in there, and it's not in the index, which means that we don't need to keep it. The auto* stuff does not go in there. im_a_teapot does not go in there. workspace does not go in there. exclude and include probably don't go in there either. readme, I think, does go in publish. Yeah, readme_file. So that one we do need to keep. metadata... doesn't look like metadata is actually published, which is a little interesting. Does metadata end up in the index? I doubt it, because it's free-form. Okay, so in that case, we don't even have to parse metadata. This also means that we no longer have a bunch of these.
We have string_or_vec... from artifact, which... good question. Does that end up anywhere? So if we look at the index entries, and we look at what goes in dependencies: it has target, it has kind, it has package, but it does not have artifact. That doesn't go in the index. Does that go in publish? Does not; it only has target. Although it's unclear what... so all of these things are, of course, named different things. So here, when it says target, what is that supposed to be? I guess it's the platform name. It's probably target here. Okay, so in that case, artifact isn't even a thing that we parse out. Neither is lib, because remember, this is only the stuff that goes in the index. So the stuff that the registry cares about are only things that are related to ownership, display of metadata, and the resolver, which is ultimately what goes in the index. And so things like "this produces a cdylib" are not important to the resolver or to the registry, and that's why they're not in here. So even though they are in the Cargo.toml, we don't care about them, which means that because it's not in the index or the registry, lib isn't necessary. public doesn't appear to go anywhere here. So public, I think... this field is probably something that is gonna start to be relevant to the resolver, because it's basically a flag that marks whether a given dependency is allowed to be used in any publicly-facing types. So actually no, it's not relevant to the resolver. The thing this is gonna try to help with... let me see if I can dig up the cargo private dependencies RFC. Not pre-built dependencies. Okay, let's see if Google can search better. Public private dependencies. This is the RFC I was after. I'll put it in chat too. So this RFC is proposing the ability to say: I have a dependency and I don't want it to leak. I don't want this dependency to be exposed to my consumers. The reason this matters is backwards compatibility.
Imagine that you take a dependency on foo 0.1, and in your API you, for example, return a type from foo. That means that if you upgrade to foo 0.2, that's a breaking change for your crate, because it means the type signature of one of your return values changed from foo 0.1's Bar to foo 0.2's Bar. Those are different types. And so being able to mark a dependency as private is gonna tell cargo: when you build, if you discover that this dependency is visible through any of my APIs, then fail the build. But at the moment at least, that's not supported, and also it's not gonna affect resolution as far as I'm aware. So you can go away. Great stuff. And I think default_features2, we also know isn't used. If we go back to the part that maps dependencies, back here somewhere... we know that in TomlDependency, in map_dependency... no, in map_dependencies... map_deps, over here. Not TomlPlatform... config... Oh, it reads self.dependencies. Oh, this is default_features. So map_deps calls map_dependency. I'm surprised, actually, that this doesn't rewrite the features. That feels like a missing feature in cargo. So to give a little bit of context here: the reason you see all of these pairs of some-field and some-field-2 is because in older versions of cargo, the default-features field was accepted as default_features, with an underscore, for unintentional reasons. Which means that there are some index files, there are some manifests, that have default_features for a dependency — or at least it was accepted in the past — which means that we have to continue to accept it. And so the thing that might be in a... oh, actually this makes me think that we might have to support it elsewhere. Anyway, it means that we end up parsing out two fields: one by the name of default-features, with a dash, and one by the name of default_features, with an underscore.
And then we essentially have to combine them — I think what it actually does is read the one with a dash and, if it doesn't exist, look for the one with an underscore. Now, where this gets a little dangerous is that we could get passed a .crate file that is from eons ago, and so it has the old syntax: it actually has the underscore. And I do think we might have to handle that. So even though cargo, when it generates its sort of normalized output, never generates a 2 field — it always generates the one with a dash and not the one with the underscore — old versions of cargo didn't have that normalization. So they might still generate one with an underscore, which means our library might receive ones with an underscore, which means we have to support them. So that is all just to say: let's see, where is the TomlManifest definition... we actually do need to handle build_dependencies2 and dev_dependencies2. And we can go back and redo this replacement. So that's fine. So now suddenly our TomlManifest is much more manageable, because it only has the bits that we actually care about. And I'm gonna go ahead and make the claim that we should remove badges. Badges have been deprecated by crates.io. They were generally considered a bad idea. So I'm just gonna say we're always gonna generate an empty badges list, and if someone wants to get mad at me, they can. And so for publish, I guess we'll still send badges, it'll just be empty. All right. So I think now our .crate Cargo.toml manifest parser is complete. I think there's still some of these that we don't use that we can trim. Like, for example, this stuff: the stuff that goes to publish, none of that talks about default-target or forced-target for the package. I'm guessing the index doesn't either, because this is more about builds than it is about resolution. So these can all go away. exclude and include can go away because they're only used by cargo package.
So once cargo package has run, there's no need for exclude and include anymore. default-run also isn't relevant to the registry, so that's gonna go away. All this metadata, though, I think probably is, so we'll keep that for now. And now the question is, what do we parse out? We still have string_or_bool for the readme, but that's all we have. So if we go back up here, we don't need string_or_vec. All right, I guess we can just do this and see if anything fails — nothing does. Great. string_or_bool we do have. vec_string_or_bool we do not. And version_trim_whitespace we do still use. We don't have any manual implementations anymore. We don't have PathBuf anymore. So this file is now much simpler. Okay, we're almost there with getting all the types in. chrono is no longer here. This is no longer used. So, I realized there's one more source we have for index definitions, which is the crates-index crate, which we looked at earlier — this crate. So this crate, if we go look at it — it's gonna be bright, I apologize — if we look at its dependents, it's used by cargo-edit, cargo-deny, cargo-release. It's used by anything that wants to parse stuff that's in the index: cargo-vet, cargo-public-api, you know, a bunch of these things. But crucially, it's not used by crates.io or by cargo. But it is used. So this is another definition of a Version that we're gonna have to deal with. When we looked at this a little bit earlier, its definition is over here, and hopefully... right. So they have their own string type that they use. This is gonna be a Vec of Dependency. You know, and they have other optimizations, like they choose to use an Arc for the features so that this is cheap to clone, for example. We're gonna have to grab Dependency out of here, which comes down here. This again is string, string. Box of boxed strings — "double indirection to remove size from the struct since the features are rarely set." Right, so this is a Vec of strings.
It seems like for this crate, they've done a lot of work to try to ensure that the in-memory representations of a single version of a crate are as small as possible. Which does make a decent amount of sense, because the crates-index crate is often used to process crates in batch, right? Imagine you wanna walk every version of every crate. Well, there are a lot of versions, so you really want all of them to be very compact, and so you do what you can to squeeze these as small as possible. That doesn't really matter for cargo, because it's only gonna look at a number of versions proportional to the size of your dependency resolution, which is decently small. And for crates.io, it's only looking at the one thing it's currently publishing, so it doesn't need to be small. But here it does need to be small. Oh, there's another one. Oh, great, docs.rs also parses it. Of course it does. Okay, so they have their own encoding of DependencyKind as well, because of course DependencyKind is defined multiple times. Dependency is defined multiple times. That all makes sense. So we're actually gonna do mod crates_index. This is mod crates_io, and this is mod cargo. And we'll stick these in there, and then string... this is gonna be a BTreeMap. Ah, so there's a dependency here on the hex crate, which I think we do also want. "Exists but is inaccessible." All right, we'll see where that comes from at some point. So for publish now, we'll do the same. We'll say this is crates.io, this is cargo, and then in .crate it's all cargo. Okay, let's run cargo check. We're gonna get a bunch of warnings. There are also gonna be some errors, I think, in the... right. So this is over in the publish stuff from crates.io, because it has all of these custom implementations of Deserialize that do validation, which includes things like calling their various validator methods.
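The compactness argument above is easy to see with std::mem::size_of. This is a quick illustration of why boxed slices and boxed strings shrink a per-version struct; the sizes assume a typical 64-bit target:

```rust
use std::mem::size_of;
use std::sync::Arc;

fn main() {
    // On a 64-bit target, the container headers alone differ noticeably.
    assert_eq!(size_of::<String>(), 24); // ptr + len + capacity
    assert_eq!(size_of::<Box<str>>(), 16); // ptr + len, no capacity
    assert_eq!(size_of::<Vec<String>>(), 24);
    assert_eq!(size_of::<Box<[Box<str>]>>(), 16); // boxed slice drops capacity
    assert_eq!(size_of::<Arc<[u8]>>(), 16); // fat pointer; cheap to clone

    // Multiply a few saved words per field by hundreds of thousands of
    // versions, and batch processing of the whole index gets meaningfully
    // smaller in memory.
}
```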
I think, realistically, we're gonna get rid of these, and we're gonna replace this with the ability to choose your own type for each of these. And again, cargo is gonna want this for things like InternedString. I could imagine that the crates-index crate is gonna use SmolStr here, and crates.io can use its encodable crate names. Okay, so someone said there's one more implementation of parsing metadata. So that's gonna be... rust-lang/docs.rs, blob/master, src/utils/cargo_metadata.rs. So what does this do? This is different. The reason this is different is because this runs the cargo metadata command, which has its own format for output. But we're gonna ignore that for now, because I think what cargo metadata outputs is somewhat related to the TOML manifest, but they're not the same. So we're gonna ignore cargo metadata for now. Now, there is also a cargo_metadata crate, which I think this one could use, but doesn't. And the cargo_metadata crate has the parsing for all of the output from the cargo metadata command. So we look here at Metadata. It has a Package; Package has a lot of the fields that we might be familiar with by now. And you could imagine that we implemented a conversion between, probably, the .crate stuff and one of these. But this includes things like default-run, things that don't matter to the registry. This is essentially a dump of the TOML manifest, but in a slightly normalized way. So that's a little different, and we're gonna mostly ignore it for now. Whoo, okay. So the next step now is gonna be to tidy these up so that we... why don't we need to parse the license? We do need the license. That's in publish and it's in .crate. We said... I don't think I removed it. It's under package. So that's TomlPackage: license and license-file. So they're in there. They're not in the index — the index doesn't care about licenses — but they are in the publish stuff. Okay.
So next up now is gonna be to tidy up these definitions so that we only have one definition rather than multiple. I realize now that I actually shouldn't have been as dumb, and should have kept the information about which of the string types these different libraries actually care about optimizing. Because when we were copying out these definitions, some of them, like name, were InternedString, but some of them, like homepage, were not — they were always String. And I should have kept that information here and just done type InternedString = String. So let me do that real quick, because that's gonna be nice for us in a second. Which is... features. I guess let's walk this from the top. cargo_features is a Vec of String. String, string, string, string, string, string. features is a map from InternedString to a Vec of InternedString. I hate this new search so much. I don't want it to go somewhere, I just want it to highlight. Oh well. TomlProfile we don't have. MaybeWorkspace, TomlWorkspaceDependency, TomlPackage. So the name in a TomlPackage is an InternedString. edition is not. That's the only InternedString there. The inheritable fields we don't care about, because we don't have workspaces. TomlDependency, DetailedTomlDependency. Interestingly enough, it doesn't have InternedString for the version. Kind of interesting. But... oh, the map doesn't either. Yeah, it seems very arbitrary which ones are actually InternedString here. But so there weren't too many. Great, so we did those. And for publish, I don't think it actually used InternedString, but let's go look. NewCrate uses String for everything. There's no InternedString here. Okay, on the crates.io side, we already kept the special types there, so that's easy. Then for the index, for cargo, I think we ended up doing some erasure here. So in particular, in RegistryPackage, the name is an InternedString. So this is gonna help inform us which of these need to be generic in the future, basically.
InternedString... features is an InternedString. features2 is an InternedString. links is an InternedString. And for RegistryDependency, the name is an InternedString, features is InternedString, and package is an InternedString. The crates.io one — what did crates.io have here? It just used String for all of these, because it only writes it once, so it doesn't really care to optimize it too much. Probably same here for Crate and Dependency — those are just Strings, great. And the crates registry index, that's the same. And over here, in crates-index, we're gonna do type SmolStr = String. And we have, in Dependency: name is SmolStr, req is SmolStr, features is a box of boxed strings, that's fine. This is... package is... wait, am I looking at the wrong one? I am looking at the wrong one, okay. It should be this one. type SmolStr = String; this is SmolStr. I'm looking at Dependency. This is SmolStr, this is SmolStr. This is a box of boxed strings, that's fine. This is an Option<SmolStr>, this is an Option<SmolStr>. I guess we can keep the Box too, to make it clear how they are encoding these: Box of Box of SmolStr. And what do we got for the entry itself? That's up here. They have SmolStr, SmolStr. This, for them, is an Arc. We're probably gonna clean these up — I'm just trying to make it so that we start with the right type definitions, so that we can then optimize them later on. Okay, this is a boxed HashMap of strings. This is an Option<Box<SmolStr>>. That's interesting... that seems false. It claims that there's no implementation of Serialize for Arc of Dependency. I find that hard to believe. Deserialize is not implemented for Arc<HashMap<String, Vec<String>>>... but it's using it right here. And this is just derived Serialize, Deserialize. So I call shenanigans. Let's look at the repository and see. So what's in the Cargo.toml here? Oh, features = ["rc"] on serde is apparently a thing we need. Great.
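For context, serde's Serialize/Deserialize impls for Rc and Arc are gated behind its "rc" cargo feature, so a dependency declaration along these lines is what resolves that confusing error (a sketch of the relevant Cargo.toml fragment, not the exact line from the crate in question):

```toml
[dependencies]
# serde's impls for Arc<T> / Rc<T> only exist when the "rc" feature is on.
serde = { version = "1", features = ["derive", "rc"] }
```

The feature is opt-in because serializing through Rc/Arc silently loses the sharing: each pointer round-trips as its own copy of the data.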
And then, I guess, what we can do for publish here is: instead of having all of these custom Deserialize impls that do validation, we just delete those and retain the types. So delete, delete, delete, delete, delete. And then we say type This = String. I can't type spaces, apparently. This is a String. This is one of these. All right, let's do a macro here. I should have included an arrow down at the end of my macro, but I do love macros. Beautiful. Okay, so now, does it build? It does. Lots of warnings, but that's fine, we don't worry about those. I guess we can get rid of some of these things now that we got rid of the custom implementations. Multiple fields are never used — that's fine. Whoo, okay. So we're now in a position where we have all of the different type definitions from the different crates that are using this. And the next step is gonna be to create the definitions that we want to use, and the conversions between them. And probably, as part of that, we want testing, we want documentation, and all that biz. So that's gonna be next steps. But first, I'm gonna have a bio break. I'll be back in a few seconds. All right, I'm back. I walked out of my office and immediately saw both cats sitting there cleaning their asses. Beautiful. Okay, let's see. Where do we wanna start? Do we wanna start at the .crate end, or the publish end, or the index end? Chat can decide while I eat something. "I gotta vote for the crate end." So we'll start there. "Part of the future difficulty, if this common crate gets used upstream, seems like it might hinder future optimizations if they find they're needed. I guess it can be a major semver bump for adding a new field, or a list of fields with a generic string type." It's a good point. I mean, there's an argument for making every type here generic. I don't love that idea, but it is a path we could take, right? We could have one generic for every field.
And the only thing that we require is that they... we don't even really need to require anything, but we wanna indicate the type of the field. "Why do we want our own types — shouldn't a generic hash map do?" So the reason why these different crates have different definitions here, even for things like the map or the vectors — the crates-index crate, for example, uses a boxed slice, one uses HashMap, one uses BTreeMap — is usually because they have different needs. So I think cargo, for example, tries to keep the entries ordered for anything that it sticks in the registry. Or actually, I think cargo doesn't. So cargo uses a HashMap for the published JSON, but crates.io uses a BTreeMap on the receiving end, because they wanna make sure that the crates index entries are sorted, so that if they regenerate them, they get the same order of fields, to avoid churn in the git diffs. So cargo wants the faster HashMap, whereas the crates.io team wants stable index entries, and the crates-index crate wants whatever is the most compact representation, which is like a boxed HashMap — because a HashMap has a bunch of fields, indirecting it through a Box means it's slower to access but the type is smaller. And so they're optimizing for different things. Now, .crate should be the easiest, because it's only used by cargo at the moment. I did find it interesting that cargo only uses InternedString for the features and for the name of the package, not the names of dependencies. I wonder if there's some rationale for that. Like, if we go back to... this is the TOML stuff, right? So if I go to blame here and I say "name:" and I also do this... what do we get here? "serde Deserialize for Cow<str> allocates by default." Oh, interesting. Yeah, because my instinct here is that for many of these we might actually want a Cow, but that doesn't work for Option. But it does work for Option, though — at least I think it does in newer versions of serde.
If so, we should check that. Because if that's the only reason, then I'm also curious: why name? Why just name? What was the other one, features? What? Something I'm very confused about here. Okay, the search is very confused about how things work right now. Interesting, so here they're not. So it was that previous commit. So it's this. But this is a giant implementation — I remember this PR, it's huge. But it doesn't really explain why the change to InternedString happened, and why it only happened for features. The reason I'm trying to figure that out is because I'm trying to figure out: does it matter? Should we use it in more places? You know, it's tempting to just do this, which I think is supposed to just work, but maybe I'm misremembering. This is certainly my instinct: to do this, right? The Box here is interesting. I guess they don't want to inline the TomlPackage because it's too large. And then we put #[serde(borrow)] on, like, all of them. I'm gonna do the same here. So this takes a 'a, and we borrow every single one of them — we allow every one of them to borrow. And same for TomlPackage. So at least in theory, the benefit of this is that if you have a manifest file, you're not gonna allocate any strings, because all of them are gonna be just references into the strings in the input. The one thing that's inconvenient about a definition like this, apart from the fact that it has a lifetime, is that sometimes you actually want an owned copy. Like, you deserialize something out of some buffer that's temporary, and now you wanna copy it around. So what you want is... like, you have a thing that's a TomlManifest<'a>, and what you actually want is a TomlManifest<'static> that you can, like, send to another thread or something.
So the way that I usually end up doing this, when I have types like this — and I guess let's call this NormalizedManifest — is I have, on NormalizedManifest, a pub fn that takes a NormalizedManifest of any lifetime and gives you a NormalizedManifest with a 'static lifetime. And the way it does that is: NormalizedManifest { cargo_features: self.cargo_features.into_iter().map(...).collect(), ... } — Cow::into_owned, then collect. So for each field, you just sort of figure out: how do I turn this into a thing that's 'static? And you do that for every field. So now, if someone does wanna make that conversion, they have a convenient way to do so. And these can just be called Package and Dependency — they don't need to be any fancier than that. So this is gonna be package.to_owned(). The reason I use the name to_owned here is because it's the same name you have on Cow. cargo_features is an Option<Vec<...>>, so it's .map. And here's the Vec. So the map here is given a Cow<str>, and Cow::to_owned... oh, into_owned. Yes, into_owned is what I mean. And then of course you can also have to_owned, which takes &self and gives you back an owned version by doing self.clone().into_owned(). You can write a more efficient implementation of that, but for now we're just gonna have into_owned. "A value of type Vec<Cow<str>> cannot be built from an iterator over elements of type String." Yes it can — by mapping this into Cow::Owned. So the setup here is: this is an Option, so we map over it. If it's None, it stays None. If it's Some, then we iterate over it — and this is a vector of potentially borrowed strings — we turn them into owned strings with Cow::into_owned, and then we map each back into a Cow, which can now be of any lifetime because it's owned, and then we collect it back into a vector, which gets stuck into the Option. This we haven't implemented yet — that's why we get that error. Dependencies is gonna be a BTreeMap.
So that's gonna be something along the lines of self.dependencies.map(|deps| deps.into_iter()...collect()). We're gonna have to fill in the bits in the middle here. It's gonna be a map over key-value pairs, and we're gonna map each to Cow::Owned(k.into_owned()) and, for whatever the value is, v.into_owned(), which we haven't written yet. So that's for Dependency. So I guess we can write the signature for that. All right, so we're gonna have Dependency — we haven't written this yet — and we'll have the same for Package, just to get rid of some of the errors that we're gonna see further up. So this is gonna be this, and then map, Box::new... and instead of trying to be all functional and fancy here, we can just do *p, like so. And dev_dependencies is gonna look exactly like this business, except it's gonna be dev_dependencies, and dev_dependencies2 is gonna look exactly like this. build_dependencies is gonna look exactly like this. And then features is probably structured exactly the same, except the inside is a Vec of Cows, so we're gonna have to do a little bit more — or less, it's the same. So it's f.into_iter(), except that the value here is gonna be into_iter().collect(), and it's gonna have the same structure as this one, where we map it to the owned version of the string, map it back into a Cow, and then collect. Oh joy, what a method to write. Very repetitive. But so now we have a mechanism where we deserialize into a fairly compact representation. It's not InternedString, right? So there is still the argument that if cargo wants to use InternedString, they don't have the option the way we've currently structured it. But it looked like the argument for making these InternedStrings in the first place was that they were allocating on deserialize, and this should not allocate on deserialize, because we're using serde's borrowing. So that should be fine. I guess we need to write this.
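The into_owned pattern being written field-by-field above looks roughly like this in miniature. The types are simplified stand-ins for the real manifest definitions, but the lifetime-lifting trick is the same: consume the borrowed value and rebuild every Cow as Owned so the result is 'static.

```rust
use std::borrow::Cow;

// Simplified stand-in for a borrowed manifest type.
struct Manifest<'a> {
    name: Cow<'a, str>,
    keywords: Option<Vec<Cow<'a, str>>>,
}

impl<'a> Manifest<'a> {
    // Consume self, turning every borrowed Cow into an owned one so the
    // result no longer references the input buffer.
    fn into_owned(self) -> Manifest<'static> {
        Manifest {
            name: Cow::Owned(self.name.into_owned()),
            keywords: self.keywords.map(|ks| {
                ks.into_iter()
                    .map(|k| Cow::Owned(k.into_owned()))
                    .collect()
            }),
        }
    }
}

fn main() {
    let buf = String::from("ignored");
    let m = Manifest {
        name: Cow::Borrowed(&buf[..]),
        keywords: Some(vec![Cow::Borrowed("cli")]),
    };
    let owned: Manifest<'static> = m.into_owned();
    drop(buf); // fine: `owned` no longer borrows from buf
    assert_eq!(owned.name, "ignored");
}
```

A &self-based to_owned can then just be self.clone().into_owned(), exactly as described in the transcript.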
So version here is a little interesting to me, because here it's represented as a string, but we know that it is a semver version requirement. So I actually think that this is a `semver::VersionReq`. The weird thing about doing it this way is that a semver VersionReq is internally a vector of comparators, which means that this is gonna allocate on every deserialize, which might be sad. But they already have a bunch of other strings in here, so I think I'm not too concerned. I kind of just wanna keep this the way it is. So that means this is gonna be self... Here's another question: is version here ever None for a dependency? I don't think it can be. So remember, we looked at the dependencies that get generated here, and when mapping the dependency list, the version gets mapped to Some. I guess there's technically no check here that version is set, but if you tried to declare a dependency on something where the version isn't set, then it wouldn't have gotten published in the first place. So, well, one of the things that we're running into here is the fact that the cargo definition for TOML manifests is used for a bunch of different things, including just parsing the raw TOML that the user has written. But the TOML that the user has written can have all sorts of errors in it, like not specifying the version for a dependency. So we need to be able to parse it and then realize that that's the case and give an error. But for anything that actually ends up in a published .crate, it should never be empty. So I think we're gonna do this. And this is probably the case for other things too. Not for all of them, right? So for features, for example, it's totally valid to not specify features for a dependency. Does it force them to be Some? No, it just clones it as is. Yeah, that makes sense. Great. So now we can write this. So version is gonna be `self.version.clone()`. Many of these, which are really just strings—`self.registry` doesn't even have to clone.
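The invariant being discussed—raw, user-written TOML may omit a dependency's version, but a normalized manifest destined for a published .crate must have one—can be sketched like this. The type and field names are illustrative, not cargo's real definitions:

```rust
use std::borrow::Cow;

// What we might get from parsing raw, user-written TOML.
#[derive(Debug)]
pub struct RawDependency<'a> {
    pub version: Option<Cow<'a, str>>,
}

// What a published .crate's manifest is allowed to contain:
// the version requirement is no longer optional.
#[derive(Debug)]
pub struct NormalizedDependency<'a> {
    pub version: Cow<'a, str>,
}

impl<'a> RawDependency<'a> {
    /// Normalization is where "parse anything, then complain" happens:
    /// a missing version is a reportable error, not a representable state.
    pub fn normalize(self) -> Result<NormalizedDependency<'a>, String> {
        match self.version {
            Some(v) => Ok(NormalizedDependency { version: v }),
            None => Err("dependency is missing a version requirement".into()),
        }
    }
}
```

Splitting the raw and normalized types this way means downstream code (publish, index generation) never has to re-check the Option.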
Registry is `.map` with `Cow::into_owned` wrapped back into `Cow::Owned`. Index is gonna be the same. Features is gonna look like the one we had up here, this business. Optional is just `self.optional`; that's already 'static. Default-features is gonna be `self.default_features`. This is gonna be the same, except... package is gonna be like this; it's an `Option<Cow<str>>`. And target is gonna be the same. Great. And then for Package, we're gonna have to do the same kind of thing here. So most of these are just `Option<Cow<str>>`s. So for a lot of them—in fact, here's what I'm gonna do. I'm gonna say that for all of these, with good old regular expressions, we're gonna substitute in `.map(|c| Cow::Owned(c.into_owned()))`. And for some of them, that's not gonna be right. Right? So for this, for example, it's just gonna be this. Right? I messed up a thing. Yeah, there we go. Authors is a vector, so it needs the slightly more advanced treatment up here, which is this thing. What else do we have? Readme is a string-or-bool. Keywords—so keywords is like authors. Categories is like keywords. And the string-or-bool thing we're gonna have to figure out, 'cause that should really be like `Cow<str>`-or-bool. Like, really. Version is `self.version`; doesn't need any mapping. And then I guess this is just gonna be `self.readme` for now. Okay—I don't even know if cargo_features makes it in here. So the cargo features here, they're not features the way you normally think of features. They're features of cargo itself that you enable for a particular Cargo.toml. Like, for example, you can say things like, oh, I wanna use resolver version two. And I don't think that ends up anywhere. This has the schema for the index, but it doesn't have the cargo features. So this means that in .crate, cargo_features can go away. Package, though, does need to stay. Although I'm still not convinced we actually need the boxing anymore, 'cause Package is a lot smaller now. But it is an Option.
So like—although, actually, I don't think it is an Option, because we're not gonna have workspaces here. So this is never gonna be an Option. It's not gonna be an Option because in the Cargo.toml that goes in a published crate, there is only one package. There's no workspace; there's a single package, which means we know there's a package there. So it's no longer an Option, which means this goes away and this is just `self.package.into_owned()`. I really wanna get rid of the `dev_dependencies2` stuff. The tricky part here is that—I guess we could deal with it in a custom implementation of Deserialize, actually. 'Cause from memory, if we go back here, it really just treats `build-dependencies` and `build_dependencies` as the same. And so we could do the same here and say, rather than have two fields, just have a custom deserialization here. Now, the other way to do this—and I wonder if this would come back to bite us; I'm gonna go ahead and claim that it won't—is this. So serde's `alias` is a way to say—well, that also caught a bug. That's supposed to be build-dependencies. Didn't I copy that from cargo? I think I copied that from cargo. Apparently not. Okay, my bad. So this is telling serde that this field can also be deserialized under this other name, which I think is what we want here. And there's always only gonna be one or the other. I guess the thing to check here is—this is in the serde docs, under field attributes: "Deserialize this field from the given name or from its Rust name. May be repeated to specify multiple possible names for the same field." And it's only for Deserialize. So I think that's the behavior we want here, rather than having all of these `2` fields. So that gets rid of this and this. Then we can do the same over here, which is an alias to that. And that way we don't need to deal with `default_features2` here. I think that's the only two. "In .crate files"... "A Cargo.toml manifest from a .crate file."
From—or for—a .crate file. So this is now looking pretty reasonable, right? Like, the things that go in there ultimately are the definition of the package, the list of dependencies, the list of dev-dependencies, the list of build-dependencies, and the features. And for the definition of a dependency, it is the version, it is the registry and the URL of the registry. I don't believe that. The URL-of-the-registry field: "This is an internal implementation detail. When cargo creates a package, it replaces `registry` with `registry-index` so the manifest contains the correct URL." So that raises the question of, when this writes out the registry, what does it write? Because I definitely saw something about this changing the entry for the registry—but maybe I'm lying. `package.workspace = None`... package, project... Something here is a lie. I'm pretty sure we saw something that rewrote the registry name to a registry index in here. Aha, `map_dependency`. Yeah: "registry specifications are elaborated to the index URL." So it removes `registry` and sets `registry-index`. So that means the `registry` field is not there, or it's not relevant; `registry-index` is. The reason for this—and the comment that I just removed said this too—is that registry names are entirely user-defined. So you and I can both have the same registry URL configured under different names on our machines. So the name is sort of irrelevant when you publish; all that matters is the canonical URL for the registry of the dependency. So `registry-index` makes a lot of sense. Features, which are the features we enable of that dependency. Optional: whether it's optional. Default-features is whether or not we enable default features. Package is for whether we rename this dependency when we bring it in. And I have a feeling that also gets removed. Actually, that's a good question. If I go to the index, does it have `package`? I feel like it probably does not.
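The elaboration step just traced—replace the user-local `registry = "<name>"` with the canonical `registry-index = "<url>"` at package time—can be sketched like this. The lookup table stands in for the user's cargo configuration; the names and URLs in the test are made up:

```rust
use std::collections::HashMap;

/// Elaborate a user-defined registry *name* into its canonical index *URL*,
/// the way cargo does when it writes the Cargo.toml that goes into a .crate.
/// Returns None when the dependency had no registry (i.e. crates.io).
fn elaborate_registry(
    registry_name: Option<&str>,
    config: &HashMap<&str, &str>, // registry name -> index URL, per-user
) -> Option<String> {
    registry_name.map(|name| {
        config
            .get(name)
            .expect("registry name must be configured locally")
            .to_string()
    })
}
```

The point of the design: two machines can configure the same index URL under different names, so only the URL is meaningful in a published artifact.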
Cause it would be kind of weird if it did. Deps is... dependency... It does have `package`. Interesting. I'm surprised that it's even necessary for the renames to be known to the resolver. So when I say renames here, it's the fact that you can write something like `nom5 = { version = "5", package = "nom" }`. And what this means is it's gonna look for nom—the crate name is gonna be nom—and it's gonna look for version five. But internally in your crate, you can refer to it as nom5 rather than nom. And part of the reason you wanna do this is so that you can do this kind of thing, and now inside of my package, I can do things like `use` nom5 and call into it. Oh, hey, spam. So that's what it's trying to record with this extra field. But I'm curious why the index needs to know this. Because the index should only care about the package name, because that's the actual dependency that it's taking. Maybe it's because otherwise it'll be annoying to deal with uniqueness. But that means `package` does stay in there. And target—why does target go in there? Target is in there because you can have target-specific dependencies. So in your Cargo.toml, you can say something like `[target.x86_64-unknown-linux-gnu.dependencies] nom = "7"`. And what this will do is only bring in the nom dependency if you are building for this target. And so that needs to be known to the resolver, because it needs to resolve your dependency closure and download your .crate files and such depending on your current target. And so it needs to know the fact that it's a target-dependent dependency. All right, so I think everything that's left—oh, right, done for package. Edition, I don't think, goes in here, because it shouldn't have to. And it doesn't go in publish. So edition we're gonna move away.
The rust-version doesn't currently, but is going to, and so I'm gonna leave that in for now. The name of the package obviously goes in there; version obviously goes in there. Authors does go in there, and should be under package metadata, really. Links does go in there, right here. It needs to be known to the resolver because you're only allowed to have one crate in the dependency closure that links against any given shared library. So for package metadata—the authors and the description, basically anything that's free text, so authors, description, homepage, that kind of stuff—doesn't go to the index, but it does go to publish. Why does it go to publish? It's a little unclear. Like, arguably this could just be extracted from the .crate file instead of being read as part of the JSON payload that gets submitted with the .crate file. But this is probably partially because crates.io doesn't wanna have to parse the Cargo.toml—though maybe now that they can, it no longer needs to come with the JSON payload as well. Well, it's hard to remove after the fact. So: authors, description, homepage, documentation, readme, keywords, categories, license, license-file, repository, resolver. Well, so the resolver version, I don't think, goes in the publish, and I'm fairly sure it doesn't go in the index either. So—there are multiple versions of the resolver. There's the cargo resolver version one and the cargo resolver version two, and there's a blog post about it. There's a bunch of changes to how cargo interprets your dependency graph, how it unifies version choices across, say, build and dev and normal dependencies. Now the question here is whether this version field in the index is the same as the version of the resolver, or whether it's just an independent versioning for the index line. This feels like it's not related to the resolver.
Like, reading the text, this seems to be more about the ability to make backwards-incompatible changes to the index format itself. So I don't think the resolver version makes it in here. The other thing that's interesting is it looks like the JSON metadata doesn't include information about the index entry version, which must mean that—okay, so then the question becomes: where does the `v` come from? So this would be in crates.io, when it generates the index entry. `v`—so that's just from the crate. So when it creates the crate for a publish—not the verified tarball, but somewhere up here it did, right? New crate, EncodableCrateUpload, somewhere here... crate exists—so this looks up the crate by name to check whether this is a new crate or an update. That's fine. Max upload size... but somewhere here it must set the version. New version, right. And then add dependencies, update crate—that's updating keywords—verify tarball, render and upload... features here. `v`—where does `v` come from? So `v` is set depending on whether `features2` is empty or not. So this is the kind of logic we're gonna have to encode in our crate: what was in the publish dictates which index entry version we're generating. And so this implies that the `v` is unrelated to the resolver version—and also related, in the sense that only with a new resolver would you set `features2` in the first place—but it's not encoded in the upload information. Okay, so that means that if we go back to the .crate side, the resolver version here is not relevant and we can get rid of it. Sweet. Okay, so we have .crate, and the next step now is the publish. So we're first gonna have to agree on how to model these, and that's gonna be something along the lines of, I guess, CrateVersion—it could be called publish—and we're gonna want to implement `From<super::dot_crate::NormalizedManifest>` for CrateVersion. And we also probably want to be able to go the other way around.
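The rule just traced through the crates.io source—`v` is not sent by cargo; the registry derives it from whether the publish used the newer feature syntax—can be sketched as a tiny function. The types are simplified stand-ins for the real ones:

```rust
use std::collections::BTreeMap;

/// Choose the index entry schema version for a publish, based on whether any
/// features needed the newer `features2` encoding. Mirrors the logic
/// described above: empty features2 -> v1 entry, otherwise v2.
fn index_schema_version(features2: &BTreeMap<String, Vec<String>>) -> u32 {
    if features2.is_empty() {
        1
    } else {
        2
    }
}
```

This is exactly the kind of publish-to-index rule a shared crate could own, so cargo, crates.io, and third-party registries all derive `v` the same way.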
Nope, we can't go the other way around. I think it's actually a one-way conversion. And we're gonna have to implement that at some point. And for CrateVersion we're gonna have the same kind of thing, where we take a `'a`. And so now the question becomes: what definition of CrateVersion do we want to use? And I think the place where the data structures matter the most is gonna be the index one, because that's where I think the definitions in the crate-index crate are particularly performance-sensitive—there are so many instances of it—whereas for this publish manifest, there's only really one at a time. So here I think it probably doesn't matter which one we use. I guess we can start with the cargo one and then unify. Oops, that's not what I wanted to do. I was trying to be smart and make a macro, and instead I messed it up. That's also not what I wanted to do. There we go. Did I mess up my macro? There we go. I love vim macros; they're great. I'm tempted to just say we're not gonna send badges. We're not gonna send badges. So that's gonna be one of these, and we're not gonna be fancy; we're just gonna call it Dependency. I think you can put `#[serde(borrow)]` here. Oh—I guess not. Too bad. Okay, so this is gonna be a `Cow<'a, str>`, and then `#[serde(borrow)]`, `#[serde(borrow)]`, `#[serde(borrow)]`, on down the fields. And actually, this reminds me, there's probably a couple of these where we want `skip_serializing_if`—`skip_serializing_if`, `skip_serializing_if`, `skip_serializing_if`. Probably up here too. So the `skip_serializing_if` is just: if this thing is None, then don't include the field in the message that you send. Although I don't think we actually wanna do that for the JSON payload, because the receiving end might actually require it. It was interesting that it specifically said that on these, though. So that's CrateVersion. These are all borrowed; everything is happy. All right, so now we gotta combine this with the stuff from crates.io.
So they want custom types for the names. So there are a couple of paths we could take here, right? One is to say that we expect crates.io to adopt this data structure throughout their program for anything that's a crate upload, which would imply that, like, here, for example, we need to have our own custom type for vetting—validating—that the format of the name is correct. The alternative is that we say we expect them to deserialize into this, and then after deserializing into this, they can fallibly convert into their own type, which has special types for all of these. That might be better: rather than us trying to have generics for all of these, they're gonna write a function of the form `TryFrom<cargo_index_transit::publish::CrateVersion> for EncodableCrateUpload`—or whatever we end up calling this. They're gonna write this, and in there they're gonna have all of their mappings of names and versions and whatnot to make sure that they conform to whatever requirements they might have. But I think that feels nicer than us trying to make all of our fields generic over types that they control. So I think actually that's the path I wanna take. Okay, they have `default` on keywords and categories, which suggests that for keywords and categories we can have `skip_serializing_if = "Vec::is_empty"`, and we can do the same thing for categories, because we know the crates.io side has `default` for them—which also means that we can add `default` for them. Same thing here: it means that we want `default` for this and `default` for this, which also should mean that they have `default` for those. Oh, they also have `default` for links. So we'll want to add that here. So what do they have? Oh, interesting, they don't have a `default` there. I think that's because Option is treated specially: an Option already basically implies `default`, I think. So I don't think they would have needed it on links.
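The fallible-conversion path being proposed can be sketched like this. `PublishedCrate`, `CrateName`, and the validation rule are stand-ins (not crates.io's real types or rules); the point is the shape: deserialize into the shared, stringly-typed struct, then `TryFrom` into the registry's own validated types.

```rust
use std::convert::TryFrom;

/// The shared, stringly-typed publish struct (heavily simplified).
pub struct PublishedCrate {
    pub name: String,
}

/// A registry-side validated newtype, standing in for crates.io's own types.
pub struct CrateName(pub String);

impl TryFrom<PublishedCrate> for CrateName {
    type Error = String;

    fn try_from(c: PublishedCrate) -> Result<Self, Self::Error> {
        // Illustrative rule only: nonempty, ASCII alphanumeric plus - and _.
        let ok = !c.name.is_empty()
            && c.name
                .chars()
                .all(|ch| ch.is_ascii_alphanumeric() || ch == '-' || ch == '_');
        if ok {
            Ok(CrateName(c.name))
        } else {
            Err(format!("invalid crate name: {:?}", c.name))
        }
    }
}
```

This keeps the shared crate's fields as plain strings while letting each registry layer its own policy on top, instead of making every field generic.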
`default` does need to be set on Vecs, though. We should have tests for this, though, to make sure that that conversion actually does the same thing that they expect it to do. Now, DependencyKind is another one of those where we want to think a little bit about what we want to do here, because as far as cargo is concerned—where is this? deps... dependency—the dependency kind is just a string, but as far as crates.io is concerned, it has to have a particular set of values. So I think we actually want to keep that mapping that they have. I don't know why they have the numbers. Oh, that's for mapping them to database IDs, which we don't need. So we don't need the wrapper here. But that way we can say, for DependencyKind—so this doesn't need to borrow anymore, and this is now a DependencyKind. Okay, and the other thing that strikes me here is, for EncodableCrateVersion, that's a `semver::Version`, whereas currently cargo just treats it as a string. So I think here too we're gonna say `semver::Version`: we're gonna require that cargo sends it as a semver Version and that crates.io receives it as a semver Version. Now here too, we could have a Cow, to say that if you already have the semver Version somewhere, we're just gonna reuse that one for you, but let's just leave it owned. I'm not too concerned about it in the context of publish. Do we have anything else that's weird in here? So BTreeMap—this seems to be in agreement about features; that's probably because of ordering. So I guess actually what I probably wanna do here is go field by field. EncodableCrateName is a string, and we encode it as a string. This we already dealt with. Deps is a vector of EncodableCrateDependency—right, a vector of Dependency, which is what we have up here too. It is a vector of Dependency. So Dependency has—let's grab this one. I just wanna hoist it up here to see that I haven't missed any fields or that they're encoded differently. Optional is a bool, so that one we've dealt with.
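The decision here—keep crates.io's closed set of dependency kinds but drop their database-ID numbers—might look like this. This is a sketch, not the crates.io definition:

```rust
/// A closed set of dependency kinds, instead of a free-form string.
/// The numeric database IDs crates.io attaches are dropped, as noted above,
/// since they're an implementation detail of their storage layer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DependencyKind {
    Normal,
    Dev,
    Build,
}

impl DependencyKind {
    /// The lowercase name this kind would serialize under in the payload.
    pub fn as_str(self) -> &'static str {
        match self {
            DependencyKind::Normal => "normal",
            DependencyKind::Dev => "dev",
            DependencyKind::Build => "build",
        }
    }
}
```

An enum also means the field no longer needs to borrow from the input, since there's nothing to allocate.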
`default_features` is also here, and it's a bool; that's fine. Name is an EncodableCrateName, which we know is just a string, so that's fine. Features is a Vec of EncodableFeature—a Vec of strings; EncodableFeature is a string—so that's fine. `version_req` is an EncodableCrateVersionReq, and EncodableCrateVersionReq is treated as a string by crates.io, and `version_req` here is also a string. This one's also interesting, because here I think we can do better: we can say that this is a `semver::VersionReq` and actually provide a stronger requirement than just a string. Target is an Option of string, Option of string. Kind they have as an `Option<DependencyKind>`, so I guess this is an Option of DependencyKind—although it's interesting, because cargo treats kind as just a string; it's not optional. So that makes me wonder: cargo has this as a string, crates.io has this as an Option, but I don't think it's actually an Option. I think it's required. And I guess for version, we should note that cargo has this as a string. So that's kind. Registry is an Option of string, so that's fine. `explicit_name_in_toml` is an EncodableDependencyName, and an EncodableDependencyName is a string, which is a string here, and it's an Option. So this `explicit_name_in_toml` is sort of encoding the inverse of what we talked about for `package`, right? So I gave the example of this. In the Cargo.toml, the structure is the name on the left, then `package = something` inside. In the publish metadata, the way we encode this is: the dependency name is `nom`, the dependency version requirement is `5`, and the `explicit_name_in_toml` for the dependency is `nom5`. So the mapping is inverted. And we'll see this pretty easily once we go up and define this From. All right, so now it's time to look at these. So we already looked at name and version and deps. Features is a BTreeMap of string to vector of string—EncodableFeatureName to EncodableFeature; those are all strings.
So that matches. Description is an Option of string; that's what we have. Homepage is an Option of string; that's what we have. Documentation is an Option of string; that's what we have. Readme and readme-file are both Option of string; that's what we have. EncodableKeywordList is a Vec of EncodableKeyword, so it's a Vec of string, and keywords is a Vec of string. And they have `default`, we have `default`, and we don't include it if it's empty. So that seems fine. Categories, I'm gonna go with the same: a Vec of string. Okay, so these are all fine. License and license-file are Option strings. Repository is an Option string, and links is an Option string with `default`—it doesn't need `default`; the default is already None. Okay, so now this mod can go away, and now we've just gotta implement this bit, which is gonna emit a CrateVersion. I'm gonna call this `m`, because it's gonna be terser. So this is gonna be `m.`—oh, in dot_crate all these fields have to be pub, and all the fields here have to be pub, and all the fields here have to be pub. Ah, so many. So this is gonna be `m.package.name`—can't spell. This is gonna be `m.package.version`. Deps we'll get back to in a second. This is gonna be `m.features`—it's gonna be `Some(m.features)`... ah, it's an Option here, so this is gonna be `unwrap_or_default`. Although—features and features2. Ah, so this bit we might actually need... We're gonna have to encode this bit for detecting what should go in `features2` and what should go in `features`. This stuff is like magic. That's gonna have to go in our conversion from this into the index entry. That's fine. Authors is gonna be `m.package.authors`, and that's also gonna be `unwrap_or_default`. Description is gonna be `m.package.description`. I love macros so much. Now, readme is a string-or-bool here, so we're gonna have to map it. Readme-file. Keywords is gonna be an `unwrap_or_default`. The same thing here. That's interesting.
So readme has some special mapping here, which I think we saw in our prepare-for-publish. License-file—ah, I'm in the wrong one. I need to look at the registry publish. Nope, I need to look at the part of cargo that actually does the publish. Where did it go? I knew I had it here somewhere. Not the package part, but the actual publish part, which I thought was over here, where we saw transmit. No, this is the mapping of the manifest. Where on earth did I put transmit? Well, I'm gonna have to go dig up where I had `fn transmit`. It's over here. Thought I had it open in a tab, but apparently not. Right, so this is the thing that actually generates the dependency list and—right, so the publish here. So, you see, for a lot of it, it just copies the stuff out of the manifest. Dependencies is treated specially, features is treated specially, and readme is treated specially. So what is readme here? Readme gets set to exactly what comes out of the manifest—although that's a string-or-bool. Oh, right, and a bool here just means README.md, I'm pretty sure. We could look that up too and say, hey, is there a mention of README.md here? That's not what I wanted. Is there a mention of that in src? I want the string "README.md", not files called README.md. Here, probably. Ah: readme for the package—if it is set to true, that means README.md. So this is gonna be a match on that: if it's a string-or-bool with the bool true; if it's false, then it's None; if it's a string, then it's a Cow—well, I guess we can just do `"README.md".into()`. And None here maps to `default_readme_from_package_root`. I'm curious whether—does that even get called here? It doesn't get called here. So something must call it when it generates the manifest metadata somewhere. That's interesting. So None is gonna be this `default_readme_from_package_root`, which tries all the candidate files to see if they exist. Oh boy. So this one's tricky, because it depends on the file contents in the .crate file.
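The readme resolution rule just traced can be sketched as a match. `StringOrBool` here is our own stand-in for cargo's string-or-bool manifest field, and the on-disk probing that real cargo does for the `None` case is elided, since (as discussed) a parsed manifest alone can't see the files in the .crate tarball:

```rust
use std::borrow::Cow;

/// Stand-in for the manifest's string-or-bool readme field.
pub enum StringOrBool<'a> {
    String(Cow<'a, str>),
    Bool(bool),
}

/// Resolve the readme field to a path, per the rule traced above:
/// `true` means "README.md", `false` means no readme, a string is the path.
fn readme_path(field: Option<StringOrBool<'_>>) -> Option<Cow<'_, str>> {
    match field {
        Some(StringOrBool::String(path)) => Some(path),
        Some(StringOrBool::Bool(true)) => Some("README.md".into()),
        Some(StringOrBool::Bool(false)) => None,
        // Real cargo probes the package root for default readme filenames
        // here; without filesystem access we can only fall back to None.
        None => None,
    }
}
```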
Because what it gets set to is file-system dependent. And I think the license-file is the same, actually. Because if we look at transmit here—oh, it just complains if the license file doesn't exist; that's fine. But for readme, it actually includes the readme contents, which also depends on the—yeah, so readme_file gets set to readme. So actually I lied; it's this: readme gets set to readme, but readme here is the readme contents. And so this is gonna be a little weird to map. We could always just set it to None, right? But really it's supposed to be this. So that's very much a to-do here. Why does it—oh, right. So we're gonna have to figure out what to do about the readme, because if all we're given is the parsed manifest, we don't have access to the files that were next to the manifest inside of the .crate tarball. It might be that we just have to say None here. It's interesting, too, because it feels weird for this logic to live in cargo's publish, as opposed to in the thing that renders readmes. Hmm, okay. And then deps is clearly special, because it has all this mapping that happens up here. So let's see: `let deps =` the package's dependencies. So that's gonna be `Vec::with_capacity`, with `m.dependencies.as_ref().map(BTreeMap::len)` for each of the maps. So what I really wanna write here is something like this length plus this length plus this length. That's what I want to write, but it's real ugly. Fine, we'll leave it for now. It's fine. It's fine, it's fine, it's fine. Actually, we can do this in a better way, which is we can do `m.dependencies` ... `.chain(m.dev_dependencies ...)` and chain again. The reason why it's better to use chain here is because, at least in theory, chain should be able to work out what the upper bound is on the size of this iterator, and so we shouldn't need to first collect into a vector. `.flat_map`—
—which means I don't even need flat_map; I just need map. So really it's `into_iter().map`, and here too `into_iter().map`. You'll notice that I do this a lot, where I just write the code really far out to the right, and then I let rustfmt make it nicer for me afterwards. There you go. And then I can start actually doing something reasonable with it. So what do we do for these dependencies? Well, it says "skip dev-dependency without version". I wonder what this is for. Right, so anything that's transitive—which is not gonna matter to us, because we're only looking at the things that are in the manifest, which are direct dependencies by definition. `specified_req`... okay, well, that's unhelpful. I see: `specified_req` is just "does it have a version specified?" But anything that's in the normalized manifest must already have a version specified, because anything that doesn't would already have been rejected as part of the normalization. So we don't need that part. Registry here—in the index and web API, None means "from the same registry", whereas in Cargo.toml it means—oh, that's interesting. Okay, so what this is saying is: okay, I'm gonna map all of these in a very simple way to a tuple with `DependencyKind::Normal`, and this one is Dev and this one is Build, and then we can `.map` across all of them at the end. So this is gonna be—now Dependency. It's very unhappy with me. It's fine. "One of the found closures"—something's unhappy with me. What's this "one of the found closures"? Oh, I see: mismatched types. Right, so this has to be, just like they do over here, a `.collect()`; that's fine. Now it should stop yelling at me. Okay, so whether it's optional is gonna be `d.`—so I wanna write `|(d, kind)|` here, which just means I can use `kind` directly, because arguments are patterns. So `kind` here is now this; optional is `d.optional`. `d` here is actually the name in the TOML plus the dependency definition. So `explicit_name_in_toml` is the name-in-toml.
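The chaining approach described here can be sketched as follows, with simplified stand-in types. Each dependency map is tagged with its kind and the three iterators are chained, so no intermediate `Vec` is needed and `collect` still benefits from the combined size hint:

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Kind {
    Normal,
    Dev,
    Build,
}

/// Flatten the three dependency maps into one list, tagging each entry with
/// its kind. Values are just version-requirement strings in this sketch.
fn all_deps(
    deps: BTreeMap<String, String>,
    dev: BTreeMap<String, String>,
    build: BTreeMap<String, String>,
) -> Vec<(String, String, Kind)> {
    deps.into_iter()
        .map(|(name, req)| (name, req, Kind::Normal))
        .chain(dev.into_iter().map(|(name, req)| (name, req, Kind::Dev)))
        .chain(build.into_iter().map(|(name, req)| (name, req, Kind::Build)))
        .collect()
}
```

Because the closure argument is a pattern, the real code can write `|(d, kind)|` and use `kind` directly in the body, exactly as described above.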
So this is where that inversion happens. And in fact, deps here—right, right. So name here is gonna be `d.package`. Default-features is gonna be `d.default_features`. `version_req` is gonna be `d.version`, right: if we go back to dot_crate, for a Dependency, `version` is what gets set into `version_req`. Target is `d.target`. Registry is `d.registry_index`, right. What else are we missing here? Features is `d.features`. "Expected BTreeMap"—interesting, I messed up my types here somewhere. Expected BTreeMap... oh, it's because this is an Option. So `unwrap_or_default` on this, `unwrap_or_default` on that, `unwrap_or_default` on this. Okay, so what do we have here? `is_optional`. So `is_optional` is defined as `inner.optional`. Okay, that's relatively unhelpful. Where does `inner.optional` come from? Optional is false... new... parse... Serialize dependency: optional here is just a bool. So where does the default get set? I mean, we know the default is false. So if it's not set, it's false, because things are not optional by default. Default-features is true by default. Name—okay, so this one's gonna be a little interesting. I guess for default-features, it's probably gonna be the same kind of thing: it just delegates to the inner. Yep, so that's relatively uninteresting. `package_name` is `inner.name`: "the name of the package that this dependency depends on. Usually this is what's on the left-hand side of the dependencies section, but it can also be renamed via the `package` key. Both of the dependencies below return `foo` for package_name." Right. So if `package` is specified, then that is the true name of the dependency; otherwise, it is the name-in-toml. Features: we want a Vec of features, and this is just an `unwrap_or_default`; that's easy enough. `version_req` is version, target is target.
Kind is kind. Target is platform; that's fine. This is where that mapping to DependencyKind happens. Registry is the dep's registry, which we're gonna have a look at. And `explicit_name_in_toml` is the name-in-toml. So this is actually—here's a question: this is supposed to be an Option; when is it set to None? Is this, like, if they differ? "If the `package` key is used, then this returns the same value as name_in_toml." Okay, so this is only set to Some if `package` is set. So we can actually make this a little bit nicer for ourselves by saying `let (name, explicit_name) = match (name_in_toml, d.package)`, and then this is now gonna be `name`, and this is gonna be `explicit_name`. And here we're gonna say: if `package` is Some and the name is set, then the explicit name is the name and the name is the package name. If the name is set but `package` is not, then the explicit name is None and the name is the name that was specified. So, to make this a little clearer perhaps: this arm is "explicit name equals, package equals name"; this arm is "explicit name equals nothing"; this is always `n`. And I think those are just the only two cases, right? Now, the question is: if you use this rename syntax, but use the same name on both sides, should it be Some or None? Okay, and then what happens to the registry URL? So that's the dep's registry, which is an interesting mapping: "If the dependency is from a different registry, then include the registry in the dependency. In the index and web API, None means 'from the same registry', whereas in Cargo.toml it means 'from crates.io'." Right? So here, when `registry-index` is set to None, what that means is: fetch it from crates.io. If it's set to None in the dependency specification that gets sent to crates.io, it means: this is not from a different registry. That's what they're saying here.
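The name inversion worked out here can be sketched as a small function. In Cargo.toml the left-hand key is the local alias and `package` holds the real crate name; in the publish payload, `name` is the real crate name and the alias moves into `explicit_name_in_toml` (None when no rename happened):

```rust
/// Given a dependency's TOML key and optional `package` rename, compute the
/// (name, explicit_name_in_toml) pair for the publish payload.
fn publish_names(
    name_in_toml: String,
    package: Option<String>,
) -> (String, Option<String>) {
    match package {
        // renamed: the real crate name is `package`, the alias is explicit
        Some(real) => (real, Some(name_in_toml)),
        // not renamed: the TOML key *is* the real crate name
        None => (name_in_toml, None),
    }
}
```

The open question from the stream (what if `package` is set to the same name as the key?) would land in the first arm with this sketch, producing `Some`.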
So the way we're gonna have to structure this is something like: let's see, dep is todo!() for now — registry dep — which is to say, we're gonna match on d.registry. If it's a Some, that means it's not necessarily from crates.io. If it's a None, that means from crates.io, right? And then this is where it gets complicated, which is: if it's from crates.io, then we actually need to — okay, so here we actually need to know — this is annoying. So this conversion can't quite be this simple, because the JSON payload we send here differs depending on what registry you're sending it to. If you're sending it to crates.io, then crates.io dependencies will have None set as their registry. That's awful. That makes me very sad, because this isn't a pure conversion. All right, so that means this is gonna be something like this, and we're gonna have to take an is_for parameter, which is gonna be the registry it's for — which, if it's crates.io, the URL is — oh no, I lost my place in the file. This file is too big. Somewhere over here. Yes, okay, I found it. So what I want here is for this thing — this then uses the URL of this, which is this URL, which is this URL. So this then is is_from here. Actually, this can just be this. Then we have the target-registry-dependent source registry — it's a really weird field. It's gonna be: if is_from is equal to is_for, then None, else Some of is_from, the target-registry-dependent source registry. Right, and this then is gonna be Cow::Borrowed. It does make me really sad that it's not a straight-up conversion. I still don't know what to do about the readme, actually, because that's also a todo!(), right? So I can match on readme, readme_contents — I don't understand, why is it complaining about these? Oh, the string-or-bool deserializer in the .crate module needs to be public, so we're gonna pub use that. "Expected Option, found..." —
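That asymmetry — None meaning crates.io in the manifest, but "same registry as the one you're publishing to" in the publish payload — is why this can't be a pure conversion. A minimal sketch, with a made-up function name and the crates.io index URL as the canonical identifier:

```rust
// In Cargo.toml, a missing registry means crates.io; in the publish payload,
// a missing registry means "same registry this payload is being sent to".
// So the conversion needs the target registry as an extra input.
const CRATES_IO_INDEX: &str = "https://github.com/rust-lang/crates.io-index";

fn registry_for_publish(dep_registry: Option<&str>, is_for: &str) -> Option<String> {
    // Resolve the manifest's None to crates.io...
    let is_from = dep_registry.unwrap_or(CRATES_IO_INDEX);
    // ...then drop it again if it matches the registry we're sending to.
    if is_from == is_for {
        None
    } else {
        Some(is_from.to_string())
    }
}
```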
Wait, this should be fine — todo!() matches every type. Oh, also it looks like I don't need chrono anymore. That's nice. But what if I swap these? Why is it complaining that the match arms have different types? This todo!() should match that None. Oh, I wonder if — okay, so let's see, readme options. Yeah, it's probably this. So I need something like a Cow of str. No, it still won't let me do it. "Type Option... expected enum Option." But the never type should be compatible with Option. Interesting. What if I do todo!() in both arms — is it okay with that? No? Ah, sigh. That's interesting. I don't know why it won't let me compile this code. I guess I can just do this, and then do it like this. It's not what I want, but — see, this is the reason why I didn't want it. Just because I then — ah, fine. Okay, how about if I do this? Is it gonna let me read readme.md if I make this not an Option? What, that's fine? And also, what do you mean that's a non-exhaustive pattern? Some of a String — oh, right, of course: String, path. "Cannot return value referencing local data d.registry_index." Right, because we're consuming this one. So the reason this gets weird is because registry_index here is a Cow, but it should have the lifetime 'a already, because that's what we're both taking in and giving back. Oh, I guess it's because it's not borrowed — it has to be taken ownership of. And then this can be this, and this is gonna be Cow::Borrowed. Compiles. Maybe we still don't know what to do about the readme, but I'm gonna go ahead and ignore that for now, because I like ignoring things. Okay, let's move to the index. Oh, so the index is gonna be probably the biggest pain here because of all the optimizations that we've seen. I'm inclined to start with the cargo type here, because ultimately, if you stick something in the index that cargo doesn't understand, then people aren't relying on it, because they wouldn't be able to build it.
So the cargo one must be right, at least when it comes to deserialization. So that's what I wanna start with. Now, the interned string here. So the problem with something like a borrowed str here is that, true, it means you can continue to reference what's in the JSON byte buffer that you originally parsed. The downside is that that buffer might go away, right? Imagine you're writing the resolver, and as part of it you're reading a bunch of index files. You're probably not keeping all of the read files in memory as you go: you're gonna read a file, parse it, then drop the file, but still wanna keep the index entry. So at that point you're gonna have to turn this into an owned version of itself, but at that point you're just allocating a String, which is what we want InternedString for. So the question then becomes: how do we wanna represent this? This is also the place where we have most of the SmallString stuff that's in the crates-index crate. Here it is kind of tempting to make basically all of them generic. Now, should they all be required to be the same generic, or can they be different? I think they might have to be different, because this is another case, right, where the version here really is supposed to be a semver Version, but in the crates-index case I don't think they actually wanna store semver Versions, because they're relatively large — they're a vector of comparators, which are themselves not tiny. I guess a vector is small, but a SmallString probably does pretty well with semver requirements, actually, because very often they're very short strings. So this does feel a lot like a case of letting every use dictate its own type, for most of the types. It's also very interesting to me that cargo uses interned strings for all the features, whereas crates-index does not — it uses just straight-up Strings.
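A rough sketch of the shape being discussed here — one type parameter per string-ish field, so each consumer (cargo's interned strings, crates-index's small strings, plain String in tests) picks its own representation. Every name below is illustrative, not the final crate's definition:

```rust
use std::collections::BTreeMap;

// Each user of the index entry chooses its own string types. The checksum is
// stored hex-decoded, and links gets its own parameter rather than being
// tied to Name. Sketch only.
struct IndexEntry<Name, Version, Feature, Links> {
    name: Name,
    vers: Version,
    features: BTreeMap<Feature, Vec<Feature>>,
    cksum: [u8; 32], // hex-decoded sha256 of the .crate file
    yanked: bool,
    links: Option<Links>,
}
```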
It might be because in cargo, the crate name and the crate features end up being copied in a bunch of places by the resolver, because it needs to keep track of what features it's enabled and everything. So they actually get cloned a lot, whereas something like the checksum, for example, does not: it gets read and checked once, and then it basically gets ignored from that point forward. So my suspicion here is that it makes a lot of sense for cargo to treat features as interned strings, and it doesn't really make sense for crates-index, because it's already behind a map abstraction — so it's already small. I think the way I wanna structure this is to start from the cargo one and say that we have a couple of generics: we have name, version, and feature. And then I'm gonna disagree that this is a String — I'm gonna agree with crates-index that this is a checksum that is hex-encoded. Yanked is an Option<bool>? Yeah, this doesn't need to be an Option, and even if it did, it could just have a default set. It's interesting that links is treated separately. What does crates-index encode links as? A SmallString, yeah. So I think links then ends up being separate again. It's tempting to make it similar to features — I don't think there's a strong reason not to — but, like, in cargo, for example... I guess cargo uses the same type for them. Ah, but crates-index does not use the same thing for links as it uses for features. It actually uses the same thing it uses for name, which cargo also does, but tying them to name seems weird. So let's do a links parameter and say that links here is gonna be this, okay. And then the next question is: for features, it does need to be a BTreeMap. I don't think I agree with crates-index, which says it's a HashMap, but crates-index wraps this behind an Option<Box<...>> to make it smaller. So one question here is whether these optimizations also make sense for cargo, right? Like, cargo is not gonna complain if things get faster.
So we don't need to have an exact mapping to the way cargo holds this data at the moment. "To reduce the size... this field is almost always unused." I don't think it's true that it's almost always unused anymore — because features2... oh right, features2 is only for things that newer cargo understands. So if we go back to the crates.io code here: features2 is anything that is a weak dependency, and there are relatively few weak dependencies, so usually the list of things that goes into features2 is empty, right? So I think I agree with this part: features2 can probably be an Option, and it's wrapped in a Box to reduce the size. So we'll do — I think I like that idea. And then for features, it uses Arc. I'm curious why it uses Arc for features. I mean, I guess Arc and clone are basically equivalent in this kind of use; the question is how often you wanna clone the set of features. I wonder if there's an explanation for this. Like, if we go to rust-crates-index and we look at src/lib.rs, I wanna look at the blame for features. This seems to be the commit that moved everything to Arcs: "lower memory usage by de-duplicating versions data." What confuses me here is where the de-duplication happens — oh, I see, that's sneaky. Okay, so this is primarily useful for crates-index when it parses the list of versions: the observation is that for a lot of new crate versions, the list of features is exactly the same. So rather than store them multiple times, you reference-count it — you store the reference-counted one multiple times. I totally buy that. I totally buy that that's useful, and arguably for the dependency list too. So here's what I wanna do here: it seems totally reasonable for this to be an Arc, and it seems totally reasonable for this to be an Arc. These are Arc because of...
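The de-duplication trick described in that commit can be illustrated in isolation: when parsing consecutive versions of a crate, if the freshly parsed feature map equals the previous one, reuse the previous Arc instead of allocating again. This is a sketch of the idea, not crates-index's actual code:

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

type Features = BTreeMap<String, Vec<String>>;

// If consecutive versions of a crate have identical feature maps, share one
// allocation behind an Arc instead of storing the map once per version.
fn dedup_features(parsed: Features, prev: Option<&Arc<Features>>) -> Arc<Features> {
    match prev {
        Some(prev) if **prev == parsed => Arc::clone(prev),
        _ => Arc::new(parsed),
    }
}
```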
These are Arc so they can be de-duplicated easily by calling code, if it happens to be reading in all of the versions of a single crate at once, since nearby versions often share identical dependency and feature lists. Where Name implements Serialize and Deserialize; Version; I guess Feature is really the... and Links. Feature also has to implement Ord. And this is just — whoa — use std::fmt::Debug. Interesting. Oh, why Arc and not Rc? Arc is just more useful than Rc. Like, if it's Rc, then there's no way for people to make it thread-safe on their own, but if it's Arc, you're paying a little bit more cost if you're in a single-threaded context — and if you're in a single-threaded context, it probably doesn't matter to you anyway. I forgot about this. So this is where we also need serde — oh boy. So serde has a special thing you can use if you wanna use generics in your type, which is what we want here: the bound attribute. But that's not what I wanted. I was hoping there was a way for me to just say "also include this lifetime," but I guess not. I don't really wanna have to write the bounds out myself. All right, I'll leave that for a second. Okay, so name and version are configurable; dependencies; features; features2 doesn't need the de-duplication, so we keep it as a Box so that we have ownership of it. Checksum, yanked, links, schema version — this really doesn't need to be a u32; it can be a u8. What else crates-index has is an Arc over the dependencies such that it's not a Vec, it's a slice. That seems fine, because these dependencies don't need to grow and shrink — it doesn't actually need to be a vector, and that saves you a little bit of space. That seems reasonable to me.
It's interesting to me that this is an Option<Box<...>>, because for links I don't think we need to box here: when crates-index uses our crate, they can just set the links type to be a boxed small string themselves. So that Box probably isn't necessary. What do they have for v? They don't even decode v, which is interesting. Okay, so that can go away now. And for dependency: Serialize, Deserialize, Debug. There's an argument here that this no longer needs to be generic over 'a, actually, because now all the relevant types are controlled by the caller anyway. Right, those bounds are generated by serde anyway, so that's fine. This doesn't need to borrow. This needs to be generic probably over name, probably over feature, because you'll want the same string representation in here. So what do they use down here? Right, dependency kind we already have. So we have DependencyKind — does it need to be an Option? This is the publish DependencyKind from the .crate module. It's interesting: for the inner features, they use a different representation than for the outer ones — Box<Box<[...]>>. Yes, I mean, this is all optimized for compactness over performance. But that's because usually you have a bunch of these fields you're deserializing, and a lot of them you're not even gonna look at. And so you would rather they not take up a lot of space — so you can walk the memory more efficiently, for example — than have maximal performance when accessing any given field, because you're gonna be accessing relatively few things. So — oh, public is here. So I suppose that means I lied when I previously said that dependencies weren't public. That's frustrating, because it does show up in the index right here. So that means the dependency here, which we got over from crates.io in the .crate module, which is the Toml manifest... all dependencies are private by default. Ah, Toml manifests. Dependency. So where do we have TomlDependency?
DetailedTomlDependency. Right, public here is a — no — pub public: Option<bool>. And does that even go into NewCrate? That's the other thing I wanna know. NewCrate — yeah, see, so it's not currently even sent as part of the publish, but it does go in the index, which is interesting. So it's not part of publish, but it is part of the .crate flow. Public is self.public, and publish just ignores it because it doesn't go anywhere. And then for the index — right, so now we're back to the index. I'm torn here what to do with — I think this can certainly be a boxed list of feature. Not sure why it's Box<Box<[...]>>. Oh — it's Box<Box<[...]>> because the inner thing is a slice, which is a dynamically sized type, which means a Box of it is actually a fat pointer that holds both the pointer and the length of the slice, so that's two usizes, whereas a Box of that Box is one usize large. But that could be remedied by doing this; that has the same effect. Although — the Vec, I mean, we can do the computation, right? So a Vec<Feature> is a usize for the length, a usize for the capacity, and a usize for the pointer. A Box<Vec<Feature>> is a usize for the pointer only. A Box<[Feature]> is a usize for the pointer and a usize for the length. And a Box<Box<[Feature]>> is a usize for the pointer only. So Box<Vec<Feature>> and Box<Box<[Feature]>> are the same inline size, but behind the pointer, the latter stores a usize pointer and a usize length, while the former stores a usize pointer, a usize length, and a usize capacity. Hence Box<Box<[Feature]>>, right — this one uses less heap memory than that one does. And there's an argument here that this should be the same. I'm curious what they do for — I don't understand why they removed that original code from the registry index. That is over here — not dependency, but the top-level definition. They still use Vec<String> for the features inside of here.
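That back-of-the-envelope arithmetic can be checked directly with std::mem::size_of (the counts are in units of usize, so they hold on 32-bit and 64-bit targets alike):

```rust
use std::mem::size_of;

fn main() {
    // Vec<T> is (pointer, length, capacity) inline: three usizes.
    assert_eq!(size_of::<Vec<String>>(), 3 * size_of::<usize>());
    // Box<[T]> is a fat pointer (pointer, length): two usizes inline.
    assert_eq!(size_of::<Box<[String]>>(), 2 * size_of::<usize>());
    // Boxing the boxed slice moves the fat pointer to the heap, so only a
    // thin pointer remains inline: one usize. The heap side then holds just
    // (pointer, length) — one usize less than Box<Vec<T>>'s heap contents,
    // which also carry a capacity.
    assert_eq!(size_of::<Box<Box<[String]>>>(), size_of::<usize>());
    assert_eq!(size_of::<Box<Vec<String>>>(), size_of::<usize>());
}
```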
I guess it's because once you Arc them, they're less costly. You could trim some by doing this, but maybe it's just not worth it — like, if you really wanna trim here, you can do that. Okay, so back to this. So now there's the question of requirements: they actually do need a different type than the version, because one is a semver version and the other is a semver version requirement, and they're just not the same. So for req, I'm gonna take Req here, and this is gonna be Req. And for target and for package, for both of them they're using a Box of a SmallString. Now, I don't think a small string helps you here, because package and target are both usually fairly long. Are they longer than a usize? So, target is usually something like x86_64-unknown-linux-gnu. A usize is 64 bits, which is eight bytes, which is roughly eight characters — one, two, three, four, five, six, seven, eight — so that's not gonna capture your target. And same thing for package: package is gonna be the name of a crate. So that one might — like if it's nom, for example, that fits into a small string — so I guess for package it might make sense; a small string might be able to save you some of the time. For target, I don't think it matters. So this now gets back to: to what extent do we want things like the features to be tied back to the input source? And I think actually cargo has the same need here as crates-index-diff, which is that the JSON input is not gonna be long-lived enough that you get any value from tying this back to the lifetime of the JSON file. And so as a result, I don't think we actually need to worry about that. I don't think there's a lot of value here in borrowed, zero-copy deserialization, because usually you want this to be longer-lived anyway.
So in that case, the target here — target, I think, might as well just be a Box<str>, right? So this again saves one usize compared to String. And we could make the argument that target is set rarely enough that you want to double-box it, because you wanna save that extra usize whether it's set or not — and actually, I'm surprised, given crates-index's optimizations here: target feels like a good candidate for an extra boxing. Now, package — package is not set very frequently either. It also feels like it should be tied to name. The question is whether we should box it because it's used rarely. Sure, why not? Now, this feature list is interesting, because I guess the question is how often people pass features for their dependencies, and I think that's decently common. But again, it's about having the most compact representation for that feature list depending on how often you think you're actually gonna access it. I think this is probably fine. Now for target: do we wanna allow people to swap in their own types here? Certainly in cargo this might make sense as an interned string, because there are very few targets. So I think we're gonna make this generic too. So that means there's now also a Target parameter, and Target goes into here, Target goes into there, and Target went there. Registry — this one's very, very rarely set, because usually the dependencies of a thing in a registry are from the same registry; it's very rare that they're gonna point anywhere else. I wonder whether they even bother to deserialize it here. Yeah, they don't even bother deserializing it, because it's just so rarely set. And in fact, I forget whether crates.io even allows you to have cross-registry dependencies. But this certainly feels like one of those fields where we should at most be spending a usize on it.
I wish we could spend less, but I don't think we can afford to go quite as far as crates-index does, because we do actually need to store this information: if it's set, it's relevant. Okay, so we have Option<DependencyKind>. I don't actually think this needs to be an Option, but it doesn't matter — it'll get a niche optimization anyway, because there are fewer than 256 variants of this enum. Okay. And public — what did they say? Public defaults to... somewhere over here — too many tabs now — public defaults to false. So that means this can be a serde default. Or actually, I guess that means the .crate module is really where that goes. In fact, yeah, so public here — I guess this is reading things out of the index; this is saying if it's not set in the index, then assume it's false. So in the index it would still be an Option, because you care whether it's explicitly set to false or explicitly set to true, since the default in cargo might actually change. Okay, so now the next question is — right, so for DependencyKind, I guess we might as well be nice to people and bring over some docs here. And we can also be even better and implement all the different nice things. This skips serializing on package, kind, and target. In which case, that would mean we also need to derive Default for them — although I don't think there's actually a meaningful default for kind, but if it's not set, then we wanna preserve the fact that it's not set. Okay, so now I think we're aligned with crates-index. So they could switch to ours now; they have all the information. The next thing is checking that this all still works with what crates.io needs. So this is the definition from crates.io. Oops — let's see that it matches up with ours. So crate name: they use a String, and they use a String for the version. Both of those are fine; we allow them to be generic over them, and they can be the same or they can be different.
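The niche optimization mentioned here is easy to demonstrate: a fieldless enum with fewer than 256 variants occupies one byte, and wrapping it in Option costs nothing, because None is encoded in one of the unused bit patterns. The variant list below mirrors the dependency kinds being discussed, with Normal deliberately first so a derived Ord sorts normal dependencies ahead of the rest:

```rust
use std::mem::size_of;

// Fieldless enum with 3 < 256 variants: fits in one byte, and Option of it
// is still one byte thanks to the niche optimization.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DependencyKind {
    Normal,
    Build,
    Dev,
}

fn main() {
    assert_eq!(size_of::<DependencyKind>(), 1);
    // Option adds no space: None uses a spare bit pattern (the niche).
    assert_eq!(size_of::<Option<DependencyKind>>(), 1);
    // Normal is declared first, so it sorts first under the derived Ord.
    assert!(DependencyKind::Normal < DependencyKind::Build);
}
```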
Vec of dependency — we encode that as an Arc of a slice of dependencies, but that should be fine for them. We'll check whether the actual registries are the same; the dependency types are defined later. For the checksum, they store it as a String. We actually parse it as a [u8; 32], which should be fine. It does mean that for them to get it out as a string, it would have to be hex-encoded again, but that's probably fine. I'm intrigued by the decision to store this inline as 32 u8s, because this feels like a thing you might wanna box — then you're storing a lot less for each one, though you'd also be paying an allocation for every field. Like, 32 bytes is a non-trivial amount of storage. So the checksum seems fine. Features they store as a map from String to Vec of String; we store it as an Arc'd map, and they can choose String for both sides, so that's fine. This bit — oh, they've actually copy-pasted exactly the definition from cargo. So that's nice, and those are the same. It's an Option; it's just that for us it's an Option<Box<...>>, which seems fine. Yanked for them is an Option. We've made it a bool that implements Default, because the field should always be in the index, and if it's not in the index, you should interpret it as not being yanked. So I think that's right. Links they have as an Option<String>; we have it as an Option of Links, so they can set that to String — that's fine. They have skip-serializing on it, which we do not, but we should, because that is the behavior of crates.io when it generates the index, as we just saw, and so we should behave the same. The schema version — this is just copy-pasted; they use a u32, we use a u8, seems fine — and we wanna skip that if it's set to None. And that's all there is to the crate entry.
And now let's compare their definition of a dependency to ours. Name is name; req is a String for them, that's fine, those match. For features they use a Vec of String; we use a Box<Box<[Feature]>>, which is fine — those are compatible. Optional is a bool, that's fine; default_features, that's fine. Target for them is an Option<String>; for us it's an Option<Box<Target>>, so they could set Target to just be str and be happy. Kind for them is an Option<DependencyKind>, and for us it's an Option<DependencyKind>, that's fine. And we have package, which they also have. We have a default, which we don't need — did that come from the definition in here? Did they have a default set for package? No — so I just made that up. Okay, great, so that one's good, done. They have PartialEq and Eq set; that seems fine, we can have PartialEq and Eq. "Box<[...]> does not implement Serialize" — we'll have to look at that in a second. Now, they also have an implementation of PartialOrd for dependency, which seems useful, and they have this, which looks like it's mostly the same, except they have Ord; they don't have Hash; we'll add Hash. So that's the union of them. Okay, so PartialOrd for the registry dependency: we want a where-clause here — where Name implements Ord and Req implements Ord — because those are the only things we're actually comparing. Interesting. Why is the requirement on Eq here? All right, let's see what cargo gives us. I want something more substantial here. Whoa, a lot of stuff. "Can't compare Feature with Feature." PartialOrd — why does it need to compare Feature with Feature? I forget whether — right, Ord implies Eq. And up here, by deriving PartialEq and Eq, we're getting an implicit requirement that Feature implements Eq, because otherwise it can't do the comparison here. So it is interesting that the ordering doesn't look at features at all.
"In an old cargo version, the order of dependencies appears to matter if the same dependency appears twice with different kind fields." The option of getting something to be ignored — I see, so this is a partial ordering. So they're actually kind of lying here; like, this is not really a total Ord, because I think all the other fields also need to be ordered by. But: name, kind, and req. I think all you need to do is derive Ord, because then we can take this and say: by placing the fields in this order, we ensure that we sort by kind before version — we ensure that a normal dependency always comes first when multiple with the same name exist. And that is assuming that in DependencyKind we have Normal defined first, which we do. And this stems from the fact that the derive for Ord orders by the fields in the order that they appear, and I believe that's not a thing they can change. I mean, who knows, but at least in theory, that shouldn't be a thing they can change. And this should be easy enough to test. Like, if we derive PartialOrd, Ord, and Debug, and I do struct S, and we have field f0, which is a usize, f1, which is a usize, and f2, which is a usize — then if I do this and println!, I guess let mut v is a Vec of these, v I'm gonna sort, and then I'm gonna print out v. Then if I now do something like S — let me make these a little more convenient for myself — f0 is zero, f1 is one, f2 is two. Well, that was silly. Let me close the door a little more. So we would expect here — I guess I can make this a test instead. Okay, or the good old it_works. So what we expect to see is that it sorts by f0 first. So I do five, four, three, two, one; I do one, two, three, four, five; zero, one, two, three, four, five; I do, I don't know, one, two, one, one, two.
Then I'm expecting that this is equal to — I forget whether default sorting is ascending or descending. "Can't compare S with S" — that's missing Eq and PartialEq. Let me format. Test run failed. Okay, so it's ascending by default. Oh, this is awful. So this should come before this, should come before this, should come before this. — So, did you want to come in? Well, hi, Chai. I'm sorry, did I lock you out? You gonna grumble now? Okay, here you go. — So zero, this goes up here. So zero, one, two, three, four, five. Okay, so that test passes. It could be non-deterministic, but it seems to be ordering by the fields in order when you derive Ord. Now, can we rely on that? Who knows, but I think we can. Or rather, if they were to try to change this, I think they'd run into problems. I think the guarantee here is that the ordering of the fields is equivalent to as if you had a tuple of those fields in that same order, which means that just by placing them in this order, we guarantee that all the normal dependencies come first regardless of the version numbers. And now this is complaining about something. It's complaining that — well, first of all, it's complaining about a bunch of unused dependencies, or uses, like this. "Does it still work if you swap the ordering?" That's a great question. So if I do this and put f1 first and run the test — nope, the test fails, because now it's ordering by f1. "Deserialize is not implemented for Box<[...]>." Interesting. Is there another feature of serde I have to turn on here? Because, again, over here that derive seems to work just fine. So if I go over here and look at the Cargo.toml, their dependency on serde just enables the rc feature. So why don't I get to do the fun thing? They have an Arc of a slice of dependencies; I have an Arc of a slice of generic dependencies, but I don't get to have mine. Why? Req, Feature, Target — oh, the docs on Rust stable say that derived Ord works like that. Okay, great. They don't have a manual derive, right?
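Here's a cleaned-up version of the experiment from the stream, showing that a derived Ord compares fields top to bottom — equivalent to comparing tuples of the fields in declaration order — which is exactly why field order in the dependency struct controls how entries sort:

```rust
// Derived Ord behaves like comparing (f0, f1, f2) tuples: f0 decides first,
// f1 breaks ties, then f2.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct S {
    f0: usize,
    f1: usize,
    f2: usize,
}

fn main() {
    let mut v = vec![
        S { f0: 1, f1: 2, f2: 0 },
        S { f0: 0, f1: 5, f2: 9 },
        S { f0: 1, f1: 1, f2: 3 },
    ];
    v.sort(); // ascending by default
    assert_eq!(v[0], S { f0: 0, f1: 5, f2: 9 }); // smallest f0 first
    assert_eq!(v[1], S { f0: 1, f1: 1, f2: 3 }); // f0 tie broken by f1
    assert_eq!(v[2], S { f0: 1, f1: 2, f2: 0 });
}
```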
They don't have a manual implementation; they just derive Serialize. And it's the same for their dependency, but theirs isn't generic, so that's where I feel like this probably comes from. I wonder if it's Req, Feature, Target, Links. So I should only need Feature: Ord, because that's the only bound I have that the derive wouldn't figure out. I'm pretty sure this is serde getting confused. So if I do serde bound equals this, it's gonna complain about all sorts of things, right? And it's gonna complain about Name: Deserialize. Can I keep them separate? Oh, that's awful. That's awful. So I'm gonna have to do it like this. Can I avoid giving them all as one big string? So I'm gonna have to do this — and I think I know why this is, too, but let me just see that it works first. So this is gonna be Version. Can I at least do this? I should be able to, because it's just gonna be injected verbatim. So Name, Version, Req, Feature, Links — it's gonna let me do that, but I'm not sure why it won't let me do this. So the reason why it's complaining here, I think, is only because of Req and Target. So in particular, I think I can fix this by saying unused1 is PhantomData of Req and unused2 is PhantomData of Target. Really? Name, Version, Req, Feature — oh, interesting. So it's not that. Why? Oh man, this is very strange. Oh, I also just realized all of these have to be pub. serde rename equals "vers" — make these nice to work with: dependencies. I think this is also, for what it's worth, the reason why crates-index exposes these only through getters and setters: that way they can hide the fact that behind the scenes there's an Arc and a double Box and stuff behind this, right? So it is tempting for us to do the same thing, but I'm going to make it pub for now. Then checksum, links. And this is pub v — and I guess we can rename this so it's serialized as v but named schema_version. And then this — oh, is it because I didn't make it pub? This is pub, right?
Yeah, pub. And then this is pub, pub, rename, pub features, optional, target, registry, package, and public. What's interesting is it says the trait bound on Box isn't implemented. So my guess is that serde's impl for Arc depends on serde's impl for Box, and serde's impl for Box — I feel like it just doesn't handle the generics. But why not? "The bound Box<...>: Deserialize is not satisfied." I mean, I guess I could do this. David Tolnay would scream at me if he saw this. Yeah, I know, I know, I know. So — okay, so there's a line in the serde docs... these are lifetimes. I thought there was a line here that said something like: if you ever find yourself writing a for<...> bound next to your thing, you're doing it wrong. Actually, you know what? I wonder whether — stupid as it may seem — I need serde borrow here or something along those lines, because I might have to tell it that this is... no, this isn't about the lifetimes either. Yeah, I'm unsure why this wouldn't implement Deserialize, because it should be the case that this bound gets inferred by serde here. I guess this is a trivial thing to test, right? We write a test that we can deserialize this when the type parameters are reasonable. Okay, let's just assume that's right for now, to see if we can make some progress. So now the thing we wanna do is implement a conversion from the new manifest to this, and from the publish metadata to this. And it'll be interesting, actually, to see whether we can. So if everything works the way we hope, we should be able to implement From of the publish crate-version type for whatever we call this thing, which is the registry package — I don't really love Entry for it. And I guess this is gonna take all the same generics as over here, where Name is gonna have to implement something like From<Cow<'a, str>>. And same thing for Version, and same thing for all the other ones, actually: Req, Feature, Target, and Links. And Feature also has to implement Ord. Oh — cargo expand, that's a good idea.
That might actually tell us what's going wrong. So the name is gonna be v.name.into(), v.version.into(). Features and features2. So that's the part that we were looking at over here, so v.features.into() — or the schema version. So that's the schema version dealt with and features2 dealt with. It's very unhappy with me about the bounds up here; we'll get to that in a second. So this should just be features. This should be v.... Ah, so the checksum we don't have, because that's not in the crate version stuff. Yanked is, though. No, yanked is also not in that. If you have a crate version, the assumption is always that it is not yanked, because yank is an operation on the registry. So any time someone publishes you a crate version, it is not yanked, because it was just published. And links is gonna be v.links.map(...). And really this should be into Name. Generally, you're supposed to use Into for bounds. This could also have been a mechanical change, I suppose. So that's features. So checksum we can't; dependencies we should be able to. So actually, here I'm gonna go back and say that we're gonna make this also be nicer: rename = "vers", and then say this is version. Rename = "deps", and say that this is dependencies. And then this is gonna be v.dependencies.map(...), right? And this is gonna be an Arc::new of this .map(...), whatever it ends up being, .collect(). And this is gonna have to be a RegistryDependency. And let's see if we can come up with all the fields that go in here based on what's in the publish that we got. Well, this should be d.name. This should be d.kind. This should be d.version. Although, crate version dependencies — so for dependencies down here, this is version_req currently. Let's make that nicer too: rename. I'm surprised that this is named version_req. Is that right? It really is, wow, okay. So that's gonna be requirements. Right, and in the index — so here we have to invert again, right?
Because this wants package and name and doesn't want the explicit-name-in-toml bit. So back to another map like this. We're gonna say let (name, package) = match on d.name and d.explicit_name_in_toml. So this is gonna be either name and Some. So let's see: it's either that or it is this. So if there's no explicit name, then the name is n and the package is None. And if there is a name and an explicit name, then the name is the explicit name in the toml and the package is the thing that was the name. Nice. So that's name and package dealt with. Now, registry — the question now is whether we also need to invert registry. I honestly have no idea. What does it stick inside of that? So it's the version string, checksum, features and features2, which we already dealt with. Yanked is false. All right — these are the outer fields. So yanked is false, links we dealt with, features we dealt with, great. So for the dependency stuff, where does that mapping happen? Git deps. Git deps is add_dependencies. What does add_dependencies do? It fetches all the stuff from the database. All right, so this is the same inversion we do. We're generating a dependency. That's just the version requirement from the thing that came in. The feature list is just the feature list mapped to strings, which is already what we have. Optional is just d.optional. Default features is d.default_features. Target is d.target. Registry — registry doesn't even get set here. So this suggests that crates.io just does not allow cross-registry dependencies, because in its index entries it never writes out registry. So presumably then, in the upload path, somewhere around here, I wonder if it looks through the dependency list and checks that none of them have registry set, or maybe just assumes that they don't. So whatever comes back from add_dependencies — yeah, here: dependencies hosted on another registry — cross-registry dependencies are not permitted on crates.io.
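The name/package inversion for renamed dependencies can be sketched on its own. Field and function names here are assumptions for illustration, not the real crate's API:

```rust
// Index-side representation of a possibly-renamed dependency: `name` is
// what you write in Cargo.toml, `package` is the real crate name if (and
// only if) the dependency was renamed.
fn invert(name: &str, explicit_name_in_toml: Option<&str>) -> (String, Option<String>) {
    match explicit_name_in_toml {
        // No rename: the Cargo.toml name *is* the package name.
        None => (name.to_string(), None),
        // Renamed: the index `name` is the explicit Cargo.toml name, and
        // `package` remembers what the crate is actually called.
        Some(explicit) => (explicit.to_string(), Some(name.to_string())),
    }
}

fn main() {
    assert_eq!(invert("serde", None), ("serde".to_string(), None));
    assert_eq!(
        invert("serde", Some("my_serde")),
        ("my_serde".to_string(), Some("serde".to_string()))
    );
}
```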
Which is a little awkward for us, because it means we don't really know what this mapping should be. It's not clear what cargo expects for the thing that's in the index. Remember, there's a difference between how it's treated in the Cargo.toml dependency definition and how it's defined in the metadata that we send in the JSON payload: in one, the registry isn't set if it's crates.io; in the other, it isn't set if it's the current registry. I feel like it's probably the current-registry rule that applies here too, which is the same thing that's in d, because that's what the registry received. And public is probably just d.public. And now it's complaining, and it's complaining about the same thing. Implicitly elided — oh, the trait bound on Box is not satisfied: Deserialize is not implemented for Box of that. I wanna know why — oh, I haven't called into() on all of these, have I? Right, right, right, right, right. So I'm gonna have to do — oh boy. Yeah, all of these are gonna need into() calls. We'll deal with that in a second. So, yeah, this all comes back to this bit. So if I run cargo expand, what do I get? Oh boy, this is long, long, long, long, long. I need to know what to search for, because otherwise it's gonna be impossible to deal with this. For Entry — all right: impl Serialize for Entry, impl Deserialize for Entry. So it correctly adds the 'de bound to all of the generics. Now, where's the part where it deserializes the field that we worry about? Which is field number zero, one, two. Okay, so field two — where does it handle field two? Deps is field two. Field one, field two, next_element — it's an Arc of that. Okay, so I guess then we need to look at the implementation for RegistryDependency. That also seems reasonable: it implements Deserialize<'de> when all of its generic arguments implement Deserialize<'de>. Why is there a requirement on Target to be Default? I wanna know what this default business is. Oh — no default.
I don't know if that did it, but I think that might actually have done it. So there was a requirement: the default ended up meaning that Target had to implement Default, but we don't require that here. And it's a serde bound, right — serde(default). So it's not a problem for Rust itself that Target didn't implement Default here, even though we contained one. But for the implementation of Deserialize, serde thought it was. And I guess this means that when you declare default on something that's an Option, it assumes that it should be Some of the target type's default rather than None. That's weird. All right, well, I guess that fixed it. cargo expand to the rescue, I suppose. All right, what else do we have? "Version: From<Version> is not satisfied." Oh, actually, I guess there's no need for the into here, because — well, so this is weird. I guess the conversion here is just this, right? All of these are just this. There's no generics involved, because we're just saying we can construct a crate version from an Entry. If you have a crate version, we can construct an Entry where all of the things are just the same kinds of references you had in your crate version JSON. If you wanted to convert those to more special types for yourself, that's fine. But for this conversion, we want the one-to-one conversion. "Found Version, expected Cow<str>." Oh, right — name, version. So the thing that comes out of version here is a semver Version, and the thing that comes out of requirements is a semver VersionReq. "Dependencies is not an iterator." Sure it is. Ah, Arc::new wants a sized value — so this is going to be a collect() into a boxed slice, and then this is going to be an Arc::from. So the kind here, that's going to map to a Some. The features here — right, we're insane people and decided to have this be a boxed slice of boxed things. Box::new; target, map Box::new; map, Box::new. Right. This is double boxed for reasons.
In fact, this one is a double Box: a String to a Box<str> to a Box of that. So we're going to map the registry to a Box<str> like this, and this is .into_owned().into_boxed_str(). So this gives us a String, because ours here is a Cow<str>, and then the .into_boxed_str(). This is definitely supposed to be a Box<str>. Yeah: String, into_boxed_str. Okay. And then we box that again. That seems fine. Package is supposed to be an Option of a Box of a Name — ah, because usually this is going to be nothing. Okay. Public, right. Public is not currently in the JSON and therefore it has to be None. Even if it might be Some in the original .crate, it's not a part of the JSON upload fields. So features — this is where we need to do our fancy mapping, which is just: features is Arc::new(features), and features2 is features2.map(Box::new). And the checksum we still don't have. So this is where this conversion doesn't work: we can only do this conversion if you also give us the checksum. So we're going to have to do this. And this is going to be a pub fn new, which takes a crate version and a checksum. And the checksum is going to be a [u8; 32], like so. All right, so now we have that conversion, and that means we have .crate to publish to index. Now, in theory, we could also define a conversion, I think, from .crate directly to Entry. So this is from publish, and then we could also have a from manifest, which could be useful for things like the internal testing in cargo, right? From manifest, which is a .crate NormalizedManifest. And that also needs a checksum. And it's going to have a similar kind of thing, where we do this. Now this is going to have v.package.name, v.package.version. We no longer need to do this mapping, because here the mapping is direct. So name is just going to be v dot — no, d.name, the name in the dependency.
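The point about the checksum is why this conversion can't be a plain `From` impl: the checksum lives outside the publish JSON, so the conversion has to take it as an extra argument. A simplified sketch under assumed type and field names:

```rust
// Hypothetical, cut-down shapes: the publish payload has no checksum,
// but the index entry requires one, so `new` takes both.
struct CrateVersion {
    name: String,
    vers: String,
}

struct IndexEntry {
    name: String,
    vers: String,
    cksum: [u8; 32], // not present in the publish JSON; supplied by caller
}

impl IndexEntry {
    // Can't be `impl From<CrateVersion>` because of the extra argument.
    pub fn new(v: CrateVersion, checksum: [u8; 32]) -> Self {
        IndexEntry { name: v.name, vers: v.vers, cksum: checksum }
    }
}

fn main() {
    let v = CrateVersion { name: "demo".into(), vers: "1.0.0".into() };
    let e = IndexEntry::new(v, [0u8; 32]);
    assert_eq!(e.name, "demo");
    assert_eq!(e.cksum.len(), 32);
}
```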
So that's going to be name. And package is going to be d.package. What is it complaining about here? Expected struct, found map — right, that's because this is an Option, and unwrap_or_default. Ah, right, so this here is where we need to do the same chaining that we did over in publish, because dev-dependencies are different from build-dependencies are different from regular dependencies. So we could have a helper for this instead, which says, over here, impl — fn take_dependencies. This is where it'd be really nice to have partial borrows, because you can either have this take a reference to self, in which case it can't actually take out the elements, or it could consume self, but then you can't consume anything else from self. What we really want to say is that this is only going to consume the fields dependencies, build_dependencies and dev_dependencies. We just don't have a mechanism for saying so. But what we can do is take the dependencies and say that that is going to give you an impl Iterator whose Item is going to be, I guess, a Cow<'a, str>, which is the name of the package; a Dependency<'a>; and a DependencyKind. That's what the iterator is going to give you. And it's going to return self.dependencies.take().unwrap_or_default(), self.dev_dependencies.take().unwrap_or_default(), self.build_dependencies.take().unwrap_or_default(). This is going to be super::publish::DependencyKind — name-in-toml, d, kind. So we're going to map that to, like so. So now we have a thing that, when you call it, is going to sort of steal all the dependencies and give us back this one chained dependency list instead, which now we should be able to make use of in publish. This should be able to do m.take_dependencies() and stick all those into here. And I guess I need to make this pub(crate). This is now requirements. And then we collect that. Great.
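The take_dependencies idea can be sketched with std only. The shapes below are assumptions (the real crate uses Cow-based types), but the mechanism is the same: Option::take steals each per-kind list out of `&mut self`, and the three are chained into one iterator tagged with its kind.

```rust
// Which section of Cargo.toml a dependency came from.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DependencyKind { Normal, Dev, Build }

#[derive(Default)]
struct Manifest {
    dependencies: Option<Vec<String>>,
    dev_dependencies: Option<Vec<String>>,
    build_dependencies: Option<Vec<String>>,
}

impl Manifest {
    // Only "consumes" the three dependency fields, leaving the rest of
    // self usable: Option::take is the poor man's partial move.
    fn take_dependencies(&mut self) -> impl Iterator<Item = (String, DependencyKind)> {
        let normal = self.dependencies.take().unwrap_or_default();
        let dev = self.dev_dependencies.take().unwrap_or_default();
        let build = self.build_dependencies.take().unwrap_or_default();
        normal.into_iter().map(|d| (d, DependencyKind::Normal))
            .chain(dev.into_iter().map(|d| (d, DependencyKind::Dev)))
            .chain(build.into_iter().map(|d| (d, DependencyKind::Build)))
    }
}

fn main() {
    let mut m = Manifest {
        dependencies: Some(vec!["serde".into()]),
        dev_dependencies: Some(vec!["tempfile".into()]),
        build_dependencies: None,
    };
    let all: Vec<_> = m.take_dependencies().collect();
    assert_eq!(all.len(), 2);
    assert_eq!(all[0], ("serde".to_string(), DependencyKind::Normal));
    assert_eq!(all[1], ("tempfile".to_string(), DependencyKind::Dev));
    assert!(m.dependencies.is_none()); // the fields were stolen
}
```

Because the returned iterator owns the three Vecs, it doesn't borrow from `self` at all, so the caller keeps full use of the rest of the manifest.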
So now we can reuse that same method over in index: it's going to say let dependencies = v.take_dependencies().map(...), like so, and I guess we can do Arc::from that as well, if we really wanted to. And then this is just going to be dependencies, and we avoid duplicating all that stuff. This is now going to be (name, d, kind). Kind; this is version. I guess optional is called something else over here, because of course it is. So in .crate, what is the optional dependency field called? No, it's called optional. Ah, but here it is an Option<bool> and here it is an actual bool. So in the index, the optional field isn't optional, which means unwrap_or(false) — things are not optional by default. And this is unwrap_or(true), because default features are on by default. Here — this is another place where we have some logic in publish for remapping the registry. Ah, so this is where we're going to need — this is the really weird part — which is that this conversion needs to know which registry you are generating this index entry on behalf of, in order to figure out what to put in the registry field. So via_registry, which is a &str. So it's actually a little tempting here to run this logic via the conversion to publish. The reason I'm not doing that is because there's some information that is only in the manifest that isn't in the publish info, namely the public/private thing. But maybe it's just not worth doing this conversion this way, because we're duplicating a bunch of business logic here, like the defaults for optional and default features. This is kind of clunky, although at the same time it is very nice to be able to do it this way. I guess — actually, you know what we do here? I know what we do here. We do this: we say the intermediate is super::publish::CrateVersion::new with v and the checksum, and then we return Self from that publish intermediate plus via_registry. Aha, we can even do — here's the tricky part — the entry is this.
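The defaulting rules just described make a tiny, self-contained example. The function names are mine; the rules (not optional unless stated, default features on unless stated) are the ones discussed above:

```rust
// The publish JSON may omit `optional` and `default_features`;
// the index stores concrete bools, so the conversion must default them.
fn index_optional(optional: Option<bool>) -> bool {
    optional.unwrap_or(false) // dependencies are not optional by default
}

fn index_default_features(default_features: Option<bool>) -> bool {
    default_features.unwrap_or(true) // default features are on by default
}

fn main() {
    assert_eq!(index_optional(None), false);
    assert_eq!(index_optional(Some(true)), true);
    assert_eq!(index_default_features(None), true);
    assert_eq!(index_default_features(Some(false)), false);
}
```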
And then we actually get to carry over info that's not present in the publish representation, which is then entry dot — oh, this is also gonna be annoying, because really what we wanna say here is: for every dependency, go through and say whether or not it is public, depending on whether it was public in the original manifest. Which is gonna be something like: deps is entry.deps, Arc::get_mut on this, .expect — we haven't shared the entry yet. So that gives us a mutable reference to the list of dependencies. And now we should be able to walk the list of dependencies, which is probably gonna be best done here. And then we're gonna match on the kind — the kind is always set by CrateVersion::new, which I believe to be true, right? We always set the kind here. The kind comes from the kind here. Actually, why is this even an Option? This isn't an Option. But in the index, it's an Option. But we always set kind here to be Some. Okay, so that means it's always Some here. So this is either gonna be in v dot — oh, balls, but that's consumed. So what I was going to do was this. Right, and then we could walk. We could do orig_deps. And these are all, I think, as_refs. It's like if let Some(orig_deps) = orig_deps. And here we could actually use let-else. So we can do let Some ... else continue. It's my first use of let-else. "Expected comma" — interesting. But okay, so I need to do, I guess — oh, that's funky. So I guess this is a shortcoming of the syntax. So I do let Some(orig_deps) = orig_deps else { continue }. So that gives me the orig_deps, because if it's not in here, that means — or, at the same time, doesn't it have to be the case that any dependency that we have here must be in the original? So I think that's true. Expect — yeah, it shouldn't be possible for that to not be the case.
Like, if you have a dependency listed in the index entry we generated, then that must be because the input had an entry for that dependency listing in the first place. So looking up that dependency entry should never fail. And then we can do orig_deps[&dep.name].public. So here's what I wanted to do. Provide the argument — right, the checksum. Now, the reason this doesn't work, and the reason why I think the borrow checker is about to yell at me — there we go — is because here we're trying to borrow into v, but we gave away ownership of v over here, and that consumed the dependency list, among other things. So we don't get to do that. I also haven't defined into_owned for this, but that shouldn't be necessary. So we don't get to do this. We would have to, like, walk through and capture all of these ahead of time, or we would have to re-implement the translation logic explicitly, which I don't think we wanna do. So we're just gonna not deal with it for now. This is gonna be 'a. This is gonna be 'a, and in publish, line 61: cannot borrow m as mutable. That's fine. Okay, so now we have the whole pipeline working, at least in theory. So we should now be able to do something like take a — ah, what I want to do, right, is write tests for this using cargo as a library to generate these, and using these other crates as well. But if I take a dev-dependency on them, they can't depend on me. I forget whether cargo allows this. Can you take a dev-dependency on something that takes a normal dependency on you? It's not technically a cycle. I forget whether cargo allows this. Path dependencies, multiple locations, target dependencies — "dev-dependencies are not propagated to other packages which depend on this package." So this line makes me think that it's okay for me to take a dev-dependency on cargo, even if cargo ends up taking a dependency on me. Now, it doesn't explicitly say that cycles are allowed.
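The post-processing step sketched above (mutating the deps behind an unshared Arc, skipping with let-else) works in miniature with std only. All names here are assumed for illustration:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Cut-down dependency: just a name and the `public` flag we want to
// carry over from the original manifest.
struct Dep {
    name: String,
    public: bool,
}

// The entry's deps live behind an Arc that hasn't been shared yet, so
// Arc::get_mut succeeds; a let-else skips deps with no match (which, as
// argued above, shouldn't actually happen).
fn backfill_public(deps: &mut Arc<Vec<Dep>>, orig: &HashMap<String, bool>) {
    let deps = Arc::get_mut(deps).expect("entry not shared yet");
    for dep in deps.iter_mut() {
        let Some(&public) = orig.get(&dep.name) else { continue };
        dep.public = public;
    }
}

fn main() {
    let mut deps = Arc::new(vec![Dep { name: "serde".into(), public: false }]);
    let orig = HashMap::from([("serde".to_string(), true)]);
    backfill_public(&mut deps, &orig);
    assert!(deps[0].public);
}
```

Arc::get_mut returns None once the Arc has been cloned, which is exactly why this only works before the entry is handed out anywhere.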
But, like, if I do 0.68 — the real question is: if I do this, can cargo now depend on me? Oh yeah, I guess I could cargo new --lib — ah, cargo-index-transit. Transit. No, no. This is when you don't use the sparse registries. I'm so excited for sparse registries by default. Or even just to have it unstable, I'm so excited. I don't know if I can use cargo add, because cargo add might not realize that it's not a cyclical dependency, I'm not sure. But this worked — all targets. I guess I should just override this to be beta too, really. I mean, the resolver didn't complain. That seems like it just works. Sweet, because that means I should be able to do the roundtrip. So I should be able to run — I guess actually I could have done this entirely without taking a library dependency on cargo, technically. I could have just run the equivalent of cargo new, like, actually do this by command. But setting up the testing harness for that is gonna be kind of annoying. But I guess realistically it would be something like — yeah, I want, like, temporary directories and stuff here, but I also want crates_io at whatever version the crates.io crate is at, 0.35, and I want crates-index, which is at 0.19. And I guess the way to go about this is to take a dependency on something like tempdir. I can never remember whether it's tempdir or tempfile. Looks like it's tempdir. No — tempfile, yes. Okay, great, tempfile. Sweet, yes. Okay, so I'm gonna take a dev-dependency on tempfile = "3.3.0". And I want a tempdir — tempfile, tempdir, unwrap. And then inside of there, I want to run cargo::ops::new. I hate having to construct cargo Configs. "Maybe it should be explicitly documented that you can do those cycles." Hey, send a PR to the cargo team — I'm sure they'll be happy to have PRs that add docs. Yeah, so I guess the way I would do this through new is — so, constructing a cargo Config. If you use cargo as a library, you'll find that you end up using the cargo Config type a lot.
It is the configuration for cargo, and it is effectively the result of all of the cargo configuration files that apply when cargo is used in a particular directory. So you can do just new, and when you do new, it is cargo as of this particular directory. So you pass it something. There's default, and when you construct it with default, what you get back is: using the current directory, let cargo do its thing. What I wanna do is pass in a path instead. So I'm gonna use new instead of default, and then you have to pass a Shell, which I think is just default; and you have to pass the current working directory, which is gonna be t; and the home directory, which is gonna be home — but I could just use t here too, I suppose. In fact, I'm gonna do t.path().join crate — in fact, I'm just gonna do t.path() and t.path().join cargo home. And you might think that once you've done this, you now have a cargo Config — but you have not. You also need to configure the Config. Why is this not — I guess it's not an unwrap, and it's not t, it's d. And this is complaining because it expected a PathBuf, and this needs to be into — to_path_buf, I apologize. Now, configure takes a bajillion arguments: verbose false, quiet false, color None, frozen false, locked... So configure is basically the command-line arguments to cargo. So new is "where are you?", and configure is the arguments to cargo as you have deemed them. Offline false, target dir has not been overridden, there are no unstable flags, and there is no CLI config. Unwrap. And it's complaining about my use of None because it wants a reference to an Option, and it is complaining about my verbosity of false because verbosity is a u32. Okay. So now we basically have an environment in which to run cargo. And so now we can run the equivalent of cargo new, which is: version control None, kind, NewProjectKind, Lib. Whoa, it's very confused. What does it want here?
use cargo::ops — does cargo not expose this? But it's a field in one of its options. The cargo library interface is not always the easiest to work with. Well, how about that: NewProjectKind isn't a public type, so I can't construct one of these. It's in cargo — use cargo::ops. Ah, it's not public because within cargo::ops, the module for cargo new isn't pub. So the type is pub — the NewProjectKind type is pub, see, right here — but the module it's in is not pub. So this is one of those unreachable pubs. Which means that we're now at the awkward point where I can't call cargo new. So if someone wants to fire up a quick PR to the cargo project to make the cargo new module pub, or re-export this type somewhere around here — probably here, just re-export NewProjectKind as part of the other re-exports from cargo new — it's a great one-line PR to cargo. Go do it right now. Watch, there'll be like four PRs in a second. Well, that's awkward. Can I Default::default()? No! That's the secret way to get around it when you run into situations like this: sometimes that inner type that isn't exposed implements Default, and when it does, you can just Default::default(). But I can't do that either. Can I do this? No, it doesn't let me do that either. Well then. That makes me very sad, because I don't want to now run cargo from the command line. I guess what I'll do is just — wait, don't I have a checkout of cargo somewhere? dev/others/cargo. Pull up Cargo.toml. Let's do a [patch.crates-io] cargo = { path = "/home/jon/dev/others/cargo" }, like so. And then we change — what file was this — src/cargo/ops/mod.rs. And we also expose here NewVersionKind — no, NewProjectKind. And now, if I do this, I should be able to do NewProjectKind::Lib. Ah, no — no patch, 0.70. That's because that cargo is a couple of versions ahead. That's fine. I suppose I could change this to be =0.68.0, and now this can stay the way it was. cargo check --all-targets.
"That seems like that missing export is a pretty big oversight. Isn't there a different, approved approach to accomplishing this? How do other projects do this?" So, okay. There are a couple of parts to this. Cargo as a library is not planned; it is the things that happen to be asked for by people over time, mostly. So there's a bunch of stuff where the cargo library API is not really well documented, and it's often awkward to work with. And this is stuff that, if you want to try to make it better, please do — that would be fantastic. But it requires a lot of work, and cargo does change a lot internally over time, and so keeping that library API stable is tricky, as is figuring out exactly what to expose. There's not usually, like, an approved way to do things, because the cargo library API is just generally discouraged from use in the first place. I think the general sense from the cargo team is: prefer to use the command-line tools if you can, because those are stable; don't rely on the library API. I don't want to do that here, but that's the intention. The reason why this particular thing wasn't caught is because the lint to detect unreachable public items isn't documented anywhere. Or I've been streaming for too long — it's not that it's not documented anywhere, it's that it's not on by default. It is allow by default. Oh, Wei-Hung, thanks. Wei-Hung is gonna fix the problem — one of the cargo maintainers, who's in the chat and is just gonna fix it while we're at it, which is fantastic. The unreachable_pub lint is not warn-by-default at the moment, and the reason is that it's kind of incomplete. Sometimes it can be really hard to tell whether something is unreachable by design or not — so whether you should lint it or not, whether it's actually unreachable, is complicated to compute. And so the lint just has too many buggy cases to have it actually warn by default.
So it's allow by default, which it has been for a long time, which means that things like this slip through. Okay. So now we should be able to do this and use cargo::ops::NewProjectKind. Thank you. And it also wants a config, which is easy enough — we have one of those. "This is why you named it defenestration." Well, defenestration is the name of my computer, which is what I often want to do: throw it out a window. auto_detect_kind — I don't know what auto_detect_kind is for. False? auto_detect_kind. Oh, I see. False. Lib is the kind I want. Path: "absolute path to the directory for the new package". So the question is, does that include the name or does it not? I feel like path is probably expected to be the current directory. If I go back to — nope. src/bin/cargo/commands/new.rs. So path is the argument that you pass to new, and that's interesting. So let's look at NewOptions. See, I think they're lying. Somewhere here it says "absolute path to the directory for the new package", and I do not think that's true, because based on this, path is the argument you pass to cargo new, and you can run cargo new foo — it does not have to be an absolute path. So I think path here is gonna be d.path().join, you know, round-trip — the name of the crate. And name is gonna be None, as in, we're allowing that to be inferred from the path. Edition we're not passing in — we want whatever the recent one is; registry None, we don't want a custom one. Unwrap. The package is gonna be there. Okay, so now I should be able to run cargo::ops::package — in fact, package_one. So now I want a workspace, which is gonna be cargo::core::Workspace. Workspace. Aha, find_workspace_root — no, not find_workspace_root, Workspace. The manifest path is gonna be package.join("Cargo.toml"), and we're gonna pass in the config, and we're gonna unwrap, and that's gonna give us a Workspace. The only reason I know these by hand is because I've done a lot of using the cargo library API. Nice — we have at least two people writing the PR.
Okay, so we have a workspace now. It's complaining about this; that's fine. So once we have a workspace, we should now be able to pass that workspace — and workspace.current(), which is just gonna be the main package, because there's not really a workspace — and PackageOpts. Unwrap. So that's gonna be the tarball, and I happen to know it's gonna need a second unwrap. The reason for this is that what package_one returns is an Option<FileLock>, and the Option here is because if you run package with --list, it'll list the things that it would package, but it doesn't generate a tarball — hence the return here is an Option. So you'll see the list option, which we're gonna set to false; config — and why that has to be passed in separately here for package_one, rather than being passed as an argument to package, is a little unclear, but it does; check_metadata we're gonna set to false; allow_dirty we're gonna set to true; verify we're gonna set to false so it doesn't build; jobs, we're gonna use the default; keep_going we're gonna set to false — keep_going is "if the build fails, then keep trying to build to give me more failures". to_package is gonna be Packages::Default — this is basically if you were to pass the -p flag, and I just want to not pass the -p flag, which is Default. Targets is gonna be a Vec::new(), because I don't have any particular targets, and CLI features is just gonna be default — which I thought you could do, but maybe you can't — and no, I don't wanna compile it with all features.
Okay, so now we have a tarball, and now we need to grab the manifest out of the tarball. For verify — yeah, so verify means, by default it's on, and what verify means is: before you run package, try to build the current package to make sure that it can actually be built, so that you're not publishing something that other people can't build. And the build here is a little special: it runs with your default configuration rather than any custom configuration you've done in your home directory, and it doesn't respect patches — so it basically tries to do a build as if someone else were doing the build. So verify here is "should you also do the verify step?". Okay, so now we need to extract the Cargo.toml from the packaged thing, which we can do by extracting the tarball — we already saw an example of that over in crates.io, which was over here somewhere, somewhere over the rainbow. Cargo.toml, where you at? So they take a dependency on — whoa, it's a lot of stuff — flate2. So these are the things to actually extract one of these .crate files. We're also gonna need tar. And then we can probably just copy-paste their code here, which is a decoder reading tarball.path() — oh, actually, I think I can just do tarball.file(). No — maybe. And then I should be able to do this, and then I should be able to do this. So I'm grabbing out each entry from the archive, and if the entry.path() ends with Cargo.toml, then I want to do the test — I'll get back to "do the test" in a second. Why doesn't it give me GzDecoder? Please give me GzDecoder. Thank you. Path is entry. Unwrap. All right, so how do they grab the bytes out of this, which is what I really want now? So then we're gonna do let mut manifest — entry.read_to_string, unwrap — io::Read. So now the interesting part comes, which is — now we have a manifest. So we should now be able to use cargo_index_transit.
So we should now be able to do: let the dot-crate NormalizedManifest equal — oh, here we're gonna need a toml from_str of manifest. Unwrap. "Expected unit struct." Right, like so. "Implementation of Deserialize is not general enough" — scary. Oh, I think actually this is just manifest.parse().unwrap(), but fine, we'll do — no, actually, I do want toml_edit. So why did I even bring toml in? Not sure; I think I only need that as a dev-dependency. And then I think I also need it with the serde feature — yeah, features = ["serde"], like so. And now we should be able to do a de::from_str — hmm, why is it complaining? "Must implement Deserialize for any lifetime" — but it actually implements it for that lifetime. Ooh, so this sounds like we missed a borrow somewhere. It's either we missed a borrow, or serde isn't correctly propagating the borrows into these subtypes. Let's go look at the serde docs to see if I messed this up. Field attribute, serde(borrow) — oh, that should be fine. So did I miss a borrow in Dependency? I don't think I missed a borrow in Dependency either. Did I miss a borrow in Package? I did — keywords needs to borrow. Would've been so good if that just fixed it. Could also be that it's not a missing borrow — but usually it is — but now I can't spot one. Alternatively, it might be something about our little weird string-or-bool thing. So first and foremost, this should be this, really. Not that it really matters, but that was on readme. So this needs to be .into_owned(). So I need an impl on StringOrBool. So this just maps to the same bool, and this maps to into_owned — Cow::Owned of into_owned. Oh, this should be String. Yeah, it's entirely just a convention to have these kinds of methods; there's no built-in StringOrBool into_owned. Now, that doesn't fix the problem, though. Wish it did. "Must implement Deserialize..." So I wonder why — like, if I do this, will it let me be happy? I have angered the borrow gods. Oh, it might be version — it might be the version trim-whitespace thing.
Yeah, this should implement visit... docs.rs. Um, docs.rs, show me the Visitor trait for Deserialize. There is visit_borrowed_str, I want that. Actually, that shouldn't matter, because this one is doing an owned parse anyway. "Deserialize is not general enough". All right, let's see if the expansion here helps us. So, for NormalizedManifest, why is there even a 'a here? That's what I want to know. This is very strange, because as far as I can tell, there's nothing here that would make this be unable to borrow its input. Like, I just want to see whether this makes it happy. It shouldn't. So that didn't do it. So I don't think it's the custom Deserialize here. I think what I want to see is whether this... there shouldn't be any reason why the additional nesting here makes a difference. Wait a second. I think I remember now: I think there's a special interaction between Cow and the borrow annotations that makes it not work when it's inside an Option. I mean, this should be easy enough to test, right? So if we derive, use serde's derive Deserialize, and then we do something like derive(Deserialize) struct Foo with a serde(borrow) Cow<'a, str>, and then I do, you know, let x = toml_edit de... "Implementation of Deserialize is not general enough". That doesn't seem right. Oh, fun. This is just a limitation of toml, or of the toml deserializer. Yeah. See, that's what I thought: this is actually, I think, toml_edit's de::from_str. It requires that T implements DeserializeOwned, not just Deserialize. Why does it require that? DeserializeOwned. There it is, around line 490. So that's interesting. So toml_edit doesn't support this and doesn't plan to. So that's not what I wanted. But in theory, we should be able to do this. Why doesn't it? So NormalizedManifest, we don't have DeserializeOwned for. That's interesting. The next question now is: why doesn't it implement DeserializeOwned?
Because it should be the case that our types here, like, why wouldn't this also implement DeserializeOwned for us? Because I think, if I remember correctly, serde's DeserializeOwned is: impl DeserializeOwned for T where T implements Deserialize for any 'de, which should already be the case. Like, just as a sanity check... I put that the wrong way around. DeserializeOwned. Okay, so NormalizedManifest doesn't implement DeserializeOwned. So that implies that it doesn't meet the bound, it doesn't implement Deserialize for any, or for all rather, lifetimes, which is weird because it really should. That makes me sad. Okay, so we have a couple of options here. One is to use an older version of toml that does support borrowing deserialization. Another option is for NormalizedManifest to give up on using borrowed strings, or borrowed anythings. And a third option is to make NormalizedManifest generic over its string types. And the reason we might want generic string types is because cargo, if you remember, oh, over here somewhere, TomlManifest. So for TomlManifest, yeah, it does use interned strings. Interestingly enough, it uses them for features and for package names, and only those. Which means, let's actually see whether workspace dependency, TomlDependency, DetailedTomlDependency... that one does not. Huh, TomlManifest. So it's the name inside of a package, but not the names of its dependencies, and the names of features, but not the features of dependencies. Or names of dependencies. Weird. So, I mean, that's frustrating. I guess what we'll do then is Name and Feature, so Package is gonna have Name and Feature. Dependencies are just gonna be Strings, because that's all we get. Alternatively, we could refuse to borrow, but I guess I'll just make them Strings.
An interned string is a technique, or a type, that's in cargo: if you create an InternedString, it first does a lookup into a hash map to see whether it has seen that string before, and gives you a pointer to the previously allocated string if it was previously allocated. Feature, Feature, and I guess not Feature on Package. And these don't get interned, which seems weird. I'm just gonna go ahead and disagree with that and claim that those should also be interned if the name is interned. And Dependency, that's now gonna take... actually, dependencies don't have... I'm still gonna claim that these are gonna take interned strings. "Cows all the way down", yeah, that's right. So dependencies are gonna inherit Feature. So there are no more borrows. There's no more need for into_owned, because there's no borrowing in the first place. This now gives you Name and Feature. Feature, this is now a String, no borrow, no borrow. No borrow, this is a String, no borrow, this is a String. Note: this doesn't use borrowing deserialization because toml_edit doesn't support it. Package is generic over Name, just not borrowed. So String, this is String, nope. And all of these are Strings, and no more into_owned here. And StringOrBool is no longer generic here; it's just gonna be a StringOrBool of String. No more into_owned here, no more Cows over there. And no more into_owned here. And that means publish now needs to be generic here over Name and Feature, where Name implements Into Cow str and Feature implements Into Cow str. What's the main thing that matters? It's probably gonna complain at me somewhere here. Right, index is gonna have the same problem, where it's gonna require Name and Feature. And this one's kind of funny, because here we could actually, if we didn't go via the conversion to CrateVersion, we could reuse the Name and Feature up here. But we're not gonna do that, because we don't wanna reimplement the whole contents of that.
Now there's more of a reason to. Like so. Where is it gonna yell at me now? It's gonna yell at me at line 132. That's because all of these have now changed; none of these are the same types anymore. So this now needs to be into_iter, map, collect, where the key and value need k.into. And this needs to be, I guess this is actually a vector, this is into_iter, map. A clone, no, collect, into, into. And we're gonna have to do the same for this conversion, although this one is just into, into. Description is map into, into documentation, homepage, license, license_file, repository, and links. Keywords is gonna be this, and categories is gonna be this. So all of them get converted using into; this is into, and this has to be converted the same way the other one does, which is this way. All right, we're getting pretty close here now. Hopefully name is into, this is into. Actually, those probably don't need to be converted here; they need to be converted up here, because here we swap them sometimes, but not others. So this has to be into, and this has to be into. P, right. Okay, so that's gonna be something like Cow::Owned now. There we go. Oh boy. Okay, dot_crate line 29, what do we got here? Right, we're gonna require here that Feature is Ord, because otherwise we can't build the BTreeMap, which then means we're gonna have to change that in publish, where Feature is Ord, and in index, where Feature is Ord. Okay, line 35, it returns String. What do you mean, it returns String? All right, it returns String, great. So now we're back to via cargo, and now this should be able to do this, and we should be able to say String. Nice, String. Beautiful, okay, great stuff. Okay, we now have this line compiling, and now, at least in theory, we should be able to say the publish CrateVersion.
CrateVersion is just generic over A, so we should be able to stick anything in here. So let's say publish CrateVersion::new, pass in the manifest and the registry that it's for, which we're gonna go ahead and say is not that one: github.com, which I think we have somewhere here, this URL. And then the index entry, and let's say this is IndexEntry of String, semver Version, semver VersionReq, String, String, String, String, String, String: IndexEntry::new, and we'll just do [0u8; 32], we don't care about the hash here. Ah, from publish, am I lying? From publish, ah, p, great. Expected IndexEntry of String, String, String, String, got a Cow. Right. This already dictates all the values, and so now it should be the case that we can check that at every step of the way here, and certainly at the end we should be able to check that i, which is the index entry here, is equal to... and what did we call the thing? We called the thing roundtrip. i.name, and i.version should be 0.1.0, because that's what cargo new generates, which means it should be semver Version::new(0, 1, 0). All right, cargo test. And I guess technically what we also want to do is, at each step of the way here, say here, we should be able to do: json is serde_json... did I already stick a JSON converter in here? No. serde_json to_string of p. And I should be able to say here: crates_io NewCrate is serde_json from_str of json. Right, it should be the case that whatever we generate is actually parsable by crates.io. And it should also be the case that if we go back to JSON from p2, and then we do p3 is one of these, it should be the case that that's a reasonable round trip where p and p3 are equal. And this certainly should implement Debug, and that means Dependency doesn't implement Debug, which we really want it to do. "Equals cannot be applied". Seems pretty reasonable to have this implement Eq and PartialEq, same for Dependency.
There's nothing particularly weird in here. It's all just strings. So we expect that round trip to work. We also expect, for the index... let's see, is the cargo registry thing public? RegistryPackage. Does look like it: RegistryPackage. Yeah, it doesn't look like you can do much with a RegistryPackage, but I should be able to parse one. So if I do this: cargo, where was it, sources, registry, I can't spell registry, RegistryPackage is... nope, this bit. That should certainly work. And similarly, the same thing should be the case for... it doesn't implement Serialize, right? Yeah. crates-index: i2, i2. And then I should be able to go back to JSON from i2, get i3, which is gonna be an IndexEntry, it's gonna be that, from_str, and then I should be able to assert_eq i2 and i3. And now I think it should be able to infer all of those because I'm saying equals; at the end I should get the round trip. Ooh, okay, so now we have the whole string of tests that in theory should pass, and in practice of course does not: "not yet implemented: depends on file contents in .crate". Right, so this is the readme stuff, which we still haven't dealt with. And I honestly don't know what we do here. I think what I want to do for now is just say this is None, None, and it's just not implemented, and go ahead and say todo. Okay, so something else fails: "missing field keywords at line 73". So crates.io requires that keywords is present, which we don't currently guarantee: in our publish keywords, we currently skip the field if the vector is empty, which we're apparently not allowed to do. I'm guessing the same applies to categories. Also worth keeping in mind, you know, this is a package with no dependencies and nothing else, so there's gonna be other things, badges, that are gonna come bite us. I want None here, I guess, because I don't actually want to do badges. Badges: None. All right, what else we got? "Invalid type: null, expected a map". Of course you do: BTreeMap::new.
And here we're also gonna say serde(default), because it can't be an Option. "Why use an Option over a simple empty hash map?" Yeah, or in this case, an empty BTreeMap. What I arguably could do here is... like, a vector of nothings has the same structure. Well, that worked. We did get a round trip. The readme thing is definitely frustrating, because without this, cargo wouldn't be able to use what we built. I mean, I guess we could just say you have to pass in the readme, right? And say that's gonna be readme and readme_contents, and readme and readme_contents are, you know, both Cows: an Option of Cow str. That's certainly one way to get around it, right? So now the via stuff is gonna have to change a little bit, because it's gonna require that we pass this in, but we can do None, None here. Publish is gonna be sad. Why? Because index, line 95, doesn't know what to do with it. And this is frustrating, because when we come from publish, we don't care about the readme for the index, but that just means we can pass in None, None here. None, None. Sounds like a bit of a song. Okay. So now we have a full round trip, and we've demonstrated that it is compatible with crates.io's NewCrate and the crates-index Crate. We haven't tested this for anything beyond a trivial archive, or a trivial crate rather; once you add dependencies to stuff, things obviously get a lot more complicated. So we would want tests here that test, you know, does the round trip still work even if you add dependencies? That would be a good thing to test. But at least this is a pretty good start. The thing that would also be nice to test here is the integration with crates.io, but because their types aren't public in a way where we can test them here, it's not trivial for us to do so. The hope, of course, is that this is possible to slot in for them.
And what's also interesting here, I think, is that we could turn this into a test harness pretty easily by making this, instead of being a test, a function that takes two closures: one to run here, right after new but before opening the workspace, and one to run here, which is additional checks to run after you've checked that the round trips all worked. And that way we could write a bunch of tests around that. In fact, let's just do that refactoring right now. So simplest is just gonna be roundtrip of nothing, nothing. This is gonna be generic over... I guess it'll be given something like a path, and a check; they're gonna be simple FnOnces. I don't even know what it'll be given. I guess we can actually give it all of the types, right? So we can give it a dot_crate NormalizedManifest of String, String, we can give it a publish CrateVersion, and we can give it an IndexEntry of String, semver Version, semver VersionReq, String, String, although actually these are gonna be not String but Cow str, Cow str, Cow str. And yes, we're gonna want Cow, because now I should be able here to run a clone. Really? That's certainly something we'll want. Did I not add derive Clone to these? Because they're gonna have to be Clone. This is gonna have to be Clone, and same for dot_crate, it's gonna have to be Clone, and the dependencies are already Clone. Aha, no more StringOrBool. So now this can just ignore the path, and down here we're gonna do something like call check. Did that work? dot_crate is complaining about this. That's fine, line 101. I was lied to. Can't use the Path type like that; that's easy enough. And this takes three arguments. Oh, and I don't call setup, which I need to call here. So now it should be possible to do, you know, something like: here, with the path, add a dependency to p/Cargo.toml, and then here, check that i, the IndexEntry, contains the appropriate dependency specifier.
But I don't think I'm gonna actually write more tests for this today, because I need to eat. But that seems pretty promising. Okay, what I'll do is: something like git rm expanded, I'll gitignore my .cargo, and I will git add . Oh, right, and my patch will hopefully not be necessary for anyone else. Actually, here's what I'll do: git add -p, commit, "first thing that maybe works?". And I'll push that to a Git repo and publish it and stuff. But hey, we built a thing. There's obviously a bunch more work to do, right? Like, we would want to try to make changes to cargo to make use of this, make changes to crates.io to make use of this, make changes to the crates-io package to make use of this, make changes to the crates-index package to make use of this. There's a bunch more documentation, a bunch more testing that's needed. But at least now we have the basic types and the conversions between them all in one crate. And then hopefully, that might actually turn out to be useful, and feel free to dig into this and try to add more tests and play around with it. Whether it actually gets picked up, I'm not sure. But even if it doesn't, this is a decent introduction to how that entire pipeline works and all the weird little transformations that happen along the way. Now I need water and food. But thank you all for joining. Hopefully this was at least somewhat interesting to participate in, and I'm hoping it'll give you a bit of a head start if you ever want to dig into cargo, into some of these crates, maybe into this one. And I think that's where we're gonna end it. Are there any questions at the end? It's been a very long journey, so I wouldn't blame you if you're all tired and don't wanna ask questions. But if there are any, fire away. "Add fuzzing to this". You know, fuzzing for this is a great idea.
One of the things that's problematic with the current state of affairs is that because these conversions aren't built in a way where they're standalone, it's very hard to test them. It's hard to test interoperability, and it's hard to fuzz them, because they're all super ingrained into cargo, for example. So this crate should be possible to, you know, do fuzz testing on, property testing on, just general round-trip testing on. So absolutely, please add that. That would be amazing. All right, doesn't look like any great questions... let me rephrase: it doesn't look like there are a great number of big questions, or any big questions. All your questions are great. So I'm gonna end it there. Thank you for joining. I don't know whether there'll be a part two of this, but maybe, we'll see. It depends on whether this ends up going somewhere. All right, bye folks.