Thank you. Yeah. So this is going to be a talk about data with IPLD and how to build decentralized stuff. IPLD, like the lovely host said, is short for InterPlanetary Linked Data. This is a name that comes from the same folks who named IPFS, the InterPlanetary File System. The reason that we like the name interplanetary is that right off the bat, it's sort of aiming for the stars, and stating this intention that we want to design protocols and systems that work when we really decentralize, like across space and across time, in ways that we are currently not doing, but would like to. So IPLD is a set of specs and libraries for serialization, doing this with canonical hashing, building immutable linking out of that, and making it easier to design protocols and build software. This talk is going to start with a short section on the motivations and how we got here. Then the middle section is the stuff itself. This is going to be divided into three subsections. Each of these is a layer of library and tooling and specification which has its own virtues and purpose: codecs, a data model in the middle, a schema layer on top. And at the very end: what can we do with this? Where could we go next? How can you use this stuff? So first, to warm up the motivations, a little history. As already mentioned, IPLD comes from IPFS. IPFS is a project to store files and make a distributed file system that can go anywhere and go everywhere forever. Be permanent storage, be hash-addressed stuff. You can put terabytes in it and it should work. The F in IPFS stands for files, right? But files, well, what are they? Nothing but a very particular case of some tree structures. When we're doing stuff on disks, we have B+ trees spraying the bytes around. When we're building larger systems, like network systems, it's the same stuff. Files are just a graph. So this protocol that I'm talking about now is something that's actually evolved out of kind of the middle of the IPFS stack.
Internally, it's always had some sharding algorithms. And so IPLD is something that has kind of arisen from realizing that we have these shards, and making libraries to make that nicer. So, another content addressable store that we all know and love is Git. In the same way that Git is just a blob store, IPFS turns out to be just an IPLD store. IPFS and Git both happen to have some porcelain layer on top which makes them good at storing and manipulating files and versioning things. IPLD objects are a bit like a structured format for making new objects, in the same way that Git internals have a structured format for their blobs. But we're trying to take that concept and make it general and make it reusable. So that is the motivation, basically: scratching our own itch in making this sort of library. But once we started, we realized that none of the needs that we have in IPFS are actually rare, and we should really try to make something reusable, something that can empower other people. So overall, this is going to be a talk about how building software should be easier, but especially that building distributed applications should be easier. And that's something that we really want to narrow in on. And one of the primitives that we really need to build distributed software is linking, and linking especially by hash. This needs to be something that is first class. This is a design pattern that we can see emerging in all sorts of successful software, like Git, for example. All of these blockchainy things have hash linking in them now. Immutable links are a powerful primitive, and content addressable systems are good. I think these are basically self-evident at this point. Another thing that is going to be a motivation throughout the design of IPLD is that format shouldn't matter. JSON is cool if you want human readable. CBOR is nice if you want something that is binary packed, or protobufs.
Git packs are an existing content addressable structure, and it would be cool to traverse them in a standard way. Formats should be an implementation detail, and we should be able to build algorithms that can work above them. And the last thing, which is going to come in at the end of the talk: protocols, to grow and to be well understood by many different people, need some sort of data description tools. Half of the point of this is usually documentation; half of the point of it is actual code. But both of them are important. And schema systems have shown their value in many different systems to date. Like, protobufs are popular for a reason. But most existing schema systems are very cathedral in a way, and that's something that we don't want. We're trying to fix decentralization problems, so we're going to need something new. So overall, the big picture here is that IPLD should be a series of tools and libraries and specs that makes building the next Git, or whatever good idea you have in your head for a decentralized system, take hours, not days. Or days instead of weeks, weeks instead of months, whatever. It should go faster. So that's the dream. How do we actually do this? This comes in three parts, like I warned you earlier. The first one is the codec layer. And I'm going to try to go over the codec layer really fast, because it should be really familiar. It's pretty much taking concepts that we already have, giving them a name and a library boundary, and making them reusable. So codec stands for encoder/decoder: bytes go in, trees come out, or the other way around, trees go in, bytes come out. JSON is a codec, CBOR is a codec, all sorts of familiar things. So what we want to ask in IPLD at our codec layer is: can we standardize this and make some reusable bits here? And we want to make these things as reusable as possible, to pursue this goal of canonical hashing and faster protocol development, right? I'm going to say there's good news here.
Everyone basically already agrees, de facto at least, on what codecs are. Maps, lists: these are super familiar concepts. Integers and strings, we all know what those are. Booleans. Bytes: tricky in JSON, but defined in CBOR and some other protocols, and in Git. So we're going to bring those along. Let's take all these concepts and form them into some sort of a standard data model. That's the next layer coming up. And let's make all of our codecs map into and out of that data model. This lets us freely choose codecs from now on without having to worry about compatibility. But one more thing: we're here to talk about decentralized systems, right? So there's one more primitive that we need. It's links. All of the interesting applications in decentralization have these content addressable links. So we're going to put that into the data model as well. A link is just an opaque concept that means something I can load later. And in IPLD, we're going to define it a little bit further to say links must be immutable. So a URL is not going to count as a link, because I can't get a specific piece of data out of that later without trusting somebody else or some network service. So it doesn't work. Immutable links, though, once we add that to the existing set of maps, lists, ints, strings, all of these essentials, give us a data model that can describe Git. It can describe IPFS. It can describe Ethereum and a bunch of other blockchains. We can describe a lot of protocols with this. One implementation detail that IPLD prefers, which I don't want to spend too much time on because it's in the weeds, but just to mention it: we took this concept and we named it. CIDs, for content identifiers, are a simple protocol for how to do content addressable linking. A CID adds a couple of version prefix fields in order to have future-proof versioning at the lowest levels. That multihash word means we're going to have a byte that says which hash algorithm it is and how long it is, and then the hash.
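To make that byte layout concrete, here's a rough sketch using the published multicodec numbers for sha2-256 (0x12) and dag-cbor (0x71). The helper names are mine, not a real library API, and real CIDs encode these fields as varints (which happen to coincide with single bytes for codes this small):

```python
import hashlib

SHA2_256 = 0x12   # multicodec code for the sha2-256 hash function
DAG_CBOR = 0x71   # multicodec code for the dag-cbor codec
CIDV1 = 0x01      # CID version prefix

def multihash(data: bytes) -> bytes:
    """A multihash: <hash-fn-code><digest-length><digest>."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256, len(digest)]) + digest

def cid_v1(data: bytes, codec: int = DAG_CBOR) -> bytes:
    """A CIDv1: <version><content-codec><multihash>."""
    return bytes([CIDV1, codec]) + multihash(data)

link = cid_v1(b'{"hello": "world"}')
assert link[0] == CIDV1 and link[1] == DAG_CBOR
assert link[2] == SHA2_256 and link[3] == 32   # 32-byte sha2-256 digest follows
assert len(link) == 2 + 2 + 32
```

The self-describing prefix is the point: any reader can tell which hash function and codec were used without out-of-band agreement.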
So this is cool, because we're going to take some of these indicators and use them to refer to other protocols like Git, which has its own native hashing scheme, and bring it into this same model, so we can have one data structure in IPLD that jumps out and refers to a Git commit and does all this sort of stuff. CIDs are their own spec, and this is the last slide about them. You can go Google that and find better docs. Basically, a CID is an opaque string or bytes. There's a multibase representation thing here, and you can unpack them to see, for example: this particular string is base64 encoded, it's a link to raw bytes, hashed with SHA-2. You see the point. So I've mentioned a bunch of codecs just really quickly. Some of them, like JSON, we can clearly see are general. And some of the other things I've been alluding to are things like Git, which are much more specific. We're actually going to make both of these accessible through IPLD. Things like JSON, which can have any arrangement of maps, like whatever nested structures you want, those are going to be called native codecs, because you can take any tree of data and say: serialize this now. Some things like Git are not quite going to be native, because you can't put just anything into them as a tree structure. But we can read some things out of them. Like, you can take a Git commit and, knowing the structure that Git has internally, you can say, here's the author field. And you can turn a Git commit into a map, and then you can treat this as like a standard thing. So both of these are things that we can interact with in this model. Since this is a project that's been going on for a while now, there's a whole table of these multicodec things. It contains a lot of stuff, and I just want to mention that that exists, and now move on to the fun stuff. So the data model is what we were building up to with that codec stuff. Just to make it visual, here's an example of a JSON object and all of our favorite concepts.
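As an illustration of reading a non-native format as a data-model map, here's a rough sketch of pulling fields out of a Git commit. The function name and the layout of the resulting map are my own choices, and real commit objects carry a binary `commit <len>\0` header that this ignores:

```python
def git_commit_to_map(body: str) -> dict:
    """Parse the text of a git commit object into a map-like structure.
    Commits are header lines, a blank line, then the commit message."""
    headers, _, message = body.partition("\n\n")
    node = {"parents": [], "message": message}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            node["parents"].append(value)   # merge commits have several parents
        else:
            node[key] = value               # tree, author, committer, ...
    return node

commit = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"   # the empty-tree hash
    "parent " + "a" * 40 + "\n"                         # made-up parent id
    "author Alice <alice@example.com> 1700000000 +0000\n"
    "\n"
    "initial commit\n"
)
node = git_commit_to_map(commit)
assert node["tree"].startswith("4b825dc6")
assert node["parents"] == ["a" * 40]
assert node["message"] == "initial commit\n"
```

Once the commit looks like a map, all the generic map machinery described later (traversal, pathing, queries) applies to it unchanged.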
I'm going to call these kinds throughout the rest of the talk. Map is a kind, list is a kind, int is a kind. JSON has all these things, and we can follow my crudely drawn arrows here to see them. This is another format. This is called CBOR, if you haven't seen this one before. That stands for Concise Binary Object Representation. It's been standardized for quite a few years. It has all the same concepts as JSON, so this is a really interesting example of that data model based isomorphism. It still has maps. It still has lists. This looks like it's a bigger representation, by the way, because it's hex encoded and has got a lot of comments in it. But if you count the bytes, this comes out quite a lot smaller than the equivalent JSON. And also, because it's binary and it is length prefixed, it parses a lot faster. So it's kind of a neat format if you care about speed and performance and compactness. Since we have these unified with a data model in IPLD, you can do cool stuff like treat these identically. If you're building a really large protocol, you care a lot about the performance, the compactness, and all of these details. You could choose a format like CBOR for the internal use, for the storage, and for the canonical hashing that you're going to use for links. And it's going to be fast, to make you happy. But you would have lost human readability. Using IPLD and having the data model as your standard format, though, you don't actually have that problem. You can still freely convert back to human readable things, like the JSON example. And then you can print that out, you can use it for web APIs, any other place where a raw binary format was moderately inconvenient; you can just be like, oh, okay, I'll switch. So the data model was this useful concept to unite the codecs. But, all right, so this is boring. This is just all warm up. All this is stuff that you could have done so far with a good serialization library. What else can we do?
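To make the compactness claim concrete, here's a toy CBOR encoder covering just enough of RFC 8949's major types (small ints, short strings, small containers, booleans) to encode a little document. Real codecs handle arbitrary sizes, floats, byte strings, and canonical map ordering; this is only a sketch of the framing:

```python
import json

def encode_cbor(value) -> bytes:
    """Toy CBOR encoder: each item starts with a byte whose high 3 bits
    are the major type and low 5 bits are a small length or value."""
    if isinstance(value, bool):                      # check before int: bool is an int
        return bytes([0xF5 if value else 0xF4])      # major type 7: simple values
    if isinstance(value, int):
        assert 0 <= value < 24, "toy encoder: tiny ints only"
        return bytes([0x00 | value])                 # major type 0: unsigned int
    if isinstance(value, str):
        data = value.encode()
        assert len(data) < 24, "toy encoder: short strings only"
        return bytes([0x60 | len(data)]) + data      # major type 3: text string
    if isinstance(value, list):
        assert len(value) < 24
        return bytes([0x80 | len(value)]) + b"".join(map(encode_cbor, value))
    if isinstance(value, dict):
        assert len(value) < 24
        out = bytes([0xA0 | len(value)])             # major type 5: map
        for k, v in value.items():
            out += encode_cbor(k) + encode_cbor(v)
        return out
    raise TypeError(f"unsupported: {type(value)}")

doc = {"name": "ipld", "kinds": ["map", "list"], "neat": True}
as_json = json.dumps(doc, separators=(",", ":")).encode()
as_cbor = encode_cbor(doc)
assert len(as_cbor) < len(as_json)   # same tree of kinds, fewer bytes
```

Because the length is in the leading byte, a decoder never has to scan for closing quotes or braces, which is where the parsing-speed win comes from.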
This data model is going to be something that we can use not just to unite the codecs, but for generic programming. And of course, it's something we're going to use as a foundation for the schemas that are coming later. So, generic programming. What does that mean? Because we're defining a data format system here; we're not defining a full-blown programming language. So I'm going to back up a little bit to define the concept of genericism here. In programming, genericism means writing your code once and reusing it on different types. So you have to write it so that the types are either a parameter, or they're somehow not important to what the code is doing. Genericism can come in a wide spectrum. If you have some algorithm which has extremely literal type signatures on it, you've made something that's not generic at all, because it can only work on specifically those types. Maybe that was really convenient because your compiler checks a bunch of things for you, but it means you have to copy the code now if you're going to use different types, and that's lame. On the other end of the spectrum, you can have a type system which doesn't provide you any information at all. What we call this varies a lot between different programming language families. I like the Any type the best. This is what Scala calls it, and maybe a few other languages. It's called top in type theory. It's called the empty interface in Go. It's called Object in Java. It's called literally everything in JavaScript. So I'm going to call it hypergenericism when you have that Any type, because at that point you can write code that does absolutely anything to it. Your code can be very shared, but it's going to have lots of runtime error checking to do, because the only thing that code will ever be able to do is look at the data itself and ask: what are you? And then make choices based on the answer. So you've got no compile time checks.
There's another dimension in the middle, where you can add mechanisms for parameterization. This gets you into more complex type systems, and it doesn't clearly apply to IPLD, so I'm going to skip it. So, what we've got in IPLD: we already have that enumeration of kinds. We're going to make every one of those referenceable by a concept called a node. So a node is going to be our hypergeneric type. It's going to be like our Any. A node tree is isomorphic to a JSON document, or anything else you're parsing. If your node kind is a link, remember we added links to the data model, then you've got a graph. But since it's an immutable link backed by hashes, more precisely you've got a directed acyclic graph, which is cool, because you can traverse it and not fear cycles. Since node is hypergeneric, you can operate on nodes by asking them what they are at runtime, and it's basically a lot like the reflect capabilities in most strongly typed programming languages, like reflect in Java or Go. I think almost everyone uses this word to refer to the concept. Pixel counting on slides, it's just impossible. So, node. I should back up and mention, by the way: IPLD being a set of specs, I'm trying to talk about it very generally, and I haven't shown a lot of code on the screen yet, because we have different implementations in different libraries. There's a Go one and a JavaScript one that are very far along, and there's a bunch of other community maintained ones at various levels of development. So with that caution: some of the IPLD libraries, and in particular the one I'm working on in Go, have a split between a node and a node builder. These are both at the hypergeneric level, but they're split because nodes are immutable, and node builders let you build them mutably. These are connected: any node in code can give you a node builder, which can make a new node.
And in the process, you can see how we have an interface which lets you do copy-on-write algorithms, which is kind of cool. The node interface is pretty simple. It's probably also very much what you'd expect if you've looked at the reflect capabilities in most programming languages. There's a method for returning the kind, which gets you one of those kinds: map, list, int, the basics. If it's a map, you can traverse it by field or get a map iterator. If it's a list, you can traverse by index or get a list iterator. If it's one of the other basic kinds, like strings and ints, you can just call a function that unboxes it into the native type in whatever language you're working in. Links, the same. So why did I bother to describe this as an interface? Because if you're thinking about this from the point of view of parse trees, like turning JSON into something that you can manipulate, you're probably imagining: I just need an AST. Like, I can make a concrete implementation of this, right? Why am I bothering with an interface here? We're going to use this interface to do a bunch of stuff. That fully parsed tree is the most obvious concept, and it's very frequently appropriate, but it's not the only way you can go about things. Having this node interface, you could also imagine attaching it to raw binary data and having it act sort of like a generator: every time you ask for the next thing in a path, it can lazily deserialize your raw data, and you can see how just-in-time deserialization on large structures could be really useful. In languages that have compile-time native types, you can also write something that uses your native types and the native memory layout, attach more native methods to it that the rest of your program logic is going to use, and also decorate it with a node interface, and then have kind of the best of both worlds and a very easy way to integrate with application logic.
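As a sketch of what this interface can look like in code: the method names below are my own Python rendering of the idea, not the exact API of any IPLD library, and the concrete class is the obvious fully-parsed AST-style implementation the speaker mentions.

```python
from abc import ABC, abstractmethod

class Node(ABC):
    """Hypergeneric data-model node: every value answers for its kind."""

    @abstractmethod
    def kind(self) -> str: ...

    def lookup_by_key(self, key):
        raise TypeError(f"kind {self.kind()} is not a map")

    def lookup_by_index(self, index):
        raise TypeError(f"kind {self.kind()} is not a list")

    def as_int(self):
        raise TypeError(f"kind {self.kind()} is not an int")

    def as_str(self):
        raise TypeError(f"kind {self.kind()} is not a string")

class PlainNode(Node):
    """The fully parsed in-memory tree. Other implementations could
    deserialize lazily from raw bytes, or wrap native structs."""

    def __init__(self, value):
        self._value = value

    def kind(self) -> str:
        return {dict: "map", list: "list", int: "int", str: "string",
                bool: "bool", type(None): "null"}[type(self._value)]

    def lookup_by_key(self, key):
        return PlainNode(self._value[key])

    def lookup_by_index(self, index):
        return PlainNode(self._value[index])

    def as_int(self):
        return self._value

    def as_str(self):
        return self._value

root = PlainNode({"author": {"name": "alice"}, "tags": ["a", "b"]})
assert root.kind() == "map"
assert root.lookup_by_key("author").lookup_by_key("name").as_str() == "alice"
assert root.lookup_by_key("tags").lookup_by_index(1).as_str() == "b"
```

Any code written against `Node` works identically no matter which implementation is behind it, which is exactly the point of keeping it an interface.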
There's one more reason that we're going to want this interface here. It's because nodes can be implemented by something called advanced layouts, which is just a thought that I'm going to ask you to hold on to for a minute. The long story short is: there are lots of different ways you can implement this, and you can choose an in-memory working representation that's good for you. And because this is a choice that you make at the data model, with the node abstraction, you can bind that choice to codecs super freely. So this is cool. And we can, as promised, do generic programming over this node interface. So traversals and graph walks, for example, are very easy to make generic. We can also make these things work across link boundaries, and that's something I want to pause and meditate on for a moment. Imagine you've got this thing that's like JSON, but you have the ability to put links in it, and those links can content addressably let you look up further information. You can have a single object which, through transitive linking, could refer to terabytes of data. You could have a complete snapshot of that entire dataset by holding onto the root node. You could even have multiple indexes over that dataset coming from all sorts of different directions, because this is a DAG, a directed acyclic graph of links. And because we can program things generically over this node interface, we can have things like this traversal API which, when it gets to a node, inspects it, finds a link, and can go load it. You can make terabytes of data accessible through this very simple abstraction. And because all of this data is content addressed, you can imagine making decentralized systems out of this with just huge scope. And it's a simple interface. You can imagine taking the same thing, that interface, a little bit further, and just having a callback function that gives you a replacement node. Sure. The idea is that you could do lots of generic things over this interface.
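Here's a minimal sketch of that traversal-across-links idea: a content-addressed block store plus a generic walk that loads linked blocks as it meets them. I'm borrowing dag-json's `{"/": ...}` convention to mark links; the `put` and `walk` names are illustrative, not a real library API.

```python
import hashlib
import json

store = {}   # digest -> serialized block; stands in for disk or the network

def put(obj) -> str:
    """Store an object by the hash of its canonical serialization."""
    data = json.dumps(obj, sort_keys=True).encode()
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest

def walk(value, visit):
    """Generic depth-first traversal that follows links through the store."""
    if isinstance(value, dict):
        if len(value) == 1 and "/" in value:            # {"/": cid} marks a link
            walk(json.loads(store[value["/"]]), visit)  # load it, keep walking
        else:
            for v in value.values():
                walk(v, visit)
    elif isinstance(value, list):
        for v in value:
            walk(v, visit)
    else:
        visit(value)

leaf = put({"rows": [1, 2, 3]})
root = {"title": "dataset", "data": {"/": leaf}}   # one object linking to more
seen = []
walk(root, seen.append)
assert seen == ["dataset", 1, 2, 3]
```

Nothing in `walk` knows how big the linked subgraph is: a root object a few bytes long can transitively reach arbitrarily much data, which is the scale property being described.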
You could build something like jq here, for example, if you're familiar with that. I think it stands for JSON query. It's a really cool tool. But by building something like that on top of the IPLD data model, you could make it work equally well for JSON, CBOR, or any other codec, right? We have the Git plugin already. You could just have jq, but over your Git graph. Wouldn't that be cool? And because it works over these links, you could again make it work over arbitrarily large amounts of information. So dream about what you could do with that, and then tell me about it later, because it would be cool, whatever that is. One of the other things we've worked on in the core of IPLD is making a bit of a query language, because why not? And this is something that bears a vague resemblance to GraphQL queries, but these selectors are just regular IPLD objects. So you can type them out as regular JSON. This is something we want to put in the core library so you can use it for more traversals and visitor patterns. Selectors are something that you can imagine being used to pull out subgraphs, and that can let us do visits to subsections of a graph, and you can maybe imagine how it would be a useful building block for other applications and algorithms. For example, this is a feature that we're actually going to bring back into IPFS. IPFS is basically a giant storage bucket for IPLD objects, in addition to files through that abstraction, right? So IPFS is already replicating IPLD objects over the network and lets you move these things between storage pools, and that works pretty generically over this link concept. But soon we're going to take these selector concepts that let you address subgraphs, and we want to bring that back into IPFS, so that you can say what you want with one of these selectors and get the entire thing from a remote node as one stream.
So instead of having like, oh, I want this object, and I find a link, and then I ask for that, and then I find another link and I ask for that, and you get lots of round trip times. If you had a selector that could say, I want this link, and then recurse over all of these, and find each property named foo, and then give me that subgraph, you could get the entire thing streamed with essentially zero extra round trips. This is something we're still working on, but it's going to be really neat. But wait, there's more. The last thing at the data model layer: remember I shouted out to advanced layouts a little bit ago, saying there's another reason for this node interface? Another thing that's going to come up if you try to do really large distributed systems design, or just large system design in general, is: if I'm going to address like terabytes of data, maybe I'm going to have some maps that are just really big. If this map has like millions or billions of entries, I'm going to need some different kinds of algorithms for this. Maybe I want to shard things. So to do this, there's a feature in the IPLD libraries and specs called advanced layouts. What these do is basically let something act like one of those data model kinds. So, act like a map: it's going to present itself as a node, and you can use all of the same generic code to deal with it. But it could be serialized and backed by some totally other implementation of structures. So it could be a B+ tree, or a HAMT; HAMTs are a really cool data structure for distributed systems, because they actually have the ability to be sharded in a way that converges when different authors insert data. Complicated, cool. Point is, you can do stuff here. You can split data up into several different hunks, address them separately by different links, and make that still all act like one coherent thing. So this is something that we could really easily imagine wanting to use for maps and for lists, and also for bytes.
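A minimal sketch of that idea, assuming a trivial bucket-by-first-character sharding function (a real layout would use a HAMT or B+ tree, with each shard living behind its own link):

```python
class ShardedMap:
    """An 'advanced layout' sketch: several separately stored shards
    presenting themselves as one map through the same lookup interface."""

    def __init__(self, shards):
        # bucket-prefix -> dict; in reality each shard would be its own
        # block, reachable by a link and loaded on demand.
        self.shards = shards

    def lookup_by_key(self, key):
        return self.shards[key[0]][key]   # callers never see the shard split

    def __iter__(self):
        # Iterating still looks like iterating one coherent map.
        for shard in self.shards.values():
            yield from shard

shards = {"a": {"apple": 1, "avocado": 2}, "b": {"banana": 3}}
big = ShardedMap(shards)
assert big.lookup_by_key("banana") == 3
assert sorted(big) == ["apple", "avocado", "banana"]
```

Because generic code only talks to the map-like interface, swapping the sharding strategy underneath never disturbs the callers.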
You can see how that would be relevant for doing large files. And basically, the way I'd encourage you to think about this is the same way Java has a nice collections interface, right? The same way Java has a HashMap that implements Map. This is sort of the same, except you can think of the parameters as a little bit flipped: map is the foundational kind, and there's a parameter that says HashMap is the implementation strategy. So advanced layouts are like this. They let you do generic implementation strategies for existing collection concepts. The tricky bit is you have to provide the code for this. This is a thing that has a plugin architecture, so you have to supply your own code. The library requires some wiring for this, so it's not a totally built in thing. And the idea is that this is something where people can bring their own algorithms. So some of my co-workers are really excited about these HAMT algorithms, but maybe you really like B+ trees. That's fine, and we're just going to make this a plugin thing. Serialized data can sort of try to indicate that it's going to use an advanced layout by having some magic keyword. We'll also have another mechanism for invoking these via schemas, but I need to introduce schemas first, so we'll come back to that. This whole idea of advanced layouts is called layouts because we started thinking about it for this sharding application. But something that I want to shout out to, which came from a community contributor really recently, is the wild idea of: let's use these advanced layouts, these plugin systems, to do encryption in band, in IPLD. What if we could have, in the middle of one block of data, another binary array, and, if and only if I have a key for it, I could use one of these advanced layout plugins to turn that into a tree and continue to path through it with all of the existing parsing and traversal mechanisms.
This is the wild idea that somebody presented to me like two weeks ago, and I totally didn't expect it, and I think it's going to work. So I'm now just prepared for more wild ideas like that. But with that introduction of advanced layouts, I would say this is also something that you should always be cautious to use sparingly, if you want to implement something like this. We know it's important for large maps and things like that, and the encryption use case is going to be really, really cool; I can't wait to see where that goes. But because this is a plugin system and it requires out of band code, if you want something to work in all clients and all libraries really easily, you're going to have to figure out how to distribute that code. So while working on the IPLD stuff myself, I'm planning to make some good sharding algorithms and make those really common. But if you want to bring your own, you have to be really cautious of what this is going to do to fragmentation. There's an interpreter versioning problem here, basically. But it's useful. So use cautiously. So that's it for the data model layer. All of this stuff to date has been very hypergeneric. All the codec stuff was just: oh, let's be able to shuffle this stuff around without caring about the implementation too much. The whole data model was one of those Any types. If you want to program against it, you have to do lots of runtime inspection, lots of error checking, lots of validation yourself. There's also lots of valuable stuff you can do with those layers. So I hope this is all worth it and good and sounds cool already. But I'm a strong-types person. I like my compilers and my toolchains to help me as much as possible. So I want schemas. Schemas are something we're going to just build on top of the data model. It's another example of just using that generic system. So that's cool. So why do we want this? What are schemas good for? What have schemas ever done for us?
The hypergeneric code that we have at the data model layer is fun and it's powerful, but it gets hard to reason about really fast if you write a sufficiently big system. This is the same reason we want type systems in any programming language, and it applies just as well to our data systems and our data modeling, even if we don't have code actively associated with it. Type systems are good. Structs are good. Classes: same thing, different language. Unions, sometimes known as sum types: they're good. Let's have good things, and let's have some sort of a schema system that does validation and just saves me time as a programmer. And especially, remember back to our motivations way in the beginning: we want IPLD to make it easier to develop things, and easier to develop decentralized things, and easier to do this collaboratively, with decentralized collaboration. So one of the things that we're going to want from a schema system is actually just plain documentation. We want something that is terse, and it must be language agnostic of course, and it has to facilitate design discussion, so it has to make the important things clear and hide the things that aren't important, to get the discussion rolling. So if that's not a tall enough order: schema systems are not a new idea. Before anyone brings up the xkcd about 15 competing standards: yeah, I know, I know, and I've looked at, I think, all of them, honestly. So of all the existing schema systems, there are lots of things to learn from, both in the positive ways that they've worked well, and in some cases in the ways where they have not found adoption. Existing systems are hard to apply in the context of the motivations we have for IPLD, because immutable links are a really consequential thing. If you have that primitive and you want to use it well, you need to give it first-class recognition and do powerful stuff with it. We also care a lot about migration.
This is something that many schema systems say is important, but IPLD in particular is for building distributed, decentralized stuff. So it's important not just to say we care about migration, but to figure out what that means, and what migration means for decentralized protocols, where you have really long-running peers who won't abandon really old versions, right? I can't force somebody to upgrade their application if I'm a nice player in a decentralized world, and I need distributed development practices to work well, which rules out things like strict version numbers. You can't use those for versioning of your schema if you're doing distributed development, because to increment a version number you have to coordinate, essentially. I don't know how many people in the room remember SVN, fondly or otherwise, but it was an older version control system that had numbers for the commit identifier. People used to like SVN because if you made a new commit, the number would go up. And SVN is basically gone now, right? Everyone's come to Git. We've basically admitted that Git is the way and the truth and the light, and it's so much more powerful because it can work in a decentralized way. But that required giving up numbers that bump linearly; it required having the hashes. So that concept sticks around with us for schemas. So let's get to some specific ideas. In the schema system of IPLD, we're going to have kinds again, and basically the same ones that we had at the data model layer: maps, still here; lists, surprise, still here; ints, strings. All of the same scalars are here. With maps and lists, we're now going to associate type parameters for their keys and their values. And we're also going to introduce a couple of new kinds that are familiar and common and useful from programming: structs (you can conceive of these as classes), unions (or sum types), and enumerations.
So these are all kinds, and a type is when you assign a name to one of the above. So I'm just going to fly through a bunch of examples of what this could look like. This is a DSL for the IPLD schema system. This is something which has a deterministic transformation onto the IPLD standard tree model, of course. But the DSL is terse. So type is a keyword, MyString is a type name, string is what kind it is, and so on. This probably looks really familiar. It doesn't look wildly different than protobufs. It doesn't look wildly different than GraphQL schemas. We're all kind of evolving in some similar direction here. Typed maps probably don't look too weird either: type keyword, name, then key type and value type. If you're using a map nested in a struct field, you can use just the braces alone; the type keyword is only necessary in type declarations. The braces mean map. You can also just nest these: you could imagine a curly brace, string, colon, curly brace, a nested map. Terse. This is good. Lists are basically the same as maps. And now I've changed the slide color to get your attention, because this is the first time I'm going to add a modifier. The data model, being basically lifted from JSON (I haven't made too much of a point of this yet), has nulls in it. Null is often referred to as the billion dollar mistake, the big mistake of programming. In the schemas, things are not nullable by default, but you can say that they're nullable. So a list of strings can only contain real strings if you don't say nullable, and can contain nulls if you do. Hooray. So nullable is a keyword that you can apply to map values, list values, and struct fields. Since I've mentioned structs several times already: they look like this. And structs have another keyword. This is one of the first places where I think these schemas get interesting compared to most other schemas that are out there right now. Optional is a new keyword. Optional is distinct from nullable.
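For reference, here's a small schema in my best reconstruction of the DSL being described; consult the IPLD schema specs for the exact current syntax:

```ipldsch
type MyString string            # a named type with string kind
type Fruits {String:Int}        # a typed map: string keys, int values
type Scores [nullable Int]      # a list whose values may be null
type Person struct {
  name String
  nick optional String          # the field itself may be entirely absent
  tags {String:String}          # nested map: braces alone, no type keyword
}
```

The whole declaration set is itself representable as a plain IPLD tree, which is what makes the deterministic transformation onto the data model possible.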
Nullable means that the value can be null. Optional means the value can be missing; this is more like JavaScript's concept of undefined. The optional keyword only applies to struct fields, because in the case of a map or list value it clearly just doesn't make any sense. Optional fields. Now, a quick word on why those keywords both exist separately and why this is important. These are related to a concept called cardinality. Cardinality basically means the count of members that a type can contain, and this table is full of examples. A type that just contains a boolean value has a cardinality of two: it can be true or it can be false. This is pretty familiar, right? If you introduce nullable, it can be true, it can be false, or it can be null. Having the nullable property increases the cardinality of whatever it's attached to by one. Optionality is the same, and you can see how it serializes differently, but it does the same thing to the cardinality: it's plus one, just represented by the key and value being missing entirely from the map. And so it follows that you can stack optional and nullable. In that case, a struct that has such a field could have it true, or false, or null, or missing the entire pair. These plus ones stack: you can get a cardinality-four field out of a bool. And another feature called defaults, which I'm going to introduce next, doesn't mess with this, which is interesting. So this is a really important foundational design concept. Being able to count the membership of things means we can reason about the expressivity of a set of types. If the cardinalities of two parts of a model are the same, then clearly we can have a lossless conversion from one to the other; and if they aren't the same, that means one of them is less expressive or more expressive than the other. This is really useful for reasoning about compatibility of schemas and the completeness of models.
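The cardinality arithmetic above can be made concrete with a small sketch. The helper below is purely illustrative — it's not an IPLD library API — but it shows how each modifier adds exactly one representable state to a field:

```python
def states(base, nullable=False, optional=False):
    """Enumerate the distinct states a struct field can be in.

    Each modifier contributes exactly one extra state:
    nullable adds null, optional adds 'the key is absent entirely'.
    """
    result = list(base)
    if nullable:
        result.append("null")
    if optional:
        result.append("<absent>")  # key/value pair missing from the map
    return result

BOOL = [True, False]

assert len(states(BOOL)) == 2                                # Bool
assert len(states(BOOL, nullable=True)) == 3                 # nullable Bool
assert len(states(BOOL, optional=True)) == 3                 # optional Bool
assert len(states(BOOL, nullable=True, optional=True)) == 4  # optional nullable Bool
```

The plus-ones stack, which is exactly the cardinality-four field from the table.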
A lot of other systems miss this distinction between optional, nullable, and both. If you write code that assumes only true and false, but you have some concept of defaulting where, if the value is missing, you assume it's one of them, it's very easy to get yourself into bug territory, because you've made an implicit mapping from the thing being missing to one of the values. If you write code that does this when it parses an object, how do you serialize it back out losslessly and maintain whether or not the default was set? You probably don't. You probably have a bug in your program there. So cardinality counting is a really important concept that we've made foundational in IPLD schemas, and it comes back and helps us when we talk about defaults. Defaults are a little different in IPLD schemas than in a lot of other systems. If you set a default — say this boolean has a default of false — that means whenever the value of that field is false, it's going to be skipped in serialization entirely. So defaults don't change cardinality; they just let you have, well, a default. They let you be less verbose. The other side of the same coin is that if you encounter the default data — if you're deserializing something that matches the schema, or that you hope matches the schema, and you find a serialized word for false — that has to be rejected as not matching the schema, because otherwise we would have a lossy round-trip here, and that's a kind of bug we want to prevent. Is it possible to have a default of null? It depends. For a plain bool, no, because then you'd need at least cardinality three. But you can have a default of null for a nullable bool. Think of how many bugs we just avoided. So, all the examples so far have implicitly had a representation, but this is something I now want to make explicit so that we can explore it further.
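The default rule — skip the field when it holds the default, reject the explicit default on the way back in — can be sketched for a single field. These helpers are hypothetical, not a real IPLD library API:

```python
DEFAULT = False
FIELD = "flag"

def serialize(value):
    """Skip the field entirely when it holds the default value."""
    return {} if value == DEFAULT else {FIELD: value}

def deserialize(data):
    """Reject an explicitly serialized default: accepting it would
    give two encodings for the same logical value (a lossy round-trip)."""
    if FIELD not in data:
        return DEFAULT
    if data[FIELD] == DEFAULT:
        raise ValueError("explicit default does not match the schema")
    return data[FIELD]

assert serialize(False) == {}
assert serialize(True) == {"flag": True}
assert deserialize({}) is False
assert deserialize({"flag": True}) is True
# deserialize({"flag": False}) raises: exactly one encoding per value.
```

Because absence already means false, the serialized form for every logical value is unique, which is what keeps the cardinality unchanged.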
All of the kinds in the schema layer — like struct, and union (although that's still coming), and so on — that aren't equivalent to a data model kind need some representation strategy for how they're going to become a data model kind. For struct, this is such a common concept that it's obvious the default should be representing it as a map, and so I just haven't said "representation map" so far. If I do say it, well, it's the default, so it does the same thing. But we can put other words here. For example, we could say: I want to have this struct with fields x, y, and z, and I want its representation to be the tuple strategy. This will cause it to instead map into a list that just has the x and the y and the z as list items. Being able to make this kind of choice is really interesting. If you're familiar with protobufs, there are some other schema systems like that which remove self-description from being in-band; that's sort of what this tuple representation is like. With IPLD schemas you can choose whether you want to do this on any given type in your schema. And this is quite powerful, because you can have some things that are fully self-describing at the top levels of your data structure, for example, so it's very clear to somebody eyeballing it what's going on; and as you get deeper into the data structure, maybe you say, okay, I have enough context that I already know what this data means, and then you can switch to other representations and start eliding things if you want. This is a choice you can make as an application designer, so you can do what suits you. Another representation you can use, for structs, is one called stringjoin. The thing that's interesting about some of these representations is that they're changing the kind entirely: we've just seen that a struct can be mapped into the data model — and then serialized, I should say — as a map, or as a list, or, here, as a string.
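As a sketch of those three strategies side by side (the Point types here are illustrative, and the exact DSL details were still being ratified at the time):

```ipldsch
# Default strategy: serializes as {"x": 1, "y": 2, "z": 3}.
type Point struct {
  x Int
  y Int
  z Int
} representation map

# Tuple strategy: serializes as [1, 2, 3] — field names elided,
# much like schema systems that drop in-band self-description.
type PointTuple struct {
  x Int
  y Int
  z Int
} representation tuple

# Stringjoin strategy: the struct becomes a string, e.g. "1,2".
type PointString struct {
  x Int
  y Int
} representation stringjoin {
  join ","
}
```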
This is cool because there are some other common limitations that I've skimmed over earlier. Think about JSON: in your maps, what can you use as a key? Pretty much just strings, right? If you try to use numbers as a map key in JSON, maybe it works, maybe it doesn't, depending on your language and libraries; I would advise against it. You definitely can't use other structures as keys in a map in almost any serialization system. But with IPLD schemas and this representation indirection, we actually can. We can have a struct definition and give it some representation strategy that clearly, unambiguously, bidirectionally maps it into a string, and then we can use it as a map key. And this is going to just work, and do the thing you mean it to do, in any serialization system that you glue onto the bottom, because down there it's just a string. So this is a cool feature. Now, one of the other kinds introduced at the beginning of the schema discussion was unions. Unions are also known as sum types; if those aren't familiar to you, it basically means the type can contain any one of the member types, but only one at a time. So if you've got a union of two different types of booleans, somehow, the cardinality of that would be four, because you add them; if it's of three different booleans, it would be six — still adding. Notice that the union has a representation keyword here. There is no clear default for unions, so there are actually four choices you can make out of the standard set, and they're called keyed, envelope, inline, and kinded. I don't have example slides for this because it got too textual and big, but they're basically different ways of representing things that you're probably going to find in JSON APIs you're already interacting with. Envelope, for instance, just means there's one field called type, and then there's another field called message where the rest of the content goes, and you use the type value to look things up — these strings being the lookup discriminator.
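Two sketches of what was just described: a struct used as a map key via a string representation, and an envelope union. All names here (Coord, Grid, Message, and the assumed-elsewhere-defined Ping and Pong types) are illustrative:

```ipldsch
# A struct that maps bidirectionally onto a string...
type Coord struct {
  x Int
  y Int
} representation stringjoin {
  join ","
}

# ...can then be used as a map key, because at the serial
# layer it is just a string, e.g. {"3,4": "treasure"}.
type Grid {Coord:String}

# An envelope union: one field discriminates, another carries
# the content, e.g. {"type": "ping", "message": {...}}.
type Message union {
  | Ping "ping"
  | Pong "pong"
} representation envelope {
  discriminantKey "type"
  contentKey "message"
}
```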
Keyed means the same thing but with a slightly different arrangement of bytes: different shapes that you're likely to see. Kinded is an interesting one, because it means you can make a union of different primitive kinds — you can say this contains either a string or a list or a bool — and the kind itself is the discriminator. Enums are the last additional type added at the schema layer. There's absolutely nothing surprising or interesting about enums; it's just that we have them. So, this representation stuff is interesting and cool. It allows these simple, deterministic, bidirectional transformations. The transformations are always something we have to define in order to take us between the schema kind and the data model kind, but we've also gained so much additional power by making them flexible that it's just really cool. Most kinds do have default representations; unions don't, because, as I said, there's no common agreement today about how we should do that, so we'll see how it evolves in the next decades. Now, before schemas we had advanced layouts, and now we have these representations as well, so it's worth spending a moment to compare the two. Why are they different? They seem like they're doing similar things: they're changing the way you map data into a different shape at the data model layer. But they are quite different. The advanced layouts, remember, were there to allow pretty arbitrary, pretty much Turing-complete code to do stuff, and they can split things into more blocks; it's a very powerful abstraction. Representations can't do that, and they are all built to be fast. Representations are a core feature of the schema layer, so any library that supports schemas has to support these well-known representations — and representations being fast is not just due to the fact that they have to be core implementations.
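For completeness, here's a sketch of a kinded union and an enum in the DSL (the type names are illustrative):

```ipldsch
# A kinded union: the data model kind itself discriminates,
# so no wrapper keys appear in the serialized form at all.
type Value union {
  | MyString string
  | MyList list
  | MyBool bool
} representation kinded

# Enums: nothing surprising — we just have them.
type Status enum {
  | Ok
  | Failed
}
```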
It's also due to the fact that we're being very careful that, whatever the representations are, they have to be fast — sub-linear; they have to decide what they're doing super, super quickly. With advanced layouts we can make no such guarantees, because they let you jump into arbitrary code that you've attached, so nothing can be promised there. And representations, being specified in the core specs, are going to be stable over time. So, on that note of advanced layouts: we said earlier they could be signaled in-band with some magic keywords. Schemas provide another alternative to this. I can define a schema with some type named, say, MyMap. It's going to be a regular old map, and then we just have a generic algorithm parameter attached to the end here, and you can provide additional configuration for it. This is the string that you'll use to associate the plugin code with it; maybe it has some other algorithm parameters — these are free text, essentially. Having these placed in the schema is good for documentation. You can imagine, if you had a big schema document to hand to somebody implementing a matching library for some work you're doing in another language, you'd hand them this document and they'd be like: ah, I get it, cool. But it has another really cool property. You can choose whether or not you want to see through an advanced layout and address its individual blocks, by taking those traversal libraries — those hyper-generic things that were built over Node — and doing a traversal with no schema, in which case you're going to see all of the raw blocks of the advanced layout; or doing the same traversal with a schema that identifies that advanced layout, in which case you're going to be able to take a single path segment and skip through however many layers of sharded map that was.
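A sketch of what declaring an advanced layout in a schema might look like. The syntax here is illustrative — this part of the spec was explicitly still being ratified at the time of the talk — and ShardedMap is a hypothetical name used to associate the plugin code:

```ipldsch
# Declare an advanced layout by name; the name is what binds
# this schema to the plugin code implementing the sharding.
advanced ShardedMap

# An otherwise ordinary map type, backed by that layout.
# Traverse it with this schema and it reads as one map;
# traverse without the schema and you see the raw shard blocks.
type MyMap {String:Link} using ShardedMap
```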
Being able to choose between these two views lets you do really cool things algorithmically. Remember, we're coming out of IPFS here, so we're thinking about files sometimes, and files are just a big old tree structure. Sometimes I want to treat a file as that big old range of bytes; but what if I want to see through the abstraction? What if I want to say: grab me the left-leaning tree of this whole piece of data? For files, that basically maps onto "stream the beginning of the file". If I wanted to watch movies, this is a really cool thing to be able to do. So by choosing whether or not to see through an advanced layout's abstraction, you can build really cool applications. The last thing I want to talk about in schema town is migrations. We said that was going to be a big, important property at the very beginning, right? The core concept I want to introduce here is fairly simple: try stacks of schemas. Schemas are something we've developed so that you should be able to take existing data — data that radically predates the schema — take a schema, and see if they fit together. We mean that to work for any single version of the schema; but once you've got that property, we might as well make it work for multiple versions of the schema. So if I have some data and I'm not sure what version of a protocol it is, I can make a stack of schemas and try the first one, and if it fits together and matches my data, I can say: okay, now I know what I've got. If it doesn't, I can jump to the second schema, try to fit that, and just do this iteratively. Remember, by the way, that important point about representations having to be fast? This is one of the reasons: trying to match a schema should be something that errors out, or not, very quickly — fail fast — so that it's possible to do these stacks and it's still efficient. You can go down the stack for quite a long way, and this gives you a really cool property in your design. This is kind of like structural typing, if you will: it detects matching data, and it does it without the use of explicit version numbers. So we can use this to simultaneously have a concept of migration and versioning which is clear and unambiguous and easy to reason about, but which is also really friendly to decentralized development, because there are no version numbers. You can have forks of a protocol which add or remove a field, and then another fork which adds or removes a different field, and these people might have done that development independently, without communicating with each other. You can represent that by just trying the different schemas: whichever one matches is what you've got, and then in your application logic you can take some hyper-generic algorithm that takes the node tree out of either one of those schemas and flips it into the representation you want. In the same vein: how do you choose a consistent order for the try stack and make it do deterministic things in a totally decentralized, disorganized, chaotic environment? You don't. Good luck; humans are problem factories. So this is supposed to be friendly to decentralized, less coordinated development, but I want to dial this to infinity for a moment and then say that we're picking somewhere in the middle. If you want totally unambiguous things, you'd have to have a Gödel numbering, right? If we could pull Gödel numbers out of the ether itself and say these concepts are eternally unique, then I would have no versioning problems. So it's like structural typing, right? If you can identify some structure, and that structure provides contextual information that makes you believe you know what to do with it, then you're winning; and if there is no structure you can infer semantics from, and you don't know what to do with it, then you're losing. That's kind of the limit as time and energy approach infinity. So the hope is that this is going to let you probe for that kind of structure reasonably efficiently.
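The try-stack idea can be sketched in a few lines. The matching check here is a deliberately crude stand-in for real fail-fast schema matching, and the schemas and data are hypothetical, not a real IPLD library API:

```python
def matches(schema, data):
    """Fail-fast structural check: do the field names line up?
    (Real schema matching also checks kinds, representations, etc.)"""
    return set(data.keys()) == set(schema["fields"])

def identify(schema_stack, data):
    """Try each schema in order; the first that fits tells us what we have."""
    for name, schema in schema_stack:
        if matches(schema, data):
            return name
    raise ValueError("no schema in the stack matches this data")

# Two independent forks of a protocol — no version numbers anywhere.
SCHEMAS = [
    ("v2", {"fields": ["name", "email", "avatar"]}),
    ("v1", {"fields": ["name", "email"]}),
]

assert identify(SCHEMAS, {"name": "a", "email": "b"}) == "v1"
assert identify(SCHEMAS, {"name": "a", "email": "b", "avatar": "c"}) == "v2"
```

Because each match attempt fails fast, walking a long stack stays cheap, which is what makes this viable as a versioning mechanism.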
Hopefully people are going to choose names of things that let you do this; humans are likely to choose human-readable names, and that's a property that at some point you just lean on. So we're trying to do the absolute best we can here and make something that is likely to be helpful to human beings. So, where to next with all this stuff? I want to make it clear that all the things I've talked about here are development we're doing in the open. A lot of what I've discussed is pretty recent, and we're still firming up and ratifying lots of the details, especially around the advanced layout stuff and the schemas. So if you want to try to break that concept of version migration and detection — yeah, come try to break it. There are discussions on GitHub; I'll pop up a link for that later. Some of the other stuff that would be really cool to see is, of course, more language implementations. Like I mentioned earlier, we've got Go and JS and a bunch of other community ones; more are wanted. And the schema stuff, being really recent, is something we've only got early drafts of in Go and JS, so people trying that concept out and bringing it to more language implementations would be super desirable. The advanced layouts are new — exploration required. Can you find a better way to shard terabytes of data than any human before you? Try it. Code generation is something I would really like to have in the Go libraries, and other languages could probably benefit from similar categories of tooling; I know protobufs have codegen in most compile-time languages, and it's useful, so it would be cool to have more of that. And of course: applications. The whole point of this was to make a series of libraries that lets us build more decentralized applications. Pick a dream out of your head and come try to build it on IPLD; hopefully it goes really well, and if it doesn't, and you learn something, then you can talk to me about it. So, if you want to contribute, or just keep in
touch, or try to use any of these things: IPLD is an organization name on GitHub. You might especially be interested in the specs repo; that's where the most language-agnostic things are going on, and you can come argue with me and the other people developing this in the issues there. IPLD is a channel on Freenode. There's a snazzy website. I'm warpfork almost everywhere on the internet, if you want to yell at me specifically. And that's the end. Thank you.