Hello and welcome everyone. My name is Eric Fransen. We would like to thank you for joining us today for this webinar, a production of Dataversity, with our speaker, Brian Sletten, of Bosatsu Consulting. Today, Brian will be discussing JSON-LD. Just a few quick points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We will be collecting questions in the Q&A box in the bottom right-hand corner of your screen, and we encourage you to enter those whenever they come to mind. We will be getting to the answers for those questions at the end of the hour. At some points during today's presentation, the layout of your screen may change depending on the system that you're using, so that the entire screen gets filled up with the presentation slides and you won't see those panels on the right. If that happens, don't panic. Please be aware that a drop-down navigation panel will appear at the top middle of your screen, and you will still be able to access the Q&A and those other modules using that panel. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information that may come up during the webinar. And now a few words about our speaker. Brian Sletten is a liberal arts-educated software engineer with a focus on forward-leaning technologies. His experience has spanned many industries including retail, banking, online games, defense, finance, hospitality, and healthcare. He has a BS in computer science from the College of William & Mary and lives in Auburn, California. He focuses on web architecture, resource-oriented computing, social networking, the semantic web, data science, 3D graphics, visualization, scalable systems, security consulting, and other technologies of the late 20th and early 21st centuries. 
He wrote Resource-Oriented Architecture Patterns for Webs of Data for the Morgan & Claypool Synthesis Lectures on the Semantic Web. He is also the author of two videos for O'Reilly Media on hypermedia and linking data. He is also a rabid reader, devoted foodie, and has excellent taste in music. If pressed, he might even tell you about his international pop recording career. And I can vouch for that. I have seen evidence of said career. So welcome, Brian Sletten. Thank you very much, and thank you to everybody who's joining us today. This is an exciting time for this technology. We've been discussing semantic web technologies for 15, 16 years now. I've been building semantic web-based systems for about 11 or 12 years. And there's always been a certain amount of pushback from developers, and we're finally able to respond with a set of standards that should be appealing to those developers, as well as the people who are more interested in the semantic web side of things. To understand why JSON-LD is so important, though, we're going to first have a very quick reintroduction to the LD part of it, the linked data part, and then we'll go into what JSON-LD brings to the table. So we'll have a quick introduction. We'll talk about the basic JSON-LD syntax. We'll look at some mechanical transformations that will happen around the syntax to be able to put it into predictable formats. We'll have a quick look at some of the development tools, and perhaps most excitingly, some of the real-world uses to improve search results and to share information in a machine-processable, standards-based way. I will be going through the material fairly quickly to save time for questions at the end, but my contact information will be at the end; if you have any additional questions or want to talk further, please feel free to contact me. 
So the RDF data model was designed to solve some specific problems, and when you compare it against something like a regular JSON document, it seems much more complicated, but that's because it's actually doing something a lot harder than simply serializing data using an easy-to-parse format, which is roughly what JSON brings to the table. Now, that's a very useful thing for it to bring to the table, but it doesn't solve a lot of problems beyond that. So in the RDF world, we have a standard data model that is a graph, because a graph is an extensible format. We have URIs to identify our entities, shown here in the graph format as ovals, and we also have URIs to represent the links between entities and values, and those values could be other entities or literals. So here we see three statements about a person named Brian Sletten. He's got a birthday, a name, and is a member of the class of person. So by using good, stable identifiers, we can say things about anything. We can apply these to data, to documents, to services, and to concepts. I exist in the real world, but I don't serialize particularly well, so you can't ask an information resource system for me or a representation of me. However, this identifier that I use for myself, grounded at a site called w3id.org, has allowed me to put a valid identifier there, and it resolves to a document that then describes me. And so we're able to say anything about anything, which allows us to disagree, to have different opinions on things. But the nice thing about this is that this graph can be serialized into standard formats. So we use standard identifiers. We use a standard data model. We use standard serialization formats to be able to make this information transferable between one system and another. So in this case, we see an example of the same graph stored as a Turtle file. Turtle stands for the Terse RDF Triple Language. And you can see the shape of the graph expressed in the Turtle. 
It's almost a purely essence-oriented serialization format. So we have the identifier for me. I'm an instance of the person class, and I have a name and a birthday. And what's interesting is that being a member of a class and having some attributes around you are two very different types of relationships, but the data model uses the same mechanism. So here we see those facts stored as what we would call three triples. Now the cool thing about this is that if a remote system allows us to ask for more information, in this case from another Turtle document published on the web, here we see one more fact serialized about me. I have a picture on the web that depicts me. And if you've never seen this term before or if you don't know who I am, in addition to being able to resolve my identifier to find out more information about me, you can also resolve the depiction identifier to find out more information about what the term means. So these are like global identifiers for database keys or global identifiers for database columns, but they operate at a global scope because they're grounded in namespaces. So we end up having very portable data expressed in standards. That means clients don't have to know anything special in order to parse it. This means that you can ingest information from people that you've never talked to before. You may not know what the term means, you may use different terms yourself locally, but you can still ingest information from something that you've never seen before. So with those triples in the local Turtle file and this extra fact available for me on the web, I can now use a standard tool, in this case rdfcat, a command-line tool that comes as part of the Apache Jena project. It allows me to say: grab the information in this local file, grab the information in this remote file, merge them together locally into a model, and spit it back out as Turtle. 
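The accumulation that rdfcat performs can be sketched in a few lines of Python. This is a minimal illustration, not Jena: the triples are plain tuples, and the subject and image URLs are made-up placeholders (only the FOAF and RDF predicate IRIs are real terms).

```python
# Triples are just (subject, predicate, object) tuples. The FOAF predicate
# IRIs are real; the subject and depiction URLs are illustrative stand-ins.
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "https://example.org/people/brian"  # stand-in for a w3id.org identifier

local_triples = {
    (ME, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", FOAF + "Person"),
    (ME, FOAF + "name", "Brian Sletten"),
    (ME, FOAF + "birthday", "05-07"),  # placeholder value
}
remote_triples = {
    (ME, FOAF + "depiction", "https://example.org/images/brian.jpg"),
}

# Merging graphs is just set union: because both sources use the same
# subject identifier, the facts accumulate on the same node.
merged = local_triples | remote_triples
```

The key point is that nothing beyond the shared identifier is needed for the two datasets to combine.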
And so now we see I've accumulated an extra fact about myself, that I have a depiction, because I use the same identifier. Now I'm not going to go into how we solve these problems, but this may seem like we all have to agree on the same names for everything. It's nice when we do, but it's not required, and we have other ways of handling that. So effectively, no matter how your information is stored, it could be stored in a relational database, it could be stored in a document, it could be stored natively as RDF or content negotiated behind a REST API, if it's available in this format, then the cost of integration falls to a very, very low effort to be able to connect information across these datasets. But what we've also done is we've learned a new fact. We've extended the graph that we know by adding this new information in there. And this idea scales very well. We have at linkeddata.org a website dedicated to the principles of linking information across these datasets, as well as information about the nearly 600 datasets that a bunch of volunteers have connected, representing billions of facts. In the green section of the diagram over here on the right, we have academic research, so who cites whom. In the pink, we have information about life sciences at the drug level, at the side effect level, at the manufacturer level, at the clinical trial level, at the genomic and proteomic level. And because the information accumulates so easily between these datasets, once you start to be able to connect them, you can ask questions like: find me manufacturers in Indiana who have a revenue above a certain level and have been attached to studies for non-steroidal anti-inflammatory drugs with no side effects. So those kinds of questions become very natural, meaningful ways of exploring the information that's been made available to you because it is described in a semantically rich way. Now, this is fantastic. This is very exciting. 
When Watson played Ken Jennings on Jeopardy, in addition to all the rich analytics that the Watson tool has, it reached into this linked data cloud to pull out references to books and movies and authors and things like that. So this is a very exciting vision of how this idea scales, but over the past 15, 16 years, when presented to developers, their reaction has been to think about it and then reject it. Now, some amount of this is intransigence. Some amount of it is ignorance. They just are not familiar with it. They're not thinking long term like this. But there are some very real, tangible critiques of the formats that have been produced by the semantic web standards. Number one, the tools that they're mostly using, where all the innovation is happening in terms of the JavaScript libraries like jQuery and Angular and things that run in the browser, haven't been able to ingest this information as easily as they could have. So they don't necessarily know how to consume it. If they're using relational databases or something like Mongo, they don't really have a place to put it. If you inform them about triple stores like Stardog and Virtuoso and these sorts of things that all handle this data very, very well, they don't necessarily want to set up another data store, because they're not necessarily drawn to the linked data aspects of this. If they are, then certainly these other tools may make all the sense in the world. So the nice thing about JSON-LD is that it's an attempt to solve problems for both communities. We want people who want to think in terms of the linked data side of things as well as the developers who may or may not want to. And we're going to give them that kind of freedom. So the goals of the project are to be 100% compatible with JSON. 
The idea is I don't want to be mostly compatible and still have some little differences, because somewhere, some library will fail, and that will be enough for somebody to reject it. We want this to be very easy for people to use. Where possible, we want to be able to convert regular JSON into JSON-LD without having to change the JSON. Some people will allow us to make the changes, but some people will also refuse to, right? The existing tools cannot be changed, and so we want to be able to support that as well. That is an interesting problem to solve, because JSON has no notion of identity, or of being able to make remote references, or of having types and literals and internationalization, all of which are required for integrating information at scale. But the larger goal is we want to be able to consume linked data and to produce linked data within the web development that we're doing, as we're building apps, as we're building websites, as we're sharing information with the web architecture. We want to make it easy to produce and consume linked data as well. For those developers who have no interest in RDF, don't want to know, fingers in the ears, that's not a problem, because they can treat the JSON-LD as just regular JSON. But those with eyes to see and ears to hear will see how to contextualize the JSON terms into a global context to make the data as easy to use as native RDF. It's very easy to turn JSON-LD back into RDF and the other way around. For developers who are building on top of things like the MEAN stack (MongoDB, Express, Angular, and Node), it's easy to store JSON-LD documents in those JSON database engines. You're not going to get the reasoning capabilities or the ability to run SPARQL queries or things like that, but if it's just a question of, I have this information and I want to store it in tools that I already have, this is now an easy thing to do. So JSON, if you're not familiar with it, is a very simple serialization. 
It was a reaction to some of the perceived complexities of XML. Again, XML is trying to solve much larger problems than JSON is trying to solve, but a lot of developers felt like XML and all of its namespaces and things like that were overkill for the problem of sharing information from, say, a backend server to a front-end user interface. And so JSON was a very simple, JavaScript-friendly mechanism for having these key-value pairs in documents, with the ability to make embedded object references to other JSON objects. So you might see something like this, where you have simple terms capturing information, and if you understand what I mean by name and if you understand what I mean by birthday, then you can write code to consume this. We have a shared understanding of what these concepts mean. And these are both universal terms, right? Everyone has a name and a birthday of some sort, but you may call them something different. You may call it full name. You may call birthday date of birth. But it doesn't take too much to get into a problem where the terms that you have will collide between various domains or systems. So what JSON-LD allows us to do is to simply add a context. Now, this funny-looking keyword, @context, is an attempt to be backward-compatible. Certainly, there could be existing JSON documents that have the word context as a key, and we don't want to collide with them, but @context is unlikely to collide with anything. And whoever it collides with is probably kind of weird and marginalized anyway. So the JSON-LD community has adopted this notion of introducing new JSON-LD-specific keywords using the @ prefix. And so this is a useful mechanism. I'm linking to a document that will then describe how to map these terms into a global context. 
And if you don't care about the context, you can ignore it, but if you do care about the context, as we'll see in just a second, it will help you figure out how to globalize the references to make the data easier to share in a linked data way. But what's really going on is like when a couple of friends get together and have a meal, right? They say, hey, you know, I saw Bob and Mary the other day; they just got back from visiting Mark. They don't have to fully define everything. They can use the short terms Bob and Mary and Mark because they have a shared understanding. They've known Bob for 35 years. They know Bob and Mary have been married for 20 years. They know that their son, Mark, is in college. And so they're able to use these short references because they have a shared context. And that's fundamentally what JSON requires of you. Any code that consumes information that you're producing using regular JSON has to have a shared context. And it's not that that's not possible. It's that you have to document it. You have to publish it. Somebody has to read that and then encode that understanding into their software, so that when you do serialize the JSON in its simplicity, you have to do so within a shared context. Just like we are socially and biologically limited in the number of people that we can have intimate shared contexts with (there's a study that says you can generally have no more than about 150 friends in your life), the same is true of software. You can't have a shared understanding with every single system out there. You would not achieve the scale of interoperability that we see with the linked data project if everybody has to explain and share that understanding by encoding it on the client. So by having the document become self-describing and the linkage to the context be defined, you can accept information that you don't necessarily have a shared context for, but you can contextualize it. 
So if we actually try to resolve that document, what we get back is another JSON-LD document. We see we're using regular HTTP. We get back a content type that informs us that it is in the application/ld+json format. And in that, we have an embedded context that informs us that the term birthday should map to foaf:birthday. FOAF stands for Friend of a Friend. It's a collection of terms that I used in the RDF example. And name maps to foaf:name. Now you're not required to use the FOAF terms. You could use schema.org. You can make up your own terms. The point is simply that this document establishes a map that tells you how to turn the words into globally identifiable, therefore easily disambiguated and interoperable, references to the relationships. And those are also generally resolvable, so you can find out more information about the terms in machine-processable ways as well. Now in general, it's a useful idea to have a shared context that's available through a URL. That way you can update the context without updating the documents. It's shared across various instances of the documents. You don't have to have duplicated mappings all over the place. But for the purposes of this talk, and for cases where you need to have self-contained documents, you are also able to directly embed the context in the JSON-LD document itself. So here we see the same information, not accessed through a URL, but embedded within the document itself. So now we have a self-contained JSON-LD document, with the information in the context object. It's kind of like the schema information that you would have about a relational database or something. And then what's outside of the context is generally instance information. It's about the thing that we are talking about. 
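Mechanically, that embedded context is just a lookup table from short terms to full IRIs. Here is a minimal sketch in Python, ignoring the many subtleties a real JSON-LD processor handles (compact URIs, value coercion, nested nodes); the birthday value is a placeholder.

```python
doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "birthday": "http://xmlns.com/foaf/0.1/birthday",
    },
    "name": "Brian Sletten",
    "birthday": "05-07",  # placeholder value
}

def expand_terms(doc):
    """Replace each short term with the full IRI its context maps it to."""
    ctx = doc.get("@context", {})
    return {ctx.get(key, key): value
            for key, value in doc.items() if key != "@context"}

# The keys of the result are globally unambiguous FOAF IRIs
# instead of local short names.
expanded = expand_terms(doc)
```

Code that doesn't care about linked data can keep using the short keys; code that does can apply the mapping and get globally meaningful identifiers.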
The cool thing here, though, is that because JSON-LD is a standard and because it's able to express these directed graphs, I can go back and reuse one of my standard tools like rdfcat, which understands JSON-LD and understands how to convert it to and from RDF. I'm now able to say: give me a Turtle serialization of the information in this document. Now it may surprise you that what we get back is a blank node. This is a concept in RDF for things that have attributes but no apparent identity. Because JSON has no notion of identity, I have no way of associating identity with this object. It doesn't know who I am. It doesn't know that I have an identifier registered at w3id.org. So it just simply says there is something that has a birthday and a name. So we need to solve the problem of identity, but we also have other tools using semantic web technologies to intuit identity. If I happened to have an e-mail address here, e-mail addresses are generally associated with a single identity, so a reference to the same e-mail address in another dataset would allow a reasoning engine to infer that this is the same person being mentioned here. But we're going to solve it more mechanically using capabilities from JSON-LD. The first thing I can do is introduce a new attribute and use the JSON-LD keyword @id to introduce an identifier. I'm saying to a JSON-LD processor, when you turn this into an object, here's the identity you should use for it. In this case, I can provide my own identifier that we've spoken about already. Now this is a change to the JSON. I will show you how to solve this problem without doing this in just a minute. But now, if I convert the JSON-LD into RDF as Turtle, it starts to look more like the RDF that we had before. So now the node that's being described outside of the context in the JSON-LD document has identity, and I can assert facts about that thing. 
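The difference @id makes can be sketched like this. The blank-node labeling and the tiny triple emitter are illustrative, not how any particular RDF library does it; the subject IRI is a made-up placeholder.

```python
import itertools

# Generator of fresh blank-node labels, like an RDF converter would mint.
_blank_labels = (f"_:b{i}" for i in itertools.count())

def node_to_triples(node, context):
    """Emit (subject, predicate, object) triples for one node object.
    If the node carries an @id, that IRI becomes the subject; otherwise
    we mint a blank node label, mirroring what an RDF converter does."""
    subject = node.get("@id") or next(_blank_labels)
    return [(subject, context[key], value)
            for key, value in node.items() if not key.startswith("@")]

ctx = {"name": "http://xmlns.com/foaf/0.1/name"}

# Without @id: the subject is an anonymous blank node.
anonymous = node_to_triples({"name": "Brian Sletten"}, ctx)

# With @id: the subject is the identifier we supplied.
named = node_to_triples(
    {"@id": "https://example.org/people/brian", "name": "Brian Sletten"}, ctx)
```

The facts are identical either way; only the subject differs, which is exactly why the blank-node version can't accumulate facts from other datasets.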
Now, if I do still have my remote Turtle file that has the depiction information, notice it's coming back as Turtle. It has a MIME type of text/turtle, and so this is expressed explicitly in RDF. But because JSON-LD is trivially turned into RDF and back again, it becomes just as easy to do the same kind of data integration that we did with straight native semantic web formats like Turtle. So now I can say to the rdfcat tool: go out and fetch this information in JSON-LD. It will be given a MIME type. It will know how to parse that and turn it into the RDF model. Go out and fetch the Turtle. That will come back with a Turtle MIME type. It will know how to parse that and put it into the model. Because we use the same identifier in both datasets, the data accumulates, and we have the same capabilities that we did when we were just using straight semantic web technologies. So this is one of the value propositions that JSON-LD brings to the table: it allows us to use developer-friendly serializations of directed graphs in ways that hide the graph nature if we want to. All right, the fact that this works doesn't necessarily come from the fact that I use the same terms that the FOAF vocabulary used. I used name and birthday so far, but locally I have the freedom to call them whatever I want. I could call them full name and DOB, and as long as the context points back to the same identifiers, then this will be turned into the same graph format when we want to. Now, the first step toward my promise that we can do this without modifying the JSON is to engage what's called a Link header. RFC 5988 introduces a response header on an HTTP request that allows us to identify a link at which more information can be found. 
So for example, if you return a PDF document, but the client doesn't know how to parse the PDF, you could have a link with a rel type of alternate and a type of text/plain, and point with an href to a text version of the document. The client can discover that without having to know how to parse the PDF in the first place. So that idea is being used here. We have an existing JSON reference to a JSON document. We see in the content type of what's coming back that it's just plain JSON. And we see in the body of what comes back that it's just plain JSON. However, there is that link with a rel type of JSON-LD context and a type of application/ld+json that would allow any client that understands this stuff to fetch the context and then turn this into linked data as well. This doesn't solve the identity problem, but it's the first step toward being able to take existing JSON APIs, return a Link response header, and the JSON APIs themselves do not necessarily have to change in order to make them linked data capable. So some of the features that we have: we want to be able to use internationalized resource identifiers for JSON objects, something that they don't currently have. We want to be able to disambiguate keys between various JSON documents. Again, it's easy for universal concepts like name and birthday, but once you start getting into things like title, we're not sure whether you're talking about publication metadata, as in the name of a published resource, or a rank in English royalty. So we want to be able to disambiguate across these various domains. We want to be able to make references to remote JSON objects, and we want to be able to support internationalization of labels and things that are presented to users in different contexts. 
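A client that wants to honor that Link header only needs a small amount of parsing. Here is a simplified sketch; a production client should use a full RFC 5988 Link header parser, and the context URL below is made up. The rel value is the one the JSON-LD specification registers for external contexts.

```python
import re

# The link relation the JSON-LD spec registers for external contexts.
CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"

def find_context_url(link_header):
    """Return the JSON-LD context URL from a Link header, or None.
    Simplified: splits on commas and looks for the registered rel value."""
    for part in link_header.split(","):
        target = re.search(r"<([^>]+)>", part)
        if target and f'rel="{CONTEXT_REL}"' in part:
            return target.group(1)
    return None

# Example header as it might appear on a plain JSON response.
header = ('<https://example.com/contexts/person.jsonld>; '
          f'rel="{CONTEXT_REL}"; type="application/ld+json"')
```

A linked-data-aware client fetches the advertised context and interprets the plain JSON body with it; every other client ignores the header and sees ordinary JSON.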
We want to be able to put types back on our literals, so rather than just having strings, we can say this is an integer, it is a date, it is a positive number, various things like that. And ultimately, for the linked data crowd, we also want to be able to express directed graphs. Here is a collection of nodes, here are the relationships between them, and here are the identifiers to give to both the nodes and the relationships in order to be able to share linked data. So the terms are the short names. These are the equivalent of Bob and Mary and Mark. These are the names that everyone's been using in JSON the entire time. Generally, these are short, easy-to-remember, easy-to-parse identifiers. They look like regular key references, but we want to be able to turn those into full IRIs or, as we'll see in a later example, a blank node identifier. Sometimes we need the ability to identify a blank node as distinct from another blank node within the same graph. There are no real naming restrictions other than don't use the @ character, which is used to identify JSON-LD keywords, and don't use any existing keywords. So we have two main types of JSON objects in a JSON-LD document. We have a node object, which is just a regular JSON object that has some properties associated with it and is not part of the context. So if you remember, the implicit node describing me had a name and a birthday. That was the root node of the graph. There were no other nodes in the graph, so that was the only thing that was being discussed. But we'll see some examples in just a minute that will allow us to have embedded references as well. And node objects can't contain any of the value-type keywords that we'll see in just a second: @value, @list, @set, and so on can't be used in node objects directly. The value objects are also regular JSON objects, but they are contextualized to help us associate types with the strings that show up in regular JSON. I want you to parse this as a date. 
I want you to consider this the Japanese label for whatever is being discussed. All right. We had a type-instance relationship that I expressed already. We have a JSON-LD keyword called @type that allows us to express the same idea. And in a JSON-LD document, you can make fully qualified references. So in this case, Brian is a member of the foaf:Person class. I can indicate that with the @type. You'll notice I've removed the @id term because I want to build toward being able to express regular JSON without having to modify it. I'm just introducing a few quick terms here, and then we'll start to extract the JSON-LD-specific stuff from the JSON. So we're back to a blank node being generated because there's no @id, but we can indicate that this thing is a member of the person class. We don't have to use fully qualified identifiers, and you can be multiple things. In this case, I can say Brian is a person and a Phish fan using just simple terms in a JSON array. And in the context, I can uniquely identify that person maps to foaf:Person and Phish fan maps to a non-existent vocabulary that includes a term about fans of the band Phish. This will then generate a comma-separated list of types for the blank node. So you can be as many things as you want to be. We don't run into the problems of multiple inheritance like we do in many object-oriented languages. If you find you're going to be using a lot of terms from one primary vocabulary, in this case, almost everything that we've done so far has involved the FOAF vocabulary, you can simplify the expression by using in the context the @vocab keyword to indicate: this is where you should look, JSON-LD processor, to find most of the things that you're talking about. So in this case, when I say that something has an @type of person, that will be automatically mapped into the foaf namespace. When I use the term name, that will be mapped into the foaf namespace. 
And so this is a simplified way of having a predominantly FOAF-driven set of terms. You are absolutely not limited to that. You just have to use prefixes, in ways that we'll see in just a second, to mix in multiple other terms and vocabularies. Now we have this other problem that we need to resolve, which is that if I make a reference to another URL, say using the foaf:homepage relationship to indicate that this person has a home page, because JSON only has strings, this will be interpreted as a string by the JSON-LD processor. Now that's not necessarily the end of the world, but because with linked data we want to be able to link to other things, I may want to make statements about the webpage. What is its license? When was it produced? What's the subject of the document? So I don't actually want that thing to be parsed into a string. I want it to be parsed into an identifier of another node. Because JSON has no notion of this, we're going to use a value object attached to the home page term in the context, and we're going to tell the JSON-LD processor: when you parse something and it's pointed to through the home page relationship, that home page relationship will be resolved against the foaf namespace, but the thing that it points to should be parsed as a node identifier. It's not a string. So now nothing else changes in the data. This value object in the context gives us the additional mapping to allow us to parse that as a node object reference, at which point any facts about the bosatsu.net webpage will accumulate about that. So I could say things like, using a SPARQL query, find me the license associated with any home page associated with the person named Brian Sletten, and because the graph is able to accumulate, that kind of query will produce interesting results. 
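The coercion being described can be sketched as follows. The context shape, a term definition carrying "@type": "@id", is real JSON-LD; the tiny interpreter below is only an illustration of what a conforming processor does with it.

```python
doc = {
    "@context": {
        "@vocab": "http://xmlns.com/foaf/0.1/",
        # A term definition: homepage values are node references, not strings.
        "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage",
                     "@type": "@id"},
    },
    "name": "Brian Sletten",
    "homepage": "http://bosatsu.net/",
}

def coerce_values(doc):
    """Turn values whose term is declared "@type": "@id" into node
    references; everything else stays an ordinary string literal."""
    ctx = doc["@context"]
    result = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        term_def = ctx.get(key)
        if isinstance(term_def, dict) and term_def.get("@type") == "@id":
            result[key] = {"@id": value}   # a link to another node
        else:
            result[key] = value            # a plain literal
    return result
```

Once the home page is a node reference rather than a string, facts asserted about that URL elsewhere can accumulate on the same node in the graph.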
All right, so we're going to mix a couple of ideas here, just because going through individual language syntax features would get boring and old really quickly, and there are a bunch of more interesting things to talk about. So here I am no longer using an @vocab. I'm mixing in three vocabularies: the FOAF vocabulary, which has its namespace mapping, the XSD (XML Schema) vocabulary, which has its mapping, and a fake one called ex that's pointing to an example namespace that doesn't actually exist. But now, when I make references to information about my dog, in this case, @type, he is an instance of the dog class from the example namespace. The JSON-LD processor will see the colon in the string. This is what's called a compact URI, and then it will say: do I have a prefix associated with what is on the left of the colon? And in the context, in fact, it does, and so it will then build up the more complex string. So we see that Loki is an instance of the dog class. He's got a name and a birthday. Notice birthday maps to foaf:birthday. So again, the JSON-LD processor will find the colon, look up the prefix, see the prefix mapping above it, and be able to turn that into a global identifier. For the age term, we see another value object here that says the identity of the age relationship is foaf:age, which will also kick in the foaf expansion, and the @type is xsd:integer, which will trigger the XSD expansion. But that then allows the processor to convert an age reference as a string into a numeric when we actually turn it into a data model that has type information associated with literals. So I just threw a lot out there, but it gives you a sense of how you can have very specific rules and guidance in converting the strings and the data structures from multiple vocabularies into more expressive data models than just string key-value pairs. All right, so we still need to solve the problem of identity. 
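Compact URI expansion is a simple mechanical rule: split on the first colon, and if the left-hand side is a declared prefix, concatenate. A sketch, where the ex prefix points at a deliberately non-existent example namespace, as in the talk:

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd":  "http://www.w3.org/2001/XMLSchema#",
    "ex":   "http://example.com/vocab#",  # fake namespace, as in the talk
}

def expand_curie(value, prefixes):
    """If value looks like prefix:suffix and the prefix is declared,
    concatenate the namespace and the suffix; otherwise leave it alone."""
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        if prefix in prefixes:
            return prefixes[prefix] + suffix
    return value
```

So ex:Dog becomes a full IRI in the example namespace, foaf:birthday expands into the FOAF namespace, and a bare term with no declared prefix passes through untouched.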
If you put an @id term in and then have a non-URL identifier, that will be considered by the JSON-LD processor to be a relative identifier, and it will be resolved relative to the document. So as the document is moved around from one directory to another, or one website to another, the identity of the object that's being identified will change, which is clearly not what we want. So one of the things that we can do is to introduce an @base keyword term in the context, which says any relative identifier mentioned in this document should be resolved relative to this base. In this case, the http://bosatsu.net/people/ namespace becomes the base, and the identity of the object that's being returned is, in effect, mounted into that namespace. So now simple terms can be turned into fully qualified namespace references, even if they themselves are not URLs. Now, I'm still requiring you to put an @id into the instance node object, so we need to solve that problem. Suppose you have a url term in your existing JSON, and it happens to point to a fully qualified URL; a lot of JSON has URL references in it, and you want to say, hey, I can use this as a reference to the object. The thing that's being discussed has some kind of identity. In the context, I can use a JSON-LD concept called aliasing to alias the url term to the @id term. So a JSON-LD processor will now treat that as the identifier, as we've seen before, and that becomes the ID of the object that's being produced. So that's cool if you're using URLs, but most people using JSON are not identifying their content with fully qualified URLs. Instead, they're using something like employee IDs or database IDs or something like that. But if we combine the idea of relative identifiers and aliasing, then we have the ability to take an existing JSON document that has no notion of identity.
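A sketch of the @base idea, with an assumed namespace: the relative @id below resolves to http://bosatsu.net/people/brian, and it keeps resolving to that no matter where the document itself lives.

```json
{
  "@context": {
    "@base": "http://bosatsu.net/people/",
    "name": "http://xmlns.com/foaf/0.1/name"
  },
  "@id": "brian",
  "name": "Brian Sletten"
}
```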
We can alias employeeId to @id, which will turn 12345 into a relative identifier, and have an @base in the context. The relative identifier will now be resolved against that, and now we have the ability to take existing JSON and turn it into linked graph JSON. And if you notice, I have not touched a single thing in the JSON instance part of the document. Everything has been done purely in the context. With that in mind, I hope you can now extrapolate and figure out how we can take a contextual reference that points to a document describing the context, or just remove the context entirely and use one of the Link response headers, and now you have enough to go and modify every single one of your JSON APIs to be linked-data aware. You need a context that's visible in the response header, and anybody who finds it in the response can now turn your data into JSON-LD, which allows them to turn it into RDF, which then makes it linked data. Now, there's a lot more to the language itself, but I want to focus now on some quick mechanical transformations. What I have shown you so far as a JSON-LD document is really a compacted JSON-LD document. It's a special form of JSON-LD where name shortening has been applied to the terms. But I can take the JSON-LD document and remove the context by applying the reverse process and produce what's called the expanded form. So in this case, I have an array of statements about the thing. There's a blank node with a name of Manu Sporny and a home page, and I have removed the context from the JSON document. So this is the expanded form of JSON-LD. And this is interesting because I can now apply a different context. Someone else uses other localized terms.
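Putting relative identifiers and aliasing together, a sketch (the employees namespace and the name are assumptions; 12345 resolves to http://example.com/employees/12345):

```json
{
  "@context": {
    "@base": "http://example.com/employees/",
    "employeeId": "@id",
    "name": "http://xmlns.com/foaf/0.1/name"
  },
  "employeeId": "12345",
  "name": "Jane Doe"
}
```

Everything happens in the context; the instance data underneath is untouched, plain JSON.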
Rather than using name and home page, you know, you could have norm and page, and when I apply that context to re-compact the expanded document into a new compacted form, the terms will be generated as my terms, the ones that I want to see locally. And now you start to see how a JSON-LD document, whether it's got the context built directly into it or not, because of the response headers we can use, lets me take existing JSON, contextualize it, expand it, and re-compact it, and the terms that are consistent between the various contexts will propagate. And this is a mechanical transformation of arbitrary JSON talking about arbitrary things. Anything that's not part of the context, or is not shared between the two, will be thrown away. And that's okay. I mean, if I don't know what your keywords mean, I don't know what to do with them. I'm just going to ignore them. The JSON-LD processor will drop them. But this is also an interesting way to think about things if you have information and you want to have an internal context for internal systems and an external context for public-facing systems. You can choose only the terms that you want to be part of the public data, expand the localized document, and then re-compact it with the public context. Anything that's not part of the public context will automatically be filtered out. So you can think about how these mechanical transformations will help you reshape your data into different forms. So now you can imagine a server that accepts a JSON-LD POST, and that POST can have information associated with it defined with a particular context using a particular set of terms. If they map to yours, you can just use them. If, however, they are in somebody else's context, you don't even really need to know what that context is.
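The internal-versus-public filtering just described can be illustrated with a deliberately naive sketch in Python. This is not a real JSON-LD processor; it handles only flat term-to-IRI mappings, with no value objects, prefixes, or identifiers, and the contexts and terms are assumptions made up for the illustration:

```python
# Two contexts mapping local terms to shared IRIs. The "salary" term is
# internal-only: the public context has no mapping for its IRI.
internal_ctx = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "salary": "http://internal.example.com/vocab#salary",
}
public_ctx = {
    "nom": "http://xmlns.com/foaf/0.1/name",
    "page": "http://xmlns.com/foaf/0.1/homepage",
}

def expand(doc, ctx):
    # Replace each short term with its full IRI; unknown keys pass through.
    return {ctx.get(k, k): v for k, v in doc.items()}

def compact(expanded, ctx):
    # Invert the context; IRIs with no local term are silently dropped,
    # just as a JSON-LD processor discards terms it cannot map.
    inverse = {iri: term for term, iri in ctx.items()}
    return {inverse[k]: v for k, v in expanded.items() if k in inverse}

doc = {"name": "Manu Sporny", "homepage": "http://manu.sporny.org/",
       "salary": 100000}

# Expand with the internal context, re-compact with the public one:
# the salary is filtered out automatically.
public = compact(expand(doc, internal_ctx), public_ctx)
```

Running this leaves only the terms the public context knows about, under the public context's local names.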
As long as there's some shared linkage between their data and your data through the contextual mapping, you take their compacted form, you expand it, you re-compact it with your context, and now you're able to ingest their information. It drives the effort of integration across various data sources down to very, very simple mappings that can be captured, reused, and extended moving forward. There's another form, because graphs can be serialized in lots of different ways. Here's an example where the outer root node of the document, outside of the context, the thing that's being discussed, is me. I have identity, I have a name, and I know, in this case, three people, two of whom have no identifiers. Malik has an identifier. So now I have a series of embedded object references through the knows relationship, and that's not necessarily a predictable way to structure the graph. I don't want to have to write code upstream that says, is Brian the outside node? Is Sean the outside node? Is Jeff the outside node? So I want to be able to project this in a flattened format, and that's what the flattening mechanism is. I can take the graph and serialize it in the flattened form, where we just have a directed graph. Each node is given identity. The nodes that have no known identity are given blank node identifiers, and the ones that do have identity are tagged as such. And now we see, for the node that identifies me, that I foaf:know the three other nodes, and that linkage is now done within the flattened format. It's still the same graph. It's just been mechanically transformed into a more predictable shape. There's another format we're not going to go into, called the framed format, where you can decide what structure you want to project moving forward. But these mechanical transformations of arbitrary documents representing arbitrary information, contextualized through these mappings, are a very, very powerful extension of what JSON-LD does for us.
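A sketch of what a flattened serialization of that graph might look like. The identifiers are illustrative, and a real processor generates its own blank node labels; the point is that every node, named or anonymous, becomes a top-level entry in @graph:

```json
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": { "@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id" }
  },
  "@graph": [
    {
      "@id": "http://bosatsu.net/people/brian",
      "name": "Brian Sletten",
      "knows": ["_:b0", "_:b1", "http://example.com/people/malik"]
    },
    { "@id": "_:b0", "name": "Sean" },
    { "@id": "_:b1", "name": "Jeff" },
    { "@id": "http://example.com/people/malik", "name": "Malik" }
  ]
}
```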
Very quickly: there's a JSON-LD playground where you can go and play around with things. I encourage you to check it out at json-ld.org. You can type some JSON in. You can look at some existing examples representing people, events, products, recipes, libraries, things like that. Developers have the ability to interact from JavaScript, Python, Ruby, Java, C#, PHP. There are new libraries appearing all the time. But because this is just regular JSON, you can use all your existing JSON tools, and the JSON-LD APIs really only manage things like compaction. For example, here in JavaScript, you could require the jsonld library. You can have a document with the expanded names. You can have another JavaScript object representing the context. And then the function of the API is simply to take a reference to the document and contextualize it, to flatten it, to expand it, to do these kinds of transformations that we discussed. That's really the only thing the JSON-LD API is going to require you to learn. Otherwise, you're free to use your existing APIs. These are just idiomatic, language-specific ways of interacting with these standards. The JavaScript JSON-LD implementation will look like idiomatic JavaScript, and the Java one will look like idiomatic Java. You're creating streams. You're creating maps. You're using static methods on these things to compact them. So learning these APIs is very easy for developers to do. They're able to emit JSON. The JSON can be consumed in jQuery or Angular or these other frameworks. But as needed, it can be turned into linked data. Now, all of that's cool, but it's made exceptionally cool by some of the real-world uses that are out there. One question I get asked all the time is, well, who's actually using this? And the short answer is, it doesn't matter. I've already shown you enough value to get out of this even within your own organization.
If you use somebody else's API and they return just JSON, then you can define a context to turn that JSON into JSON-LD. So in this case, we're using one of Google's geocoder APIs, turning it into an RFC 6570 URI template. You just need to plug in a couple of values, and then you can treat this like any other URL in the world. Issue a GET request to it, get back an array of JSON objects, turn those JSON objects into a graph, and then you can run a SPARQL query against the results. So on the one hand, it doesn't matter who's using it; you can still use it and get value out of it. Now, looking at embedding JSON-LD in documents like HTML documents: in this case, a script tag with a type of application/ld+json allows you to embed information about a specific domain. We're using standard data models, standard serialization formats, and standard identifier references to talk about arbitrary things. That is part of what Google is encouraging people to do when you send email to Gmail accounts. In this case, if you say, I'm offering you $5,000 off of a car, but you have to respond within a week, Google can identify those terms, capture the response, capture the timeframe, and add reminders to your calendar, because the cost of consuming this information is relatively low thanks to the standards being used. So we can not only identify these things, but we can also capture the action that I want the client to take. In this case, collect a status, where the required property is the RSVP status, and then post the result back to this endpoint that I'm telling you about. You don't have to read my API documentation. You don't have to know how to build this URL up. I will tell you. And then when you capture the response and submit it back, this can be done mechanically by arbitrary clients. We've got sites like GitHub advertising the fact that an email contains references to pull requests or issues, so these things can be uniquely identified. I've heard variously that they're using embedded JSON-LD or HTML5 microdata.
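As a sketch of that kind of embedding, a page might carry a schema.org event description inside a script tag of type application/ld+json. The date, venue details, and ticket URL below are illustrative assumptions, not real listings:

```json
{
  "@context": "http://schema.org",
  "@type": "MusicEvent",
  "name": "Phish in Concert",
  "startDate": "2015-09-04T19:00",
  "location": {
    "@type": "Place",
    "name": "Dick's Sporting Goods Park",
    "address": "Commerce City, CO"
  },
  "offers": {
    "@type": "Offer",
    "url": "http://www.example.com/tickets/12345"
  }
}
```

A crawler that understands schema.org can lift the event, venue, and ticket link straight out of the page without scraping the surrounding HTML.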
It doesn't matter. Those are both standards for expressing machine-processable metadata about an arbitrary domain. Now Google can tag your inbox to draw your attention to things that you need to react to. Google is encouraging musicians to mark up their web pages to indicate when and where they're going to be performing. So in this case, a Phish concert at Dick's Sporting Goods Park in Commerce City, and here's where you buy tickets. Google is now able to parse that information and extend its Knowledge Graph, so that when people search, they find that information attached to the Knowledge Graph. Now, Google gets its information in a lot of different ways. We don't know for a fact that Phish is using JSON-LD behind the scenes, but that's an example of the kind of thing where, whether you're a large successful band or a folk singer, you just mark your web page up and Google will learn things about you. And if you think this is going to stop with email offers and musicians, I think you're not paying attention, because this is how we can share information at scale with low effort. Finally, we're seeing JSON-LD show up with the Hydra project, an RDF vocabulary for describing hypermedia APIs, serializing the results with JSON-LD so clients can dynamically discover what's available from an API and present that to a user without having to have a lot of hard-coded, shared understanding beforehand. We're seeing initiatives like the Open Credentials work, a community group trying to create machine-processable credentials for sharing information as well. So there are lots of different directions here; I've got clients that are redoing their APIs around JSON-LD formats just for nice, clean hypermedia kinds of interactions. So, I want to leave some time for questions. Eric, do you want to take over, then? Sure. Hello, and thank you so much, Brian.
We do have a lot of questions so far, but we encourage more. If people have them, please do type them into that Q&A section. Brian, let's start with a softball question for you. What is the meaning of the Chinese ideogram in the top left of the first slide? It's just the first character of my company name, Bosatsu. Bosatsu Consulting, expressed in Japanese. Great. So the next question. Early on in the presentation, you said one of the goals of JSON-LD is to be 100% compatible with JSON. What does 100% compatible with JSON mean? Does it mean it's 100% JSON, or something less? Well, JSON is a standard, and libraries and tools and things like that are written against that standard, its object references and so on. So the point is simply that nothing we're introducing should violate any of those expectations, so that all existing standards-compliant JSON tools will continue to work with JSON-LD. If they had chosen a slightly different route, they could have introduced some new syntax or something like that, which the existing tools wouldn't know how to understand. The next question is about the context term on slide 19: how is that context different from XML's namespaces, which you said earlier were a primary reason people wanted to use JSON instead? Well, we see this recurring pattern in technology all the time. Something becomes too big and complicated, and then we have a simplified version. We saw this with Java server-side development, and Rails was a reaction to that. But then a lot of the things that Rails did not initially support, like transactions and whatnot, get shoved back in, and we then start to see Node as a reaction to the complexity of Rails. The same thing happens here. Hard problems require special handling. A lot of developers said, oh, XML is too hard, let's come up with a simpler mechanism.
But once you do that, then you start realizing, well, I don't have any way of transforming this. I don't have any way of validating this. And so we introduce schema validators, we introduce these other sorts of things, and JSON starts to become more complicated again. All we're doing with JSON-LD is taking some of those ideas and making them optional for you to care about. The XML namespaces in an XML document are not optional. You have to care about them, and they complicate how you parse and deal with the document. JSON-LD allows that to be an optional mechanism, but for those who do care about the value of namespaced terms, there's a way of managing them. So I would say the difference is it gives us some of the benefits of XML namespaces without requiring everyone to pay the price. This question we get a lot: will the deck be shared? Yes, it will be. We will post the deck and the recording of this session simultaneously within two business days, so by end of day Monday those will be up at dataversity.net. The next question: what is the use case within a closed organization for these linked data translation acrobatics? The assumption is that if I'm not interacting with the rest of the world, then within my own organization I have consensus, and I would say that is not even remotely the case in most medium to large organizations. We even see in design and development initiatives like domain-driven design the idea of a bounded context. If I require consensus to hand off from point A to point B, I can minimize the amount of complexity, I can minimize the number of people I need to get consensus with, if I have a bounded view of the data. And so I would say that within an organization, having different contexts to represent the domain objects, for marketing, for the employee database, the order history database, or the product information, makes sense. You have different views of things in different scenarios.
Absolutely, this is a way of managing bounded views of that data and not requiring as much consensus as you go from the HR system to some kind of technical system. So now information that's stored in the HR system, available as a series of JSON APIs, can be connected to information stored elsewhere. You can imagine linking your HR system into your issue tracking system, so that things like contact information for the developers who worked on particular APIs, that kind of linkage, happen almost transparently. So there are absolutely reasons to use these ideas within an organization, because I absolutely reject the idea that there's consensus within the firewall. Yeah. Is it also appropriate to store JSON-LD documents in a graph database, or do you see them more as an alternative to JSON databases like MongoDB? Certainly, for graph databases that support the namespacing and the standards and things like that, without losing information you could take the JSON-LD, turn it into RDF, and shove it into a triple store, then pull it back out and turn it back into JSON-LD to consume in an existing tool. So if you wanted to project a JSON-friendly view of the data and yet still store it in a way that you could do reasoning against, that's absolutely on the table. If you are using an existing JSON document-oriented system and you don't want to set up a new system, this is a way of allowing you to store it and retrieve it, and maybe you could support the SPARQL protocol or something at the service level. It's really up to you. The thing about JSON-LD is that it's letting the people who don't want to care continue to not care. We have time for, I think, one more question. Is it better to embed JSON-LD than RDFa? Do you see RDFa going away, then? No. They're ultimately solving different problems. RDFa is a way of, if you care, if you have templates, if you have web components that have semantic markup, they can easily use either approach.
But what RDFa allows you to do is, at the individual element level, capture and reuse information for both the presentation and the machine-describable format. So you could say, this div is about this thing; this paragraph is about this thing. Embedded JSON-LD is basically just, here's a blob of stuff that you can extract, but it has no direct linkage to the structure of the document. So you can use either approach however you want. They solve similar problems, but they also solve different problems. If you don't actually care about weaving the semantics into the structure of the document, a lot of people find that difficult. If you use templates and things like that, it's not really a big deal. But they're both standard ways of weaving arbitrary information into documents. It's just a question of whether you need to reflect the structure or not. Brian Sletten, thank you so much for this great presentation and Q&A. I'm afraid that's all the time we have for today. Just to remind everyone, we will post the recorded webinar and slides to dataversity.net within two business days, so you'll know how to access that material. The next Smart Data webinar will be on September 10th, and our topic will be applying neocortical research to streaming analytics, with Subutai Ahmad of Numenta. There is an opportunity to meet today's speaker, Brian Sletten, as well as Subutai, next week in San Jose at the Smart Data Conference. If you have not yet registered for that conference, registration is still open, and as an attendee of this webinar you can receive a 20% discount by using the coupon code webinar when you register. So thank you all very much for attending today. We hope to see you next month on September 10th. I hope to see you in person next week, and thanks again to Brian Sletten for all the great information today. Thank you.