Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor for DataVersity. We'd like to thank you for joining today's DataVersity webinar, How Semantics Solves Big Data Challenges, sponsored by MarkLogic. As you join, please note that MarkLogic is also a Platinum sponsor of our upcoming NoSQL Now! and Smart Data conferences happening August 18th through 20th in San Jose, California. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel in the bottom right-hand corner of your screen. Or if you like to tweet, we encourage you to share highlights or questions on Twitter using hashtag DataVersity. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me introduce our speaker for today, Matt Allen. As a Product Marketing Manager, Matt is responsible for articulating the value of MarkLogic in order to drive awareness and adoption worldwide. Prior to MarkLogic, Matt worked as a strategy and technology consultant with Booz Allen Hamilton and at Ernst & Young, and he graduated from the University of Virginia. And with that, I will give the floor to Matt to start the presentation. Hi, Shannon. Thanks so much for the great introduction. Welcome, everyone. Thanks for joining me today. I'm excited to talk to you about the topic of semantics. MarkLogic has been investing heavily in this graph database technology that we call semantics, and we've been making the investment primarily because our customers have been asking for it. There's a reason that large tech companies like Google and LinkedIn and Facebook have all developed and implemented graph technology.
And I look forward to taking a look at how the same technology that has made those companies successful in handling the challenges of big data can be deployed at yours and any other company. So this past spring, I had the unique opportunity to work on a book called Semantics For Dummies, and you can download it on our website at MarkLogic.com. The book provides a great general overview of semantics, and I would definitely recommend reading it. You can get a free copy on our website. The other reason I mention it is because in the process of writing the book, I got to sit down and talk to people in the field actually using semantics, and I got to see exactly how semantics is being used by large organizations to solve some of the really hard problems that we face today, and to build some really cool applications. So I'm looking forward to sharing some of those experiences and perspectives with you today, and hopefully I'll be able to cast a vision for how you might be able to use semantics in your own organization. So to start, let's ask, why do we need semantics? Why do we need a new model for modeling data? I want to start by just framing the problem space that I'll be discussing. Today people talk a lot about the promises of big data. You hear that everything today is bigger and faster, and as we move farther into this digital age, big data is thought of as a solution to all of our challenges. I would argue, on the contrary, that people often see big data as more of a hassle, something they are more worried about than something they are actually excited to benefit from. Everything is getting more complex, and many organizations are drowning in the complexity they are experiencing with their data and across the IT organization. And it's not surprising to me that a recent survey of CIOs at leading companies reported that data silos are the number one impediment to big data.
So you're probably wondering why I'm showing this big, gray screen right here, and what it is. You've probably heard the story of the blind men feeling around on the elephant, and I think it's a good analogy to use in talking about semantics. In the story, one man grabs the tip of the tail and says it's a brush, another grabs the leg and thinks it's a pillar, and another grabs the tusk and thinks it's a spear. The parable implies that one's subjective experience can be true, but that such experience is inherently limited by a failure to account for the whole picture, so to speak. And it's a good analogy for the world of data, in which context is still important in order to get the whole picture. So as we talk about semantics, I want everyone to keep in mind that at a high level, semantics is about gaining context for your data. So here's a more specific example of how the parable of the blind men and the elephant is playing out across the enterprise. Life used to be fairly straightforward for most IT departments. You had a few centrally managed key systems, the ERP, CRM, and HR systems, for example, and data was very neat, structured, and didn't change much. However, in today's world, data is big, vast, varied, and changing. IT departments are no longer able to just manage those few systems. They have to worry about dozens of systems and petabytes of data, huge volumes of data, huge varieties of data. And the business is in need of all that data, and they're building more applications. But if the IT department doesn't deliver, the business will just go and set up its own systems in order to keep up with competitors, and we start to see that problem in the form of shadow IT popping up in organizations. Pretty soon the organization is dealing with dozens, if not hundreds, of different data silos. And it starts to look sort of like Baskin-Robbins with its 31 flavors.
There's a database for every type of data and purpose and domain. And it's not that having all this data is a bad thing, but it's difficult to manage and get value from it all. It's very difficult to bring these various data types together and search across them. Now going a little bit deeper, I want to show one example that I've seen playing out at a lot of media and publishing companies that we've been working with. Here you see a picture of a media asset, a picture of a dog, and you see various roles listed. And each one of these roles has their own independent way of categorizing that image, the metadata for that image. And it means that sharing the image outside of the department or organization becomes really problematic because the language doesn't align. Each person has their unique perspective about the image, and they're all valid perspectives. But then if you're trying to categorize the asset in a central repository, how do you do that while still making it possible for the particular searcher, let's say the editor, to find and use the data in a way that makes sense to an editor? So now imagine that same problem being played out among hundreds of thousands of assets and dozens of data repositories. That's the exact problem that media and publishing companies are facing today. So here's another example that further illustrates the importance of context. In this example, I'm showing the challenges in managing the relationships between various entities. At a basic level, relationships can be understood as patterns between entities that fall into a few different categories: things that are equal, things that are similar, or things that are part of a bigger thing, the hierarchical, whole-part, or containment relationships. In the example above, we have three things that are very different, but you would use the same search term, "sub," to search for each one of them. And here's another example.
When you have a sub and a hoagie, they're obviously the same thing, but how would a computer understand that they are the same thing? As another example, we can compare the parts of a sub. Obviously a sub would have cheese on it, and cheese is also something independent. How would a computer understand cheese within the context of the sub versus cheese standing alone among these different individual foods? I'm sure you're getting a little bit hungry right now. I apologize. I know it's around lunchtime. So this is the last of the food examples. My point with these examples is that while it would be quite obvious to a friend what you're looking for, for example, if you're searching for "subway" around lunchtime, your friend would know from the context that you were looking for a lunch spot, it wouldn't be so obvious to a computer, which might give you results for a bunch of underground transportation options. That's the case even with really smart cognitive systems: they don't understand what you're asking because they don't have the context around it. There are also other examples that we run into pretty frequently with databases. They often have different spellings of the same thing. For example, the state of North Carolina may be in the database as the full name North Carolina, but it also may be listed as just N Carolina or just the abbreviation NC. And it would be difficult to reconcile all those differences across various systems when they have hundreds of thousands of customer addresses or other entities listed. Here's another problem from the perspective of a large agriculture company that we're working with. They're using MarkLogic semantics to create a search application for all of their research data, but the same would really apply to any large research organization.
So in this example, you might consider a product R&D pipeline and imagine that you were looking at the process for discovering a new drug or a new type of plant that you want to grow. Say you're on a new team of researchers coming onto the project late in development, in phase four. At this point, the product usually already has a particular name given to it, but you want to go back and find information not just about the product in phase four and what it looks like then, but about the product before it was given that marketing name. So the question is, how do you get all of that related information when you don't know what to search for? And this is a particular problem at the agriculture company because the same name might be given to 100 different things. Or what if you're coming in as a new researcher looking at discovery? How can you search for a particular plant trait and get all of the information about it? So you can tell this problem becomes much more difficult if you're using older database technology, which can't make the connections that we humans seem to make very easily. Here's another example from healthcare. There's a problem with interoperability in healthcare and dealing with different domains of knowledge. It's a little bit familiar for me since I did a lot of consulting in the healthcare industry. This graphic shows different domains that are often formed by the specialties within the medical profession. You may be familiar with ICD-9 billing codes that will often appear on your medical bills, which a provider uses to tell the insurance company or payer different things about you, the patient, and what you received and need to be billed for in the course of care. ICD stands for the International Classification of Diseases and is the generally accepted standard for most healthcare-related services. However, there's also DSM, which stands for the Diagnostic and Statistical Manual of Mental Disorders.
DSM provides criteria for classifying over 300 mental disorders and is often used in the practice of psychiatry, but it's a different model from ICD. For example, with ICD, the term psychosis is a mental disorder, but with DSM, psychosis is a psychotic illness. They describe the same thing differently. So how do you account for both of those differences in a data model if you're trying to integrate the data, or make easy mappings between the two, without substantially changing the original sources of your data? Now try doing that at scale with the six different domains shown here, or possibly more, across thousands of different codes. And then what happens if a code changes, and all of a sudden ICD changes its mind and starts calling psychosis something else? So this is just to illustrate the problem that semantics is solving. Semantics can handle these differences fairly easily, and we have customers that are doing it. We'll get into more examples later on, but I'll also quickly mention the example of Broadridge, a large financial organization that is using MarkLogic semantics to tie together various financial knowledge domains, including client models and application ontologies. You may have heard of one of those ontologies called FIBO, but there are also other financial domains, such as FpML, FIXML, ISO, and others. You can use semantics to start reconciling these different domains and models, and to have a formal approach to modeling these different domains of knowledge. There are plenty of other examples. The point is that to model and manage data complexity, you really need a smarter database that can account for the semantic differences that we're discussing. This is particularly pressing for healthcare, where companies are literally spending billions and billions of dollars on very large IT projects, but interoperability continues to be a problem we face today. So how are these problems being solved?
The problems that I was just discussing aren't exactly new. We've known about them for a while. The companies that we're working with have come up with some innovative ways to help solve them, but their approaches, built on traditional databases, really haven't worked out that well. I'm using the Dewey Decimal System here as an analogy for how traditional approaches categorize information. The Dewey Decimal System is an approach that you may have heard of. It was used in libraries for putting books on the right shelf, so to speak. It was a wonderfully elegant system for its day, but it comes with a problem. The problem is that in classifying books or plants or car parts or movies or anything else, you have to make choices about which shelf you put something on. Things can't go on more than one shelf at a time. The choice depends on the librarian making it, and in general, no two librarians make the same choice, just as no two DBAs would make the same choice about modeling data, and no person in one organization would make the same choice as a person in another organization about how to categorize a certain entity. A book in one library will be in one category, and in another library it will be in another category. A book, like a piece of data, had to be put on one shelf. It couldn't be in two places at once, and pretty soon, conflicts start to arise. What was designed to be a really elegant, great solution starts to get more complex. Eventually, it starts to look a little bit more like this, and it really just collapses. It illustrates why having a simple hierarchy and a simple mechanism of tagging is not enough. The problem relates back to the underlying model that we've been using to store data. The relational databases that we've been using for the past few decades are really just not designed to handle these semantic challenges. I want to go into a little bit more depth about what we mean by that.
Relational databases are great tools, but they do come with many drawbacks. I'm not going to go into detail about each of these challenges, and I do cover some of these topics in other webinars we do at MarkLogic, but I'll summarize by saying that although the relational model is great for structured data, it is not very flexible. Relational databases require everything to be modeled up front with a very strict schema that defines the tables and columns in which your data goes, and this involves choices and trade-offs. Relational databases are usually fixed to a specific business purpose as well, like a transactional system for a certain department, and if data needs to be moved around later to another department or to an analyst, or merged with another company's data after that company is acquired, you end up with lots and lots of ETL, extract, transform, load, and the costs of that ETL can get really, really expensive. There's also a lot of abstraction involved in moving data from the relational model to an object model used by today's modern programming languages. This involves object-relational mapping, the impedance mismatch, and while we've come up with workarounds for this, and it's solid technology today, it does add layers of complexity and inflexibility, and that really thwarts the goal of today's organizations, which is to be more agile and more flexible in their approach to modeling data and building applications. There are also challenges when you want to query across your data, which may involve a lot of complex joins, or issues with indexing that you run into with relational databases whenever you're trying to do something new. These are all problems that MarkLogic, as a NoSQL document database, helps solve. Again, I'm not going to go into a lot of detail about document databases.
It's sort of a different topic, but MarkLogic is a document-oriented database, and it can address many of these challenges with the inflexibility of the relational model. But there are also other problems that MarkLogic, as a document database with semantics, helps solve. When we discuss semantics, we're looking at addressing the inability to model relationships. Semantics focuses on the relationships, which relational databases, as I've shown, are just not really designed for. Some of the previous slides have shown that handling facts and relationships with traditional databases is a huge challenge. There isn't a standard for modeling entities such as people, places, and things. I'm not going to say that relational databases are not a great technology. There's definitely a reason why they are the most popular type of database today. All I'm saying is that it would be a mistake to use them to handle data and do things that they are not designed for. As we get further into talking about the benefits of semantics, we'll start to see how some data is inherently, to use my technical term, "graphy," and how semantics is a better fit for graphy-looking data. So what's the vision for the future, and how does semantics fit into that vision by helping solve some of the challenges with big data? Most organizations that we talk to today want to achieve a 360 view of their customers, or in this case, a patient, but it could be a 360 view of their partners or suppliers or any other business entity. This graphic shows healthcare, where a health insurance company or hospital wants to get patient electronic health records, lab results, payment information, drug information, Medicare data, behavioral data, other external data sources, and social media. Who knows? There are a lot of other data sources that they probably don't even know about yet. The same is true for all industries.
People want to bring disparate data sources together, and they want to get more value from it. That's our vision for the future. Let's jump into talking more about the underlying technology of semantics so you can understand exactly what we mean by semantics and how it contributes to achieving this vision. My objective in this whole talk is to discuss how this rather benign thing called a triple can solve some of the challenges with big data. When we say semantics, what we mean is a new way of modeling data using the structure of a triple, which is formed by taking two entities, two people, places, or things, and connecting them with an action or a relationship. A common example we use at MarkLogic is a triple that says John lives in London, and another triple that says London is in England. You can see that when the object of one triple is the subject of another triple, you can start to form a graph. When you connect them, the database can also make inferences. A fact that you didn't even load initially into the database can be formed by the database automatically by using forward or backward chaining. You can create the inference that says that John lives in England. To follow on that terminology, I mentioned that the database we use to store triples is called a triple store, not surprisingly. There are also other concepts that are important but probably require more time to get through than we have today, things like shared vocabularies and ontologies and all sorts of things. I don't want to go too in-depth. This is meant more to show the business value of semantics. I will show you one piece of data. It's not really code, it's just some data here. This is to make plain what we mean when we say triples. You can see that it's just a series of three IRIs. IRIs are Internationalized Resource Identifiers. An IRI is like a URL, except that a URL is an identifier for web pages. URLs are really just pointers to web pages.
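To make the triple and inference idea concrete, here's a minimal sketch in plain Python. This is illustrative only: the `livesIn` and `isIn` names and the single forward-chaining rule are invented for this example, and none of this is MarkLogic's actual API or how its triple index works.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
triples = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
}

def infer_lives_in(triples):
    """Forward-chain one simple rule:
    if X livesIn Y and Y isIn Z, then X livesIn Z."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (s1, p1, o1) in list(inferred):
            for (s2, p2, o2) in list(inferred):
                if p1 == "livesIn" and p2 == "isIn" and o1 == s2:
                    new = (s1, "livesIn", o2)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred

facts = infer_lives_in(triples)
# The fact "John lives in England" was never loaded; it was inferred
# because the object of one triple is the subject of another.
print(("John", "livesIn", "England") in facts)  # True
```

Because the loop repeats until nothing new is added, the same rule would also chain through longer paths, say London isIn England and England isIn Europe.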
In this case, you have identifiers for data or entities. For this reason, triples are often referred to as linked data. MarkLogic can handle any of the data formats here, and there are also others. Like other data with MarkLogic, you can ingest this data as it is. It's immediately indexed using MarkLogic's triple index, which is important if you want to be able to search the data. MarkLogic also has a triple cache to improve performance. After ingesting the data, it becomes immediately searchable with SPARQL, which is the standard query language for semantics. With SPARQL, you can ask the database questions, like tell me all the people that live in London, or that live in a place that's in England. You can also start to ask more complex questions, like tell me all the people that know someone who knows John. You can see that kind of query used in a system like LinkedIn, where it suggests possible connections because you know someone they know. The other thing I'll note when I show this slide is that the first example is Turtle. It's shown here as three IRIs, and that's a little bit different from the examples below, which show two IRIs and a string. The only difference is that with the second type of example, you wouldn't really be able to use inference, because the object of the triple couldn't be the subject of another triple. But the spec for RDF does allow you to model it in these different, what we would call, serialization formats. So you might ask, what's the benefit of having just a couple of triples in an application? Is there anything unique that triples can be used for? I use this example, and I think it's a pretty cool one, that we're using in a demo application right now. It's an example of a tweet graph that can be used to enhance the customer record.
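The "people who know someone who knows John" question can be sketched in plain Python over bare triples. The names here are invented for illustration, and in practice you'd express this in SPARQL against the triple index rather than looping in application code.

```python
# Illustrative friend-of-friend query over (subject, predicate, object)
# triples. In SPARQL this would read roughly:
#   SELECT ?p WHERE { ?p :knows ?f . ?f :knows :John }
triples = {
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "John"),
    ("Carol", "knows", "Dave"),
}

def knows(who):
    """Everyone `who` directly knows."""
    return {o for (s, p, o) in triples if s == who and p == "knows"}

# People who know someone who knows John.
friends_of_friends = {
    s for (s, p, o) in triples
    if p == "knows" and "John" in knows(o)
}
print(friends_of_friends)  # {'Alice'}
```

This is the same shape of query a system like LinkedIn uses to suggest connections: follow the "knows" edge twice.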
The example is this: one triple is stored in the database saying that customer 123 tweeted tweet XYZ. You can analyze that tweet for sentiment, and if the sentiment is positive, you can store another triple to record that sentiment, and then you can make an inference that says the customer, because they tweeted a certain tweet and that tweet was positive, is likely a high-value customer. They're saying good things about your organization, and when you tag this on your customer record, and you're able to do this all in real time, you're able to ask really relevant questions that are important for the business. You can start to ask questions like, when this customer is walking into the store and they're saying positive things about us, should we reward them? This is a quick example of how you're able to bring in these different types of data sources and start to do new things, because you're able to leverage all these different data sources in new ways. Note the inference right there. So you can do a lot with just triples, but I'm going to make the argument that the real magic of semantics, of having RDF and SPARQL, comes when you're using triples alongside documents. So I want to show you what a document looks like to MarkLogic. This is a document as MarkLogic sees it. This would be a JSON or an XML document, and you might compare it to the rows in a relational table. It's a hierarchical tree format, and the documents are schema-agnostic. They're human readable, and they don't carry all the integrity constraints that you had in the relational model. And you can do a lot with documents. I'm not going to go into much detail about MarkLogic and documents, but I will say that even documents can fall short when it comes to optimizing for facts and relationships, which are best stored in a graph model as triples.
So here you can see a graph that's connected to the document, storing information about this asset, about the work, about the character that appeared in the work, and information about that character. And you can see how this information is inherently graphy. With a single query, you can bring back the document, parts of the graph, or both the documents and the graph, and have it all materialized at one time. So it gives you a lot of flexibility in how you want to query the data as well. I do see a number of questions coming in. I'm going to save some of these questions until the end. We'll definitely have some time at the end to address them. So what I'm showing here is the multi-model view of data. My argument is that it provides a lot more flexibility and agility than any other model. Here we have an example of some healthcare data. On the left is a document where we can see that an operation was performed and certain drugs were prescribed for the operation. It's very simple, intuitive, and human readable, as I mentioned. Just by looking at it, you know what it is based on the context. This could be JSON or XML; it's just represented in a more visual form here. You can see how the structure has meaning. It's not semantic meaning, but meaning from the structure of the document. Now on the right, we can see the relational model you might be familiar with. To get to this model, we take our nice contextual document data and then we shred it. It has some benefits, as we now have modular pieces of data that can stand alone, and you can take those pieces of data and use them in any application and share them across the enterprise. Now, there are many people who would argue that the relational model creates consistency and is dependable. I think that in reality, every data modeler models differently.
You're probably looking at this model right now and thinking you would model it differently. The fact is, no model is ever the same as any other. It gets done differently in every organization. The same is true with any structured approach to data modeling, which would even include the other NoSQL models, such as key-value or column stores, to some extent as well. In contrast, the document model keeps data together, and you can answer questions by bringing back a single document. You can model the data to match the business model one-to-one. You don't have to abstract the business model into logical and physical models. There is no mismatch. In a document, you don't have normalization. Denormalization is okay. You can have many-to-one relationships. It's a simpler and more flexible way to store your data. So what about the other thing? What about the graph? I see the graph as a great supplement to the document model because it adds the ability to model relationships. You can see by looking at the document that there are relationships based on the structure, but you cannot query them. With semantics, you can also query the relationships. This is so critical for making sense of today's data. Let me give you a quick example. The person listed as the surgeon on the left is Robert Allen. This example is actually my dad. Robert Allen is a surgeon, but sometimes people call him Bob Allen. If this is the case, you can create a triple to capture that relationship, and then choose to synchronize the changes you make in a reference document to all the transaction documents associated with it. And it has the added benefit of maintaining the integrity and history of the information. In another document database, you would just create pointers to other documents, and you would lose the ability to query the relationship.
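A sketch of that idea in plain Python, with invented names: one sameAs-style triple makes "Bob Allen" and "Robert Allen" resolve to the same facts, so a query for either name succeeds. Again, this is just the shape of the technique, not MarkLogic's API.

```python
# One fact about the person, plus one triple reconciling his two names.
triples = {
    ("Robert Allen", "isA", "surgeon"),
    ("Bob Allen", "sameAs", "Robert Allen"),
}

def aliases(triples, name):
    """All names equivalent to `name` via sameAs (either direction)."""
    names = {name}
    for (s, p, o) in triples:
        if p == "sameAs" and (s in names or o in names):
            names.update({s, o})
    return names

def is_surgeon(triples, name):
    # Check the fact against every known alias, not just the name asked for.
    return any((n, "isA", "surgeon") in triples for n in aliases(triples, name))

print(is_surgeon(triples, "Bob Allen"))  # True, via the sameAs triple
```

The same pattern is how you'd reconcile "North Carolina," "N Carolina," and "NC" from the earlier example: store the equivalence once as triples instead of cleaning every record.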
I hope you can see what I'm getting at here, which is that the document and graph models together provide an approach to modeling data that is much more powerful and flexible than the relational model. An example that helps illustrate this point is a customer we're working with, Broadridge, the large financial organization that I mentioned. They process millions of trades, and they store the data as documents. These documents are kept literally small and minimal, which aids in performance. But they insert a few triples on each document, and the triples are used to give it provenance, to connect it to the larger bulk of trades, to connect it to the other party, the counterparty, and other things that they might be interested in. So then, when and if they want to, the triples can be used to provide context for a particular trade. And if the name of a counterparty or something else changes later, it's easy to update all the trades at once. You only have to update the fact in one place. So now you're probably asking why I believe what I'm saying, and I want to provide just a little bit of a proof point for where the market is heading. I've been talking about the advantages of data modeling with mostly structured information, and that was sort of tipping the scales a bit towards relational's strengths. In reality, there's also a ton of unstructured information that relational databases are completely missing out on. They're not designed to handle unstructured data, which in fact is pretty surprising, because about 80% or so of all data is unstructured. And relational databases, even though they account for about 95% of all the money we spend on databases each year, can only hold about 20% of today's data. Naturally, organizations are starting to ask about that other 80%.
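The "update the fact in one place" pattern can be sketched like this in plain Python. The trade structure and names are invented for illustration, not Broadridge's or MarkLogic's actual schema: each small trade document carries only a counterparty identifier, while the human-readable name lives in a single triple.

```python
# Small, minimal trade documents that reference a counterparty by ID.
trades = [
    {"id": "trade-1", "amount": 500, "counterparty": "cp:42"},
    {"id": "trade-2", "amount": 900, "counterparty": "cp:42"},
]

def counterparty_name(trade, facts):
    """Resolve the counterparty's display name from the triples."""
    for (s, p, o) in facts:
        if s == trade["counterparty"] and p == "hasName":
            return o

# The name lives in one triple, not inside every trade document.
facts = {("cp:42", "hasName", "Acme Bank")}

# Rename the counterparty once; every trade picks it up on the next query.
facts = {("cp:42", "hasName", "Acme Bank Holdings")}
print([counterparty_name(t, facts) for t in trades])
# ['Acme Bank Holdings', 'Acme Bank Holdings']
```

The design point is the separation: millions of trade documents stay untouched while one fact changes, which is exactly the provenance-and-context role the triples play in the example above.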
Regarding document databases, they are the most popular type of NoSQL database in comparison to the other types, such as column stores or key-value stores or graph stores. Those others are very important, but document stores have a broader number of use cases they can address and are a better fit when talking about general-purpose databases. And graph stores have also been growing in popularity, including both property graphs and triple stores, which are a little different, but I won't get into a lot of the details about the differences between those two. I just want to make the point that companies are starting to realize that relational databases are not solving all their problems, and they're starting to move to other technologies. I want to briefly mention MarkLogic. I'm not going to go into a lot of detail about the capabilities of how MarkLogic manages semantics, but I want to look at the data model. One of the biggest differentiators of MarkLogic is the fact that it stores all data as documents, as JSON or XML, and can also store RDF as triples. Some people call this a multi-model database or a hybrid model, in which you can store data in these various formats. This model is completely unique to MarkLogic: the ability to store documents and triples together in the same database and mix them together in various ways. And I think this is a reason why MarkLogic will succeed. This model is inherently more flexible and really enables you to model the data and build applications a lot faster. I also want to mention MarkLogic's enterprise features. We call MarkLogic the only enterprise NoSQL database simply because it has all the enterprise features that relational databases have and that we would expect for any large system: high availability and disaster recovery, certified security, transactional consistency, and scalability.
All of these are really, really critical for any system that's trying to manage data at scale. Another big advantage is that MarkLogic has built-in search and query. This is a huge differentiating feature, and really what I see as a philosophical difference between MarkLogic and other systems. Our founder, Christopher Lindblad, believed that companies should not only have a place to put all their data, but should also be able to search it. In most systems, search is completely secondary to the database; with MarkLogic, that's just not the case. It's built in, and it's a core part of what MarkLogic does. Together, all this capability makes MarkLogic a really powerful, single, comprehensive platform, and that's the reason some people refer to MarkLogic as the Swiss Army knife for their data. Rather than spending more time talking about MarkLogic, I want to place MarkLogic in a broader context. When we talk about semantics, there's a lot of confusion in the marketplace, and I just want to try to provide a little bit of clarity. There are a bunch of different terms up here. There are probably a lot of terms that I'm missing, and there's probably some disagreement about some of them. We're starting to see a little bit more differentiation in the marketplace. In the NoSQL world, it's been pretty clear that there are different types of NoSQL databases. MarkLogic would be a document-oriented database, and then you also hear about graph databases. When people say graph database, they often mean property graphs, and they think of property graphs as graph databases. We like to think of a triple store as another type of graph database. When people say semantics, though, they often are referring to other semantic tools and technologies. They're talking about entity extraction or natural language processing.
We work with partners like SmartLogic that do entity extraction, where you can take a bunch of unstructured text, mark it up, pull out those entities, and store them as triples in MarkLogic. That gives you a machine-readable format for unstructured data, which really falls into the realm of natural language processing. You can also use a graph of triples and a bunch of documents to create a knowledge graph, and we have companies doing that on MarkLogic as well. As we've seen, some of this gets confused with AI, and I don't want to make the point that semantics is AI (I definitely don't want to get into that), but I do think it's a precursor for some of these other technologies, and they're starting to trend in that direction. Another thing to keep in mind is that how we see semantics today is much different from the semantic web that was originally envisioned. The semantic web and RDF have actually been around for a really long time, and that's why we have so many experts today in the field of semantics. I put up this screenshot of Tim Berners-Lee. He's best known as the inventor of the World Wide Web. He proposed the idea for the semantic web and really became the evangelist for it. You can see from this screenshot that it has over a million views. He was proposing that linked data be the new model for the Internet, and that everyone needed to start using RDF as the common framework for public data so that we could connect all data across the web. I think that's really different from how we understand semantics today. We see semantics as a great model to enhance your organization's data internally and to build smarter applications. It's really an enhancement to a powerful data model, the document model, and not necessarily a comprehensive standalone solution. That's why we've seen so much traction with semantics lately with MarkLogic.
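To illustrate the unstructured-text-to-triples idea the speaker attributes to partners like SmartLogic, here is a deliberately toy Python sketch. Real entity extraction uses NLP, not a regex; the regex, the `mentions` predicate, and the document identifier below are all stand-ins for illustration only.

```python
import re

# A toy stand-in for entity extraction: treat runs of capitalized words
# as candidate entities, then record each as a triple on the document.
text = "Tim Berners-Lee proposed linked data while working at CERN."

def extract_entities(text):
    """Naively collect capitalized word runs as candidate entities."""
    return re.findall(r"(?:[A-Z][\w-]*)(?:\s[A-Z][\w-]*)*", text)

def to_triples(doc_id, entities):
    """Store each extracted entity as a 'mentions' fact about the doc."""
    return [(doc_id, "mentions", e) for e in entities]

entities = extract_entities(text)
print(to_triples("doc/article-1", entities))
# The unstructured sentence is now queryable as machine-readable facts.
```

The point is the shape of the pipeline, text in, triples out, which is what makes unstructured content participate in a knowledge graph.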
It's the combination of semantics with other technologies that makes it so powerful. I know today we have big data, we have all these different data sources now available, and we also have the general evolution and improvement in the speed of computing. But I would argue that the biggest thing making semantics relevant right now is having it baked into a database as a supplemental approach to data modeling. That's what eases adoption within organizations and gives them another tool in their toolbox to do new things they couldn't do with their data before, and to do them faster, with more flexibility. So what are the benefits of MarkLogic semantics? I'll try to summarize them quickly here. I'm not going to go into a lot of detail about each one of these bullets, and I've already mentioned some of them. I think the biggest takeaway is that it's great for modeling atomic facts and relationships. You can use semantics to create that knowledge graph, to bring in open linked data, and to classify large amounts of data. We mentioned open linked data: you can go to sources like DBpedia or the CIA World Factbook or tons of other free repositories for triples, and you can bring those into your own application; you can also create your own triples. There are a lot of different ways you can get triples. Using this model, you can model those complex relationships with a formalized, common standard, and that's really, really important when you're talking about the scale we see today, with not just a million triples or even 10 million, but hundreds of millions of triples. And when you do this and use inference, you can start to discover hidden facts in your data with an intelligent search application that uses semantics. You're also able to visualize this information, so we have a lot of apps that might take a graph and actually visualize the relationships as well.
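The "discover hidden facts" point deserves a concrete example. Here is a minimal Python sketch of one common inference rule, transitivity: if a is located in b and b is located in c, infer that a is located in c. The identifiers and the `locatedIn` predicate are invented for illustration; real triple stores apply rules like this (e.g. via RDFS/OWL reasoning) at much larger scale.

```python
# Toy triple store; all identifiers are hypothetical.
triples = {
    ("city/sanjose", "locatedIn", "region/california"),
    ("region/california", "locatedIn", "country/usa"),
    ("hq/marklogic", "locatedIn", "city/sanjose"),
}

def infer_transitive(triples, predicate):
    """Apply the rule (a p b) and (b p c) => (a p c) until no new
    facts appear (a fixpoint), returning asserted + inferred facts."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = {(a, predicate, d)
               for (a, p1, b) in inferred if p1 == predicate
               for (c, p2, d) in inferred if p2 == predicate and b == c}
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

all_facts = infer_transitive(triples, "locatedIn")
# A "hidden" fact nobody asserted directly is now queryable:
print(("hq/marklogic", "locatedIn", "country/usa") in all_facts)
```

No one ever stored the fact that the headquarters is in the USA; inference derives it from the chain of asserted facts, which is exactly what makes a semantic search application look intelligent.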
One thing that often gets overlooked is the benefit of triples for storing metadata. Because triples are just a simple, atomic format, they're great for storing facts as metadata and making that data shareable. I've already touched on it, but I'll also mention the ability of triples to enhance the integration of disparate data. If you have one thing that's like another, you can simply describe that relationship with semantics, and you can connect things you weren't able to connect before because they were simply two different things: you either had to use ETL, or live with the differences, or make one the same as the other, rather than just saying they're the same via a semantic relationship. You're also able to publish facts using an automated process. The BBC is a big organization doing that; they started back in 2012 with the Olympics, and it's a great case study you can learn more about on our website. There are also other semantic technologies. I mentioned SmartLogic; there are others. SmartLogic is a great partner. You can take all that unstructured text, extract meaning from the data, and classify it using semantics, and we have lots of customers doing that now. Before I move on, I'll mention that the big takeaway here is to remember that semantics is about facts, relationships, and metadata. It's important to use semantics whenever your data looks a little bit graphy. There are no best practices saying here's exactly when you use RDF and here's exactly when you use a document or any other model. The important thing is that you have the option to model your data as triples in RDF format and to query it using SPARQL when it starts to look like you have a graph of relationships, or when you have a lot of metadata that you need to classify in a standard way across your organization. So what are some of the organizations that are using semantics?
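The "saying two things are the same" idea is what the semantic web standardizes as an `owl:sameAs` link. Here is a small Python sketch of the pattern: two silos describe the same entity under different identifiers, and instead of ETL, a single linking fact lets you query both as one graph. The silo contents and identifiers are hypothetical.

```python
# Two silos describe the same supplier under different identifiers.
silo_a = {("a:supplier-42", "name", "Global Parts Inc.")}
silo_b = {("b:vendor-9", "rating", "A+")}

# One linking fact, in the spirit of owl:sameAs, instead of an ETL job:
links = {("a:supplier-42", "sameAs", "b:vendor-9")}

def canonical(iri, links):
    """Map an identifier to a canonical one by following sameAs links."""
    mapping = {}
    for s, _, o in links:
        mapping[o] = s  # pick the link's subject as the canonical id
    return mapping.get(iri, iri)

def merged_view(*silos, links=frozenset()):
    """Query several silos as if they were one graph."""
    return {(canonical(s, links), p, o)
            for silo in silos for (s, p, o) in silo}

view = merged_view(silo_a, silo_b, links=links)
print(sorted(view))
# Both facts now hang off one canonical identifier, a:supplier-42.
```

The design point is that the silos themselves are never rewritten; the relationship is just another fact, which is why this approach scales across many sources.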
Like I said, it's been gaining traction recently, and today we see a lot of major organizations using MarkLogic semantics to accomplish things they couldn't do before. This list changes pretty frequently, and it's tough to even keep it updated. You'll notice two things here. There are some big-name companies using semantics, but there are also a lot of smaller players; it's not hard to get going with semantics. The other thing to notice is that the use cases vary quite a bit. There are many, many applications for semantics. Like I said, we're just giving you the easel and the canvas; we're not telling you exactly how you should use semantics. That's why I started the discussion by talking about some of the broader problems, so you can begin to imagine how you might use semantics in your organization. For the example of intelligent search, Mitchell 1 is a great example. Mitchell 1 is a company that stores information about car repair. With car repair, you have all the different parts of the car, and you want to create what they call a partonomy of all those different parts, an ontology for the parts. In an organization like that, you're trying to classify hundreds of thousands of different parts for all these various manufacturers, and even use that to power intelligent applications where you're able to say that if one part failed in a car, this next part might be likely to fail, based on the relationships they're able to define using semantics. We also have companies doing data integration at a pretty big scale. We have CABI, and this other agriculture company at the bottom that I can't name. We also have another large entertainment company that's integrating data across various silos. I mentioned media and publishing being a big use case for semantics.
A large agriculture company, and I think it's an interesting case study, is building a semantic intelligence platform in order to bring over 90 different data sources together from various points in their R&D process and make all of that data searchable in one single application, on one platform they're building. This is enabling them to ask really tough questions. They can query the data and ask things like: what is the corn yield and the underlying soil type for certain sets of data? Or: tell me all the genomic elements that indirectly contribute to or regulate a specified developmental process in a plant or crop species they're investigating. The platform does use other partners to manage their customized ontology, and they use MarkLogic to store all of the data and make it searchable. So there are lots of different case studies, and if you have other ideas, I would love to sit down and explore different ways you can use semantics. You know, we're on the cutting edge. We're seeing these organizations using semantics for things that we didn't even initially guess they could use semantics for. I want to mention one very fun example, which is the Saturday Night Live app that NBC recently built. If you haven't downloaded it already, you can open up your phone, go to the store on Apple or Android, and download the app right now and play around with it. If you haven't seen Saturday Night Live and aren't familiar with it, I would definitely recommend checking it out; it has a hilarious number of sketches. What they did with the app is take 40 years' worth of content from the show, put it all in this app, and make it searchable. Let me show you what the graph within this app sort of looks like.
With the app, when you log in, it asks what era you're interested in, and they're able to connect that era with different segments from the show that you might like. Then they're able to connect the segment to the episode, the season when it aired, and the date, and it's also able to show the connection with the talent that acted in the segment and the character they played. You can see how this data starts to look very much like a graph. We mentioned this example with Kristen Wiig, one of my favorite cast members. One of the important things is being able to search the way users want to search, so the app even characterizes things about the characters that appear in it. I use this example from the segment called the Lawrence Welk Show. In the sketch, you would never remember the name of the character, Dooneese, from watching the show. But you would definitely remember that in the sketch the character had tiny hands, and you're able to actually search on that, because that information is being indexed and it's part of the graph. So you can search for the character, and it'll bring up the name of the character and also the sketch it was in. Another example is the world of Barack Obama. This is also a very complex modeling problem. We have Barack Obama himself; he actually had a cameo on the show and appeared as himself. We also had different impersonations of Barack Obama: Fred Armisen impersonated Barack Obama, and Jay Pharoah also impersonated Barack Obama. And you may have heard of The Rock, Dwayne Johnson, who appeared as a different character called The Rock Obama. So the question is, how would you model this in a relational database?
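The "world of Barack Obama" really is a natural graph, and the speaker's point can be shown in a few lines. Below is a Python sketch of those relationships as triples; the predicate names (`appearedAs`, `impersonated`, `parodyOf`) are made up for illustration, while the relationships themselves come straight from the examples above.

```python
# The SNL "world of Barack Obama" relationships, sketched as triples.
triples = [
    ("BarackObama",   "appearedAs",   "Himself"),
    ("FredArmisen",   "impersonated", "BarackObama"),
    ("JayPharoah",    "impersonated", "BarackObama"),
    ("DwayneJohnson", "appearedAs",   "TheRockObama"),
    ("TheRockObama",  "parodyOf",     "BarackObama"),
]

def who(predicate, obj):
    """Find every subject with a given relationship to an object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)

print(who("impersonated", "BarackObama"))
```

Each new cameo, impersonation, or parody is simply one more triple; no schema change is required, which is the contrast with the relational modeling question the speaker poses.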
I'm sure you could probably come up with some way to model it, but the fact is that this is a series of complex relationships, and it just works better as a graph. You're able to see and understand these complex relationships from the point of view of somebody modeling the data and also from the perspective of the user. This allowed the team building the app to focus less on a complex data model and more on other, more interesting things, like building an intelligent aspect into the app, which they did. It's actually kind of a predictive analytics engine, where a user who is watching videos gets their usage logged and then gets recommendations for other videos they might like, based on their usage and which videos they liked or didn't like. It's actually a pretty complex algorithm that works to determine the videos they may be interested in. The point of this is that it's a very fun app; I would definitely download it. It's not a huge enterprise app, but it is a revenue-driving app, and it's a cool app that is using semantics. And the point is, when you put data at the center of the app, start modeling data in the way that makes the most sense, and give data the chance to take center stage, it really creates results. The SNL 40th anniversary show brought in over 20 million viewers, and that was the most viewers for any NBC primetime entertainment show in over 10 years. I can't go into all the details, but the app was a success alongside the show itself, and I would definitely encourage you to check it out. So I'll leave you with the link to the book, Semantics For Dummies. It's available for free on our website. If you want more information about using semantics and want to get started, you can go download it; it gives a great overview of semantics. Also, as Shannon mentioned earlier, we have the NoSQL Now and Smart Data conferences coming up in San Jose on August 19th and 20th.
Many of the experts in the semantic world will be gathering there, and there are going to be great talks. I will be there. We also have our Senior Director of Product Management, Steve Buxton, who will be there, as well as some other folks who have been working in semantics for a long time, just to talk about semantics. We're doing some hands-on workshops as well, which you can sign up for if you want a guided walkthrough of MarkLogic semantics. We also have other free training available on MarkLogic's website; we have engineers, full-time staff, who provide free training, and you can sign up for that on our website as well. And we have other webinars this summer, both business-focused and a little more technical, on semantics. So if you do have questions, you can email me directly. This is my email, matt.allen@marklogic.com. Now I'm going to look through the questions and see if I can do my best to answer some of them. We had a ton of questions coming in. I'm sorry I wasn't able to get to some of them as the webinar was going on; I know it was less of a conversation and more me presenting slides, so I apologize for the format and for not being able to get to the questions live. So let me scan through. While you're looking through those, Matt, it's been a fantastic presentation. One of the most common questions, of course, is whether or not people are going to get a copy of the slides. I love all the questions coming in; keep them coming. Just a reminder for everyone: I'll be sending a follow-up email for this presentation within two business days, by end of day Monday, with links to the slides, the recording, and anything else requested throughout the webinar. So hopefully that gave you a second, Matt, to look through the questions. One question.
Is this RDF graph database technology like what the Cray graph computer works with? I'm not familiar with the Cray system; I know that's a supercomputer at a sort of different level. The technology we're looking at actually just runs on commodity hardware. You can download MarkLogic and run it on Amazon Web Services commodity hardware; it doesn't require special hardware or anything to get going. Another question: the IRI, does that have to be a valid URL pointing to a location on the web? The URLs that I provided were just examples; maybe that was more confusing than it ought to have been. A URL would just be a pointer that points to a document. An IRI is actually just an identifier for that particular piece of data. People used to say that every IRI is unique. That's not necessarily true. But generally, within an organization, you would create an IRI that would be at least unique for the domain you're working in. Another question is about tying the customer to the tweet, in that example I gave about the demo that creates the tweet graph. That project is actually available on GitHub; if you email me, I can send over the link. One of our sales engineers created the demo for storing customer records, and I believe the document model would be used to store the customer record, and then there would be an identifier that connects the customer to their particular tweets, so that they can quickly pull up the names of the customers and the tweets or any other information about the customer. That's a great example of how you would connect them. But again, you can actually look at the code: if you email me, you can go on GitHub and look through how it's designed, and set up a meeting with the actual developer as well. As Shannon mentioned, there are some questions about the PowerPoint deck and recording.
I know we'll be sending those out so you can have the material. Someone asked: there seems to be a lot of research on semantics and other semantic technology in Europe, more so than the US or APAC; any specific reason? Yeah, I can't really speak to that in depth. Folks who have been around the semantic world a lot longer may have seen that. Our Senior Director of Product Management is from the UK, so that probably supports the assertion, and the BBC was one of the initial organizations we started working with on semantics; they began work on the application for the Olympics back in 2012 using semantics. But I think it's applicable everywhere, and we're seeing a whole lot of US companies adopting semantics too, and not just smaller customers, but some big customers with some big applications over in the US. I think it's really available for anyone; it's for companies that are looking for more innovative ways to model their data. Semantics has been around for a while, and there are a lot of experts who grew up with it, with the BBC and in the UK, but the technology is pretty widely available today. Another question: when you query with SPARQL, does it work only if the data or documents are stored as RDF, in whatever serialization format? Yes, SPARQL applies only to RDF. In MarkLogic, you can actually use XQuery to query RDF as well, and there are a lot of variations of how you can mix the data models in MarkLogic. You can embed triples in a document, you can also annotate documents with triples, and you can query across them. You can issue a document query using JavaScript or XQuery and then further filter that down with a SPARQL query, or vice versa. Again, this isn't a technical talk, so we won't get into a lot of detail about how you can actually mix the models and the different types of queries you can answer.
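The "mix the models" pattern the speaker describes, a document query narrowed by a triple pattern, can be sketched in plain Python. To be clear, this is not MarkLogic's actual API; the document contents, predicate, and two-step pipeline below are a hypothetical illustration of combining a full-text match with a SPARQL-style filter.

```python
# A few documents with searchable text, plus triples annotating them.
# All content and identifiers are invented for illustration.
docs = {
    "doc/1": {"text": "quarterly trade report for Acme"},
    "doc/2": {"text": "quarterly trade report for Zenith"},
    "doc/3": {"text": "marketing newsletter"},
}
triples = {
    ("doc/1", "counterpartyRegion", "EMEA"),
    ("doc/2", "counterpartyRegion", "APAC"),
}

def text_search(term):
    """Step 1: a document query (full-text style match)."""
    return {d for d, body in docs.items() if term in body["text"]}

def triple_filter(doc_ids, predicate, value):
    """Step 2: narrow the hits with a triple pattern."""
    return {d for d in doc_ids if (d, predicate, value) in triples}

hits = triple_filter(text_search("trade report"), "counterpartyRegion", "EMEA")
print(sorted(hits))
```

The text query matches two reports, and the triple filter keeps only the one annotated with the EMEA counterparty, which is the kind of combined question neither model answers well on its own.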
I definitely encourage you to sign up for one of our other technical webinars, in which we go into more depth about how you mix those models and the different query patterns you can use for them. Let's see: what built-in MarkLogic features or extra software products are available to recognize redundant or conflicting triples or graph portions? It seems that one can inadvertently create useless or misleading triples or graph portions. The issue of data quality, I think, is becoming more and more important in creating and controlling your data set. Again, that's why we mentioned having these unique IRIs, and I think that's critical. I haven't run into the specific issue you're asking about, so I don't want to say something that would be incorrect, but I would assume that when you're issuing a SPARQL query, you can pull back data and literally visualize, in our Query Console, which things are redundant. I know that for data modeling exercises, MarkLogic provides a whole lot of tools, including custom tools developed in the field to handle data modeling and transformations within MarkLogic. It's fairly easy to bring data into MarkLogic and then transform it once it's there, and to handle issues such as redundant data or small transformations on the data. And I believe you can do the same things with triple data as well, if you have a problem of redundant or conflicting triples. Do you have overlap between information captured in your document database and the info captured in triples? Not necessarily, but you could. It's up to you. MarkLogic gives you options, and when we talk about flexibility, you can model your data as documents, you can model it as triples, or you can model both and then have more query options later on. It's up to you.
And I think one of the big differentiating things about MarkLogic is that you can actually make changes midway through developing your applications. If you're doing this in a relational database, you model the data, and then you really can't make changes later. If you did, you'd have to change the entire schema, and that would have a lot of ramifications for your application code and other things. It's a really time-consuming process. But we have a lot of customers that model their data, realize that something isn't working or the performance isn't what they want, and then remodel the data really quickly. They can do this in a very iterative, agile way, and that allows them to build their applications faster. Any other questions? Are your customers considering using the framework they are developing for data supporting manufacturing operations, in addition to R&D product development? There are definitely a few examples, and we're starting to hear more from companies looking at manufacturing processes. I talked about the one automotive example; there are others. We've been talking a little more with a large energy company about doing more with their operational data. We're already working with them to store and integrate a lot of their other research data for compliance purposes, but we're also looking at manufacturing. I think the use case would vary, but one example would be storing a bunch of documentation for managing equipment and being able to say that if this part of a piece of equipment failed, this other part might be likely to fail, creating a graph of equipment within a manufacturing process. And also, because MarkLogic is able to index free text, you can take manufacturing document data that describes the equipment and make it very searchable.
So for a technician in the field, the application would obviously be very, very helpful for somebody who needs to quickly say, okay, I'm seeing this part fail. What is this part, and what are the other parts I should look at that are related to it? That might be one example in the manufacturing process; there are probably other examples as well. So I know we're running up against the end of the hour, Shannon. If anybody does have any additional questions, feel free to email me. I hope you sign up for some of our other webinars, download the book, and maybe we can see you in San Jose in a few weeks. Matt, thank you so much. This was a great webinar. And thanks to our attendees, as always, for participating, for being so engaged in everything we do, and for asking such fantastic questions. As Matt said, his email is right there. I will also include it in our follow-up email, which, again, will go out by end of day Monday with links to the slides and the recording. Matt, thanks so much for taking the time today. I really appreciate it, and I will see you in San Jose. Great. Thanks so much, Shannon. Thank you. Bye, everyone. Bye.