And thanks a lot to DataVersity for allowing me to speak to you guys. So a lot of you are probably wondering why an entertainment company is talking at a database conference. Originally we were going to announce a new ACID-compliant, transaction-safe, quantum-distributed, in-memory key-value data store. But we feel that product is not quite ready for prime time. So instead I'll talk about the challenges of modeling superheroes. The title of this talk is "Solving Fictional Problems with NoSQL." And when I say fictional problems, I'm not talking about, say, problems we've invented because we are neurotic and unstable individuals, but problems that we encounter in representing fictional worlds and fictional characters to consumers and to the general public. So I'm hoping that some of the explorations we've done in representations of fiction, and the application of NoSQL technologies to those representations, help you with the complex real-world data sets that you encounter in your real and probably more important jobs than mine. So very briefly about Marvel: basically what Marvel does is tell stories with a set of some of the most well-known characters in the world. You know, I defy you to walk down a busy street and not see someone with a Spider-Man T-shirt or a kid with a Spider-Man backpack or other characters that are in the Marvel stable. Our films: we've made a few relatively small art house movies over the course of the last few years. The Avengers franchise alone has grossed over $5.5 billion. We hold the number one and number two records for box office openings, and if it weren't for that pesky Jim Cameron, we would actually have the number one grossing movie of all time with The Avengers, which was out last year. And we're famous for our comic books. We're consistently the number one comic book publisher by sales. If there are DC fans here, we may argue about quality.
We have a number of TV programs right now and we're actually getting back into the live-action TV space this fall with Marvel's Agents of S.H.I.E.L.D., which is our first foray into live action since The Incredible Hulk from the 80s. So we're very excited about that. And we're one of the top entertainment brands for consumer products for boys. So we are engaged in a large number of media, and our characters and our representation of characters are really what carry through across all of those different media. Very briefly on how we use technology, particularly consumer-facing technology. I'm in the digital media group at Marvel, which deals with really any digital product that is consumer-facing. We're coming from a very traditional, web-centric LAMP environment. But as we've grown and as our products have matured, we're actually getting into more, I think, interesting and somewhat more enjoyable technologies to work with. So our API platform, which is something we're actually hoping to make a little more publicly available later this fall, runs on Node.js and the Go programming language from Google and various document stores. We use Hadoop and the Hadoop ecosystem for big data analysis. We use graph databases to some degree. That's largely what this talk is about. And then through our relationship with Stark Industries, we have access to a number of the Stark Tech Z5000s. We're really looking forward to the Z6000 line when it comes out later this fall. Half the room is laughing. The other half is frantically Googling right now. I'm sorry. And so a brief tour of our properties, or sort of what the landscape is from a data perspective. Marvel has been telling stories for over 70 years. Our first comic appeared in 1939. And in that time, there are over 30,000 comics that have been published, collaborated on by over 5,000 creators. Our stable includes 8,000 named characters.
We've had 32 movies produced between our licensee relationships and our in-house studio, Marvel Studios. 30-plus television shows, which often will span into the hundreds of episodes, and 100-plus video games. Now these are not big data numbers per se, but it does create very, very challenging things when representing this volume of products to customers. Especially in the realm of comics, where you need a certain amount of domain knowledge just to get into the products. And the thing that all of these have in common is that in superhero stories, anything can happen. You can have characters who fly. You can have characters who get exposed to massive amounts of radiation and get incredible powers from that. Southern accents sound like that. I'm from Mississippi, I can say that. This is not what a Southern accent sounds like. Rogue, of course, is played by Anna Paquin, who has gone on to massacre the Southern accent in True Blood. So how do you model? How do you apply structure? How can you create representations of a world where literally anything can happen, bounded only by the creativity of the creators who are collaborating on it? I mean, I've seen a few talks on how CERN handles data and they are throwing away thousands of petabytes per second or whatever the statistic is. But at least they're bound by the speed of light and the laws of physics. So I'm gonna talk about a couple of specific challenges in modeling this. And the first one is what I call object fluidity or character fluidity or the Tao of Hawkeye. So for a long time, we thought of characters as basically static entities. So you have an entity that basically doesn't change. So Hawkeye, and if you saw The Avengers, Hawkeye was the Jeremy Renner character, has certain attributes. He has blond hair, he's an archer, he has a purple costume. And we've conceived of these as basically a static set of attributes. And then relationally, there we go.
Relationally, we can just tie these static entities to properties or products or comics or TV show episodes or what have you. So things like Magneto and Cyclops will appear in a particular issue, or Dazzler will appear in another one. And this is not a bad way to represent relationships. It allows us to do a number of things. Like we can give you a list of books if you want a list of books with Cyclops in them. We can give you a list of TV shows if you want a list of TV shows with a particular character in them. But it starts to break down when we really want to represent the universe that these stories take place in. Because one of the things about fiction that I think is actually definitive is that characters change. A story is something in which a character goes through a meaningful change. And in our stories, which have been a long-running soap opera for over 70 years, they change a lot. They assume new identities. They join and leave teams and organizations. They die and come back to life, and sometimes die and come back to life again a few times. New versions, new iterations of the characters are created. They are refreshed to become more relevant to the current time. And they cross over to different media. So the old model looks something like this. You have a static entity with certain properties. And at different times, all of those properties are false. So Hawkeye is not always Clint Barton. He doesn't always shoot arrows. He can assume other identities that don't have that power. He doesn't always have a purple costume. He is a member of the Avengers except when he's not a member of the Avengers. He's solo or he's off leading the Thunderbolts or whatever team he might be on. So in reality, Hawkeye, his life looks like this. This is what Clint Barton's life really looks like.
He was a villain, then he was Goliath, then he was Hawkeye, then he was Goliath again, then he was Hawkeye again, then he was dead. Then he was Ronin, and then he was Hawkeye. And imagine this type of lifeline spread across 8,000 characters and then across multiple media. So these are all different representations of Spider-Man. You have six or so that we've plucked out of the comics. You have two completely separate movie franchises. And these are a number of representations of Spider-Man from television, including there at the far right, the late-70s live-action Japanese Spider-Man, in which Spider-Man fights crime with the help of a giant transforming mecha dinosaur. You can watch all of the episodes on marvel.com and they are wonderful. So a second challenge: comic bibliography is a very complicated thing. And we call this the Captain America problem, not because there's a problem with Captain America, but because it's indicative of the challenge. So in most periodical media, and comics are essentially periodical magazines, bibliographical structures are there for organization. So if you read The New Yorker or Cosmo or Tiger Beat, if there are any Tiger Beat fans here, they are arranged in series and volumes and they go sequentially, and that sequence is really the only organizational structure you have. In comics, bibliographical organization is actually a storytelling tool and to some degree a marketing tool. And they play with the structures in ways that communicate things to the readers. So this is the Captain America problem. Let's say I've just seen the Captain America movie that was out a couple of years ago. I want to start reading Captain America comics. I think most people would logically say, oh, I'll read Captain America number one. Well, there are six Captain America number ones. And every time that they've been rebooted or every time they've been restarted, it's for a reason.
It's to signal that something important has happened in the life of this character, or an important change in creative teams. If you actually want to read the first appearance of Captain America, it's not in a book called Captain America. It's in a book called Captain America Comics. And this is from 1941 and that is Captain America punching Hitler in the face. And the first book called Captain America doesn't start with issue one. It starts with issue 100. And this type of thing happens all the time. It's actually not unique to Marvel. DC recently renumbered their entire line. It's a fairly common thing within the comics world. But these are a few examples of series. The X-Men becomes New X-Men, becomes X-Men, becomes X-Men Legacy, all with the same numbering. Incredible Hulk drops its main character but keeps its numbering. And my favorite one from a modeling perspective: Deadpool Team-Up starts at number 900 and counts backwards. Recently, they also changed our issue numbers. They started putting floating-point digits at the end of the issue numbers instead of integers, which made for an interesting conversation between the marketing folks and the DBAs. And a second challenge with bibliography is that bibliography doesn't always follow story structure. So if you remember your high school English, stories have an initiating incident. They have rising action. They have some kind of resolution. They have a falling action. And you would expect that bibliography reflects this. So this is a series of Spider-Man books that were published at the same time a few years ago. And you would think that the stories follow the sequences where the arrows are, which are pointing from issue one to two to three to four, or 525, 526, 527, et cetera, et cetera. But as you might guess from the color coding in this, that's not actually how this story goes. The story goes like this.
So in order to read this story, which is called Spider-Man: The Other, the story goes across the different series, back up and back up and down. And this is very, very complicated to present to users, especially now when we've gone from a business model that is primarily about putting sequential books out month by month into one where our entire catalog is available digitally. It's very, very difficult to get new users to understand something like this without leading them down the garden path. So to recap these: how do we create new aggregation structures and aggregate characters where it's important, but also present granular views of those characters to the users who care about that? How can we accommodate fluidity like new entity types, new relationships, sequences, memberships? How can we model a changing set, model on shifting sands? So instead of static entities, we need to be able to represent really fluid relationships. Instead of fixed structures, we need to be able to represent dynamic sequences, and without having to, say, go back and recode entire applications or recode entire database schemas which might be fixed in various ways. So this is why the graph is a very, very compelling and powerful conceptual framework for Marvel as a company and for companies like Marvel that are in the IP space. And so this is probably very basic graph theory for most people here, but I'm going to quickly go over it. So a graph is really just a collection of entities and relationships between those entities. So in this very simple one, you have three nodes, you have an edge which connects the top two, and what's called a hyperedge, which is really just an edge that has more than two nodes attached to it. And for us, this concept of a hyperedge is actually very powerful. So let's go back to Hawkeye, now that you're familiar with that bit of comic trivia.
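As a rough illustration of the three-node picture just described, here is a minimal Python sketch of a graph that allows both ordinary edges and hyperedges. The class names (`Edge`, `Graph`), the `connect` helper, and the node labels are all made up for this example; the talk does not say how Marvel's own store represents these.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """A labeled relationship joining two or more nodes."""
    label: str
    nodes: frozenset

    @property
    def is_hyperedge(self) -> bool:
        # An ordinary edge joins exactly two nodes; a hyperedge joins more.
        return len(self.nodes) > 2


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def connect(self, label, *nodes):
        """Add an edge (or hyperedge) over the given nodes and return it."""
        if len(set(nodes)) < 2:
            raise ValueError("an edge needs at least two distinct nodes")
        self.nodes.update(nodes)
        edge = Edge(label, frozenset(nodes))
        self.edges.append(edge)
        return edge

    def edges_of(self, node):
        """Return every edge that touches the given node."""
        return [e for e in self.edges if node in e.nodes]


# Three nodes: an ordinary edge between the "top two", plus one hyperedge
# that ties all three together. Labels are hypothetical.
g = Graph()
plain = g.connect("related_to", "A", "B")
hyper = g.connect("joined_at", "A", "B", "C")
```

Here `g.edges_of("C")` returns only the hyperedge, while `g.edges_of("A")` returns both relationships.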
So we have Hawkeye, and there's a set of attributes that surround this identity, and Hawkeye is really just an identity that someone assumes. And we know there's this person, Clint Barton, who sometimes takes the identity of Hawkeye, and these two things are connected, but they're not connected in a static way. Clint Barton is Hawkeye only at certain moments in time. So what we can do is create a graph and a hyperedge which defines this relationship: that at specific moments in time, this character assumes this identity. And we can say, okay, this moment appears in particular products, particular episodes, particular things that correspond to saleable or purchasable or watchable things. And this really helps us when, say, modeling out teams. We can say, okay, Clint Barton appears in the Avengers at a particular moment in time in a particular issue. And this is actually very useful for us. For a long time we've struggled with representation of teams, because people think of teams as kind of like the Beatles or U2, with a static membership. But in fact, they're more like baseball lineups. They change constantly from issue to issue. And the more popular the team, the more fluid it is, because people want to boost their favorite character. Writers are selfish. And also sometimes Hawkeye could be in a team at a particular time as a particular alias or just as himself. This also gives us a very powerful aggregation framework. So we can say there's this general idea of Hawkeye that crosses media, that's independent of individual representations. We know these guys are kind of connected to this sort of über-idea, or these guys are emanations of that concept. It also allows us to do things like place set in easy ways without having to go back, define a new entity type, recode our entire schema, and all the things you have to do with a traditional relational database. We can also do things like place where moments happen in the real world.
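The moment-based model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Moment` record stands in for the hyperedge, the issue identifiers (`issue-001` and so on) and the placeholder character "Teammate X" are invented, and the identities are the ones from the Clint Barton lifeline mentioned earlier in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Moment:
    """Stand-in for a hyperedge: one character, under one identity,
    in one product, optionally as a member of one team."""
    character: str
    identity: str
    product: str
    team: Optional[str] = None


# Hypothetical sample data; the issue ids are placeholders, not real issues.
MOMENTS = [
    Moment("Clint Barton", "Hawkeye", "issue-001", team="Avengers"),
    Moment("Teammate X", "Teammate X", "issue-001", team="Avengers"),
    Moment("Clint Barton", "Goliath", "issue-002", team="Avengers"),
    Moment("Clint Barton", "Ronin", "issue-003"),
    Moment("Clint Barton", "Hawkeye", "issue-004", team="Thunderbolts"),
]


def identities_of(character, moments):
    """Identities a character has assumed, in first-seen order."""
    seen = []
    for m in moments:
        if m.character == character and m.identity not in seen:
            seen.append(m.identity)
    return seen


def lineup(team, product, moments):
    """Team membership for one product, like a lineup for a single game."""
    return sorted(
        (m.character, m.identity)
        for m in moments
        if m.team == team and m.product == product
    )


def products_with(identity, moments):
    """Every product in which an identity appears, whoever is wearing it."""
    return sorted({m.product for m in moments if m.identity == identity})
```

Because membership hangs off the moment rather than the character, the Avengers lineup for `issue-001` and for `issue-002` can differ without any change to the character or team records.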
And one of the distinctive things about Marvel stories is that, for the most part, they take place in real locations even though they're fictional things. I mean, things like Atlantis and the Blue Area of the Moon notwithstanding. We can even do things like track the uses of costumes and stuff over time and make kind of cool galleries and stuff like that. The graph also allows us to do interesting things with sequencing outside normal bibliography. So if you think of these two rows, moving from left to right, as the sequence in publication, and then the diagonal lines as the sequence in story, you can represent things much more easily and much more intuitively to consumers without having to create new organizational structures in a very static database. We can even do things like this, where we can represent aggregations across ages of comics, such as the golden age, the silver age, the modern age, and events, and even represent chronological publication sequence versus story sequence versus bibliographical sequence. So this is all very powerful, and the wonderful thing about dedicated graph databases, and the NoSQL space generally, is that we don't have to recode entire entities and applications to do this. We can really do this on the fly. So why is this important? Why is it important to Marvel as a business? First of all, comics, as I hope I have proven, are very, very challenging to represent, particularly to new readers, and with the advent of digital distribution and the popularity of our movies we have many, many new readers. Secondly, the graph allows us to learn things about our comics that we didn't know already. So interesting interactions: do certain characters fall within larger franchises? Lots of interesting kind of stuff surfaces when you run even very simple visualizations against them. Sorry.
And then third, it allows us to model characters transmedia in a consistent way, which is something we had struggled with before. And finally, we are able to apply user behavior and user interactions in the larger media graph or the in-universe graph, and do predictive analytics and predictive modeling to target, say, marketing, web experiences, and app experiences to the user. So I haven't talked so much about the user behavioral aspects of this because the comic part is cool, but we are looking at this as both a way to understand and represent our IP to our fans and then a way to use that to drive behavior in consumer purchasing. I think also, more importantly, data is actually very much part of the comics experience. If you have ever been in an argument over whether Wolverine could beat up Sabretooth or Spider-Man could beat up Captain America, you've been in a discussion about data. And from the publication of handbooks to fans who have done incredible graphs or incredible visualizations on their own, we know that data is very important to our experience. So I'm gonna end with a couple of visualizations, and so this is a graph of character interactions in the Marvel universe over, I think, 70 years. So each point in this is a character, and each line is a shared appearance between that character and another character. We did a standard force-repulsion algorithm to spread these out, but what I found really interesting is that when we applied modularity groups against it, you get groups that correspond basically to our core franchises. So this is the X-Men group. This is the Wolverine group. This is the Avengers, and then this is the Spider-Man group, and then this is an alternate universe and that's an alternate universe kind of floating off to the side.
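To make the idea of a shared-appearance graph and modularity groups a little more concrete, here is a small Python sketch. It is not the tooling used for the visualization, which the talk does not name, and it does not search for groups; it only computes the standard modularity score Q that community-detection methods try to maximize, for a grouping supplied by hand. The issue ids and cast lists are invented toy data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: issue id -> characters appearing in that issue.
APPEARANCES = {
    "issue-1": ["Cyclops", "Magneto", "Wolverine"],
    "issue-2": ["Cyclops", "Wolverine"],
    "issue-3": ["Hawkeye", "Captain America", "Thor"],
    "issue-4": ["Hawkeye", "Captain America"],
    "issue-5": ["Wolverine", "Captain America"],
}


def co_appearance_edges(appearances):
    """Weight each character pair by the number of issues they share."""
    weights = Counter()
    for cast in appearances.values():
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return weights


def modularity(weights, community):
    """Modularity Q = sum over groups of (L_c / m) - (d_c / 2m)^2, where
    m is the total edge weight, L_c the edge weight inside group c, and
    d_c the summed weighted degree of the nodes in group c."""
    two_m = 2 * sum(weights.values())
    degree = Counter()
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    # Each internal edge is counted once per direction.
    internal = sum(
        2 * w for (a, b), w in weights.items() if community[a] == community[b]
    )
    group_degree = Counter()
    for node, k in degree.items():
        group_degree[community[node]] += k
    expected = sum(d * d for d in group_degree.values()) / two_m
    return (internal - expected) / two_m


edges = co_appearance_edges(APPEARANCES)
by_franchise = {
    "Cyclops": "X-Men", "Magneto": "X-Men", "Wolverine": "X-Men",
    "Hawkeye": "Avengers", "Captain America": "Avengers", "Thor": "Avengers",
}
one_big_group = {name: "all" for name in by_franchise}

q_franchise = modularity(edges, by_franchise)
q_single = modularity(edges, one_big_group)
```

On this toy data the franchise grouping scores higher than lumping everyone into one group, which is the same kind of signal that makes franchise-shaped clusters fall out of the real graph.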
This is a similar graph with creators, and again it neatly spreads out into a series of what we call the comic ages: the golden age, the silver age, et cetera. So we have sort of the golden age up here, the Stan Lee, Jack Kirby era here, the sort of 70s, 80s and 90s, and then this kind of big blob is the modern age of comics, where you get all the creators that are working today, and it's larger more because we honestly have better data on that period, as we're tied into our production systems. And finally, this is a graph of our movies, and let's check my time here. So I'm actually gonna zoom in on this a little bit, and we'll see if this works. So what we did for this was each kind of large circle is a movie, and the nodes are sized by degree, so the number of outbound connections, and what's really interesting is that you get two, I have lost it here, sorry. You get two connected components, one which corresponds to the X-Men franchises, the Punisher, the Marvel Cinematic Universe, the Avengers franchise, all of that, and then so we get into say Thor here, the Avengers, we have Captain America up to the Fantastic Four, Captain America of course being connected by Chris Evans, we have the Iron Man trilogy, which is connected to Daredevil by Jon Favreau, and then various things which extend to the Blade trilogy, which then connects back to the X-Men, or sorry, to Ghost Rider, and then kind of down here is the X-Men, and then off to the side by itself is the Spider-Man franchise, which is a completely distinct connected component of the graph. Back up, okay. So we did realize we left something out when we made this that caused it to disconnect, so I'll just shout out: does anyone know what we left out that will connect the graph together?
Stan Lee. So when you add Stan Lee in, he becomes, A, the chief centrality node on the graph, which is interesting, but he actually brings all the disparate franchises together, so we can't get rid of him. So I think I'm basically out of time, so thanks, I hope this was interesting. If you're interested in comics ontologies and various stuff, we have a schema.org proposal right now with the W3C. So if you're interested in really deep comic arcana you can check that out. Thanks a lot.
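The connected-components observation about the movie graph can be reproduced on a toy scale. The sketch below assumes a tiny hand-picked credits table, using only the links mentioned in the talk, so the component counts differ from the real visualization; `connected_components` and `with_person` are illustrative helper names, not part of any Marvel system.

```python
from collections import defaultdict, deque

# Hypothetical, deliberately tiny credits table: movie -> people linked to it.
CREDITS = {
    "Captain America": {"Chris Evans"},
    "Fantastic Four": {"Chris Evans"},
    "Iron Man": {"Jon Favreau"},
    "Daredevil": {"Jon Favreau"},
    "Spider-Man": set(),
}


def connected_components(credits):
    """Group movies that are reachable from each other via shared people."""
    by_person = defaultdict(set)
    for movie, people in credits.items():
        for person in people:
            by_person[person].add(movie)

    unvisited = set(credits)
    components = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        queue = deque([start])
        # Breadth-first search over movie -> person -> movie hops.
        while queue:
            movie = queue.popleft()
            for person in credits[movie]:
                for other in by_person[person]:
                    if other not in component:
                        component.add(other)
                        queue.append(other)
        unvisited -= component
        components.append(component)
    return components


def with_person(credits, person):
    """Return a copy of the table with one person added to every movie."""
    return {movie: people | {person} for movie, people in credits.items()}


before = connected_components(CREDITS)
after = connected_components(with_person(CREDITS, "Stan Lee"))
```

In this toy table the movies fall into three separate components until Stan Lee is added to every film, at which point they collapse into a single component.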