Good morning and welcome to this week's edition of Encompass Live. I am your host, Christa Burns, here at the Nebraska Library Commission. Encompass Live is the Commission's weekly online event where we cover anything that may be of interest to librarians: any activities or topics that may be of interest to librarians in Nebraska and, actually, now across the whole country. These are free sessions that last about an hour, and we do them live every Wednesday morning at 10 a.m. Central Time. They are recorded, though, so if you're not able to join us on Wednesday mornings, you can go to our website and listen to any of our recorded shows. We do all sorts of things: presentations, book reviews, mini-training sessions, interviews. If it's library related, we'll put it on the show. Once a month, we do a tech talk where we bring on Michael Sauers, who is the Technology Innovation Librarian here at the Library Commission. Good morning. He's sitting here next to me. He shares some tech news from the month since he was last here, and he sometimes brings on other people to interview or talk with. And as you can see, we have that today. That's not Michael up there that you're looking at at the moment. But I will hand it over to Michael and let you take over and introduce what you're doing today. Well, good morning, everybody. Like Christa said, my name is Michael Sauers, and I do the tech talk episodes here. And even at once a month, I've got to give Christa kudos, because she finds somebody to be on this show pretty much every week for all the other weeks, three out of four weeks or sometimes four out of five weeks a month. And sometimes I have trouble finding people ahead of time just for my one episode a month. About two weeks ago, when a whole bunch of travel was done, I realized I didn't have anybody for this month.
And suddenly I got a new follower on Twitter, and I thought, hey, I recognize that guy's name. I remembered watching his presentation at Computers in Libraries this spring, and I thought it would be a really great topic to share with our audience. And little did I realize that he's actually got some new projects going on that I think he's going to talk to us about today. So Aaron, that is you we see there on the screen. Good morning. Good morning. And you are with BookLamp.org, and I'm sure we will get to that shortly. But why don't you start out by telling us a little bit about yourself, your background, where you're coming from? Sure. That's always a tough question, actually. So my name's Aaron, obviously. My background is in what's called industrial-organizational psychology. But my passion growing up was that I wanted to be a writer and author. A lot of what we're going to talk about is actually the result of a childhood dream at 14 years old, combined with computers over time. Around 2007 or so, I founded a company called BookLamp.org; we're also known as the Book Genome Project. And we've been working pretty diligently since then out of New York and Boise, and we have some people down in California, although they're moving around a bit. Okay. So what exactly are BookLamp and the Book Genome Project? All right. By the way, I'll lead off by saying that I'm a better conversationalist than a presenter. So what I'm really hoping happens is that about five minutes into this, anything I have to say that's pre-planned will be derailed by some sort of interesting question or conversation. BookLamp is actually kind of the front end of what we call the Book Genome Project. The company was founded later, but the Book Genome Project was originally conceived back in 2003, and the analogy that works pretty well is that it's kind of the Pandora.com for books.
And the analogy connection (and I like the Book Genome logo) is that, like Pandora, which pays attention to the individual components and metrics inside a piece of music, we do the same thing with books. The difference is that we use computers. For example, when we look at a book, we break it up into scenes or chapters. Then we look at each individual scene or chapter for a combination of things like writing style and thematic metrics. Traditional BISAC categories are very binary. They tend to say this book is either about vampires or it's not about vampires. We tend to be more granular. We can say a book might be 5% vampires, or 15% vampires, or 50% vampires, and those are three very different types of vampire books. We also pay attention to more subtle themes. A vampire book that is 5% forests is very different from one that is 5% urban environments or 5% castles and medieval environments. And then we also look at style: the rise and fall throughout the book, things like pacing and density, description levels, dialogue levels, things like this. At the end of the day, we try to pull that together into a coherent statement that says: if I like this book, I would be interested in finding another book like it. Okay, is that a reasonable description to start off with? Sure, yeah, that sounds good. Okay. So you're analyzing these books. My first question for you would be: where are you getting the data, the raw texts? What's your source? The truth is, we've worked with publishers. About a year ago, we launched BookLamp.org, which is kind of the technology demonstration of the tool set. Then we started to go out and connect with various publishers. So we have somewhere between 60,000 and 100,000 books in the system, depending on which books we can make public based on our relationships at any given moment.
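To make the "5% vampires versus 50% vampires" idea concrete, here is a minimal sketch that turns per-scene theme tags into a book-level theme profile. It assumes, hypothetically, that each scene has already been tagged with themes and that a theme's share is the fraction of scenes it appears in; the interview doesn't say how BookLamp actually weights scenes or words, and `theme_profile` is an illustrative name, not part of any BookLamp API.

```python
from collections import Counter


def theme_profile(scene_themes: list[list[str]]) -> dict[str, float]:
    """Return, for each theme, the fraction of scenes that mention it.

    Assumption: a theme's share of the book is measured per scene,
    not per word. Duplicate tags within one scene count once.
    """
    if not scene_themes:
        return {}
    counts = Counter(theme for themes in scene_themes for theme in set(themes))
    return {theme: n / len(scene_themes) for theme, n in counts.items()}


# Hypothetical 20-scene book: vampires in 3 scenes, forests in 1.
scenes = [["vampires", "forests"], ["vampires"], ["vampires"]] + [["romance"]] * 17
profile = theme_profile(scenes)
print(profile["vampires"], profile["forests"])  # 0.15 0.05
```

A book tagged this way carries "how much" rather than "whether", so a 50%-vampire book and a 5%-vampire book end up with very different profiles even though a binary genre label would file both as vampire fiction.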
Actually, I'll go to one of the slides here; it's a good example that we use a lot for describing what we do. This is a screenshot of the BookLamp.org website, showing the book DNA for a book that is fairly well known: The Da Vinci Code. When we look at The Da Vinci Code, we don't see a book written by Dan Brown that's a bestseller and well reviewed. What we see is a book whose language structure is third person, with fairly typical writing style metrics and slightly higher pacing. Then in the story DNA, we look at themes. It's made up of some percentage history and academic culture, some percentage Catholic institutions and religious hierarchy, communications technology, art and art galleries, secrets and secret keeping. So it gives you an idea that at the book level, what we're trying to do is this: somebody hands us The Da Vinci Code and says, I like The Da Vinci Code. We do not try to measure anything qualitative, like whether it's good or bad; I think that would be a very risky thing for computers to try to do. But we do measure quantitative things, like how much the book is about each individual theme. If we can find a book in our database that is written with a similar writing style and a similar stylistic thumbprint, and has a similar distribution of themes, the chances of it being relevant enough for the reader to at least look at are pretty decent. What I'm going to do is throw things out there and see what seems interesting to people to talk more about. Part of the reason we do this is that there's a metadata problem, which we call the social void: there are lots and lots of books, a vast range of works, that don't have enough reviews or enough metadata around them to adequately classify them. And so they can be invisible in social networks.
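Aaron describes matching on a similar stylistic thumbprint plus a similar distribution of themes. A minimal sketch of that idea follows, assuming cosine similarity for theme profiles, a mean-absolute-difference score for style metrics already scaled from 0 to 1, and an arbitrary 50/50 weighting between the two. None of those choices come from the interview, and the function names, titles, and numbers are hypothetical.

```python
import math

Profile = dict[str, float]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse theme vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def style_similarity(a: Profile, b: Profile) -> float:
    """1 minus the mean absolute difference of style metrics (assumed scaled to [0, 1])."""
    keys = a.keys() | b.keys()
    if not keys:
        return 0.0
    return 1.0 - sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys) / len(keys)


def rank_similar(query: dict, catalog: dict[str, dict], style_weight: float = 0.5) -> list[str]:
    """Order catalog titles by an assumed weighted blend of theme and style similarity."""
    def score(book: dict) -> float:
        theme = cosine(query["themes"], book["themes"])
        style = style_similarity(query["style"], book["style"])
        return (1 - style_weight) * theme + style_weight * style

    return sorted(catalog, key=lambda title: score(catalog[title]), reverse=True)


# Hypothetical profiles loosely echoing the Da Vinci Code example.
query = {"themes": {"religious hierarchy": 0.2, "art galleries": 0.15, "secrets": 0.3},
         "style": {"pacing": 0.7, "density": 0.5}}
catalog = {
    "Book A": {"themes": {"religious hierarchy": 0.25, "art galleries": 0.1, "secrets": 0.3},
               "style": {"pacing": 0.65, "density": 0.55}},
    "Book B": {"themes": {"vampires": 0.5, "forests": 0.05},
               "style": {"pacing": 0.7, "density": 0.5}},
}
print(rank_similar(query, catalog))  # ['Book A', 'Book B']
```

Book B matches the query's style exactly but shares no themes, so with equal weights Book A wins; pushing `style_weight` to 1.0 flips the order, which shows why the balance between style and theme matters.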
Some of the books that were my favorites as a child, if you look at the social discovery systems now, don't have enough votes to be recommended very well. This is a problem that's growing, but the advantage we have is that we have as much information about Harry Potter as we do about an author who published their first book yesterday, or a book that's five or six years old and never had good marketing. So, I'm trying to phrase my question here. I mean, there's so much here, I've got all these questions, and I'm finding this really cool. But if I may focus for just a moment: with the publishers, they give you a dump of the text? Maybe a little more on the back end: how does this actually work? Let's get a little technical for a few minutes. Sure. Okay. To start off with, yes, the basic primer for the system would be the text of the book, right? In fact, we're kind of a proponent of what's an odd perspective in the publishing industry today, which is that one of the things that's usually influential to how well a book performs in the market, or how well somebody likes a book, is the actual written language and themes and content of the book put there by the author. I say that seems like an odd perspective only because, for some reason, a lot of times in publishing nowadays you'll find people have become very jaded, believing that what drives the sales of a book is largely based on how big the marketing budget is, how well known the author is before it's published, and so forth. So we start with the text, and basically what we're trying to do is connect one book to another based on what's actually on the page.
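The contrast between social and text-based signals can be sketched in a few lines: a purely social recommender can only surface titles with enough votes, while text-derived features exist for every book as soon as its text is available. The 100-vote cutoff, the function names, and the sample counts below are assumptions for illustration, not figures from any real system.

```python
def social_void(vote_counts: dict[str, int], min_votes: int = 100) -> list[str]:
    """Titles with too few votes for a purely social recommender to use.

    min_votes is an assumed cutoff; real systems set their own.
    """
    return sorted(title for title, votes in vote_counts.items() if votes < min_votes)


def void_share(vote_counts: dict[str, int], min_votes: int = 100) -> float:
    """Fraction of the catalog that falls into the social void."""
    if not vote_counts:
        return 0.0
    return len(social_void(vote_counts, min_votes)) / len(vote_counts)


# Hypothetical vote counts.
votes = {"Blockbuster": 250_000, "Debut novel": 3, "Backlist gem": 40, "Steady seller": 1_200}
print(social_void(votes))  # ['Backlist gem', 'Debut novel']
print(void_share(votes))   # 0.5
```

A text-based system sees all four titles equally; a vote-based one effectively sees two.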
From a technical standpoint, yes, we'll approach the publisher, and the value proposition for them is really pretty straightforward, right? We don't offer positive or negative reviews of a book. We don't bias against or for a book. We simply say that if your book is in our system and somebody is looking for themes that are like your book's, we will surface it and recommend it to them. And we're very backlist friendly, because we're not subject to the pyramid structure that a lot of metadata-driven systems are. So we're basically an alternative discovery mechanism that helps promote a different genre or demographic of book than most publishers have an easy time promoting. From a technical execution standpoint, I'll talk a little bit about density, for example, which is a language complexity measure. My background is in writing and English as well as psychology, but we quickly found out that the definitions people use in linguistic discussions vary quite a bit. So we came up with our own operational definitions of density and pacing, the ones we thought best reflected what's useful for a reader. Part of the complexity, by the way, is that you not only have to come up with ways of measuring the book, but then you have to figure out how to take all this data (I think it's 32,000 points of data we measure per book) and boil it back up into a measurement that is actually useful for somebody who's looking for a book, and connect it to a real thing. So density, for example, is a complexity measure for language. It deals with the obvious things like compound sentences, complex vocabulary, and breadth. But there are also a lot of elements in there where it's difficult, on the surface, for a human to look at them and say, I can see why that has to do with density.
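Aaron notes that density covers obvious things like sentence structure, vocabulary, and breadth, alongside less obvious signals. As a hedged illustration of the obvious end only, here are three simple surface features computed per scene. These are assumptions chosen for readability, not BookLamp's operational definition of density, which the interview doesn't spell out.

```python
import re


def complexity_features(scene: str) -> dict[str, float]:
    """Simple surface measures of language complexity for one scene.

    Illustrative stand-ins (assumed), not BookLamp's definition of density.
    """
    sentences = [s for s in re.split(r"[.!?]+", scene) if s.strip()]
    words = re.findall(r"[A-Za-z']+", scene.lower())
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "type_token_ratio": 0.0, "long_word_share": 0.0}
    return {
        # Words per sentence: longer sentences tend to read as denser.
        "avg_sentence_len": len(words) / len(sentences),
        # Distinct words / total words: a rough proxy for vocabulary breadth.
        "type_token_ratio": len(set(words)) / len(words),
        # Share of words with 7+ characters: a rough proxy for complex vocabulary.
        "long_word_share": sum(len(w) >= 7 for w in words) / len(words),
    }


simple = complexity_features("The dog ran. The dog ran fast.")
dense = complexity_features(
    "Notwithstanding considerable institutional resistance, "
    "the committee's recommendations were eventually implemented."
)
print(simple["avg_sentence_len"], dense["avg_sentence_len"])  # 3.5 10.0
```

Features like these are only raw inputs; the harder step Aaron describes is turning thousands of such numbers into one measure a reader finds meaningful.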
The reason is that the way we define each of our metrics is: we define what we're looking for, and then we have humans go through and identify what we consider to be clear, good examples of it. So a really good set of high-density scenes and a whole bunch of low-density scenes. Then we use machine learning to look at the actual structure of the language and see what is consistent within one group and different from the other. What it's basically trying to do is have a computer guess what a human would think: if a human had read this scene or this chapter, and had perfect knowledge of all the millions of other scenes we have, the average human would probably say that this scene is more dense than 85% of the scenes in the corpus and less dense than the remainder. And that's what the 85% means to people. Does that answer your question? Sure, yeah. No, that's good. And it's leading me to another related question. It sounds like I'm focused on publishers, and I don't mean to be, because I know we'll get to lots of other things. But I was reading an article recently about how, with the advent of ebooks, publishers are getting a lot more data about how and what we read, and it's starting to influence what they publish. The example was that people tend to read nonfiction in short chunks, so publishers are starting to publish more electronic nonfiction in smaller doses as opposed to really long books. Obviously the publishers would get something out of your system along the lines of helping to sell books: people who like this book might like that book. But do you see the amount of data you're gathering from the text possibly influencing publishers in other ways? Well, yes. You're right that there is kind of a shift going on in publishing.
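The labeling-then-learning loop and the "85%" output can be sketched as two steps: build a scorer from human-labeled high- and low-density exemplar scenes, then report where a scene's score falls among all scenes in the corpus. The interview says machine learning was used but not which model, so this sketch swaps in a deliberately simple nearest-centroid scorer. The feature vectors are hypothetical and assumed to be on comparable scales.

```python
import math
from bisect import bisect_left

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class DensityScorer:
    """Nearest-centroid stand-in for a model trained on human-labeled scenes.

    Assumption: scenes are already feature vectors on comparable scales.
    """

    def __init__(self, high: list[Vector], low: list[Vector]) -> None:
        self.high = centroid(high)
        self.low = centroid(low)

    def score(self, scene: Vector) -> float:
        # Positive when the scene sits closer to the high-density exemplars.
        return math.dist(scene, self.low) - math.dist(scene, self.high)


def percentile_rank(value: float, corpus_scores: list[float]) -> float:
    """Percent of corpus scores strictly below value."""
    ordered = sorted(corpus_scores)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


# Hypothetical human-labeled exemplars and a tiny hypothetical corpus.
scorer = DensityScorer(high=[[0.9, 0.8], [0.8, 0.9]], low=[[0.1, 0.2], [0.2, 0.1]])
corpus = [[0.1, 0.1], [0.3, 0.3], [0.5, 0.5], [0.6, 0.6], [0.9, 0.9]]
corpus_scores = [scorer.score(s) for s in corpus]
print(percentile_rank(scorer.score([0.7, 0.7]), corpus_scores))  # 80.0
```

The raw score means little on its own; converting it to a percentile against the corpus is what yields a reader-facing statement like "denser than 80% of scenes".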
We spent the very first portion of our company, the first two years at least, working very hard to figure out how to work with publishers and their data in such a way that they felt comfortable and were willing to move forward with a relationship with us. It's been a lot easier in the last few years, partly, I think, because we've been around longer, but partly because the publishing industry in general is starting to understand how powerful and useful this kind of data can be. I certainly hope that we are able to improve on the way things are done now in publishing. For example, I've had numerous editors tell me that it is not unreasonable to acquire, edit, and then publish a book they think will sell 1,000 or 2,000 copies, if they know not to print 10,000 copies and if the cost of finding those 1,000 readers is not exorbitant. The problem is that we don't have enough information to easily find those 1,000 readers. Let me pull up an example of this. This is an infographic that we threw together at one point, because there's information we can derive from our book database that answers questions that seem easy but are really very difficult. When you ask somebody in the publishing industry how long the average book is, traditionally what they'll tell you is 100,000 words. Now, you were talking about short form. If you look at the materials being published today by the major publishers, they're correct: the average book is 100,000 words. But if you apply that across all genres, it's a very poor measure. For example, what is interesting is that the average romance novel in our corpus is about 76,000 words long, while the average historical fiction title is 117,000 words long. But this one down here, I think, is more interesting: what is the most common perspective? Is it a first-person book or a third-person book?
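The point about the 100,000-word average hiding genre differences is easy to reproduce: a pooled mean can look reasonable while describing no single genre well. A minimal sketch over a made-up sample (these are not BookLamp's corpus numbers):

```python
from collections import defaultdict
from statistics import mean


def average_length_by_genre(books: list[tuple[str, int]]) -> dict[str, float]:
    """Mean word count per genre from (genre, word_count) pairs."""
    by_genre: dict[str, list[int]] = defaultdict(list)
    for genre, words in books:
        by_genre[genre].append(words)
    return {genre: mean(counts) for genre, counts in by_genre.items()}


# Made-up sample for illustration only; not BookLamp's corpus figures.
sample = [
    ("romance", 70_000),
    ("romance", 80_000),
    ("historical fiction", 110_000),
    ("historical fiction", 130_000),
]
pooled = mean(words for _, words in sample)
per_genre = average_length_by_genre(sample)
print(pooled, per_genre)
```

Here the pooled mean, 97,500 words, sits between romance at 75,000 and historical fiction at 120,000 and matches neither.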
You can also break things down by past versus present tense and so forth. What I think is interesting here is romance, because what this is basically saying is that 90% of the romance novels on the market today are third-person books: he said, she said. Only about 10% are first person, which is a pretty heavy bias. In fact, there's a bias toward third person in every category except for autobiography, not biography, where first person is more prevalent. Now, in terms of your question about using this information to help find and publish materials that can reach the right readers: if you are a romance reader who likes long form, meaning romance of more than 100,000 words written in the first person, you will have an extremely difficult time finding that book. If I'm an editor and a manuscript comes in the door, and I love the book, I think it's a fantastic book, the storyline and plot and all the things I judge a book on are good, but it's written in first person and it's 120,000 or 130,000 words long, I'm going to have an uphill battle to convince my publisher to actually publish that book, because the market for it is not well established. It runs against the grain of what the typical romance genre has been historically. But if I can help you find the 1,000 readers who really, really like long first-person romance, then you can open the door to a whole category of books that otherwise wouldn't have been publishable but now might be, because you can tailor how many books should be in the print run (and if you're talking about digital formats, that's not a problem at all), and also because the cost of finding those people might not be as high as it used to be.
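Finding readers for long first-person romance is, at its core, a filter over text-derived attributes that ordinary genre labels don't carry. A minimal sketch, assuming each book already has a detected narrative perspective and a word count; the `Book` record, the helper names, and the five-title catalog are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    genre: str
    perspective: str  # assumed values: "first" or "third"
    words: int


def perspective_share(books: list[Book], genre: str) -> dict[str, float]:
    """Share of each narrative perspective within one genre."""
    in_genre = [b for b in books if b.genre == genre]
    if not in_genre:
        return {}
    counts = Counter(b.perspective for b in in_genre)
    return {p: n / len(in_genre) for p, n in counts.items()}


def niche(books: list[Book], genre: str, perspective: str, min_words: int) -> list[str]:
    """Titles matching a narrow taste, e.g. long first-person romance."""
    return [
        b.title
        for b in books
        if b.genre == genre and b.perspective == perspective and b.words > min_words
    ]


# Hypothetical catalog.
catalog = [
    Book("A", "romance", "third", 75_000),
    Book("B", "romance", "third", 90_000),
    Book("C", "romance", "third", 110_000),
    Book("D", "romance", "first", 120_000),
    Book("E", "romance", "first", 70_000),
]
print(perspective_share(catalog, "romance"))  # {'third': 0.6, 'first': 0.4}
print(niche(catalog, "romance", "first", 100_000))  # ['D']
```

The same filter run in reverse (from a reader's stated taste to matching titles) is what would let a publisher size a small print run for a niche audience.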
And I think it's actually important to compare that to what's done currently, which is that, you know, Fifty Shades of Grey becomes a wild success, and suddenly the amount of sexual content in romance climbs because of the whole flood of books that fit into the same category. So I'm hoping that we can actually help diversify the market by helping people with very specific tastes find books, and helping publishers publish books that are very specific. Okay, here's potentially an easy one, or two. How many titles do you actually have in your database right now? And, if you don't want to name names, that's fine, are there any publishers or books you wish you had in there that you haven't been able to get hold of yet? I won't name names. Like I said, it depends on the timing, but we have between about 60,000 and 100,000 at any given point in time, and we're working with content from most of the major publishers at the moment. There are two ways to answer that question, actually. There are always publishers that we're looking forward to working with more, and there are also publishers that we wish we were serving better. For example, one of the areas where I think we can be the most helpful is that we're a neutral platform for discovery, meaning that the size of a marketing budget doesn't matter to us. A lot of the smaller independent publishers who are having trouble competing in the market with larger entities are finding that we can be friendly to them, but we don't have a mechanism right now that gives them easy access. Our pipeline, in terms of how we interact with publishers, how we make friends, and how we bring their content into the system, is slower than we'd like. So as we go further, I'd like to better serve the publishing community by broadening our spectrum.
Now let me give you an example of what I mean when I talk about being friendly to the backlist. My favorite story for this is one I've used for a long time. Are you familiar with the name Richard Bachman as an author? Oh, yes, I am. That's Stephen King, right? So the story, as I remember it from reading, I think, the introduction to The Bachman Books, is that at some point Stephen King's friends were telling him that they didn't think he could break into today's publishing world; if he weren't Stephen King, he would be kept out because of how much competition there was, or whatever the metrics were, right? So he decided to test this. He took some of his old books that had never been published, polished them up, and published them under the pseudonym Richard Bachman. He wanted to see whether or not Richard Bachman could break into the industry. And as the experiment goes, if I remember correctly, he publishes about five books under this name, and they do okay. By his fifth book, I think, he's selling 31,000 copies. And I remember him going on record saying that he was doing better as Richard Bachman by his fifth book than he had done as Stephen King by his fifth book, and that he thought he was on the right trajectory. Somewhere along the line somebody found out, and it became public. And then the Richard Bachman books went from respectable sellers for an unknown author to Stephen King bestsellers, right? Now, one of the tests we had when we were building out the system was this: if a reader said I like Stephen King, the best recommendation you could possibly give a Stephen King fan would be Richard Bachman, right? Prior to everybody knowing he was Stephen King, it is a very, very comparable author, right?
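The Bachman test amounts to a simple evaluation: rank the catalog by content similarity to a book published under a pen name, without telling the ranker who wrote what, then count how many of the top results are by the real author behind the pen name. A minimal sketch, assuming cosine similarity over made-up theme profiles; the titles, profiles, and helper names are hypothetical, and the pen-name mapping is used only for scoring the test, never for ranking.

```python
import math

Profile = dict[str, float]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse theme vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_real_author_hits(
    query: Profile,
    query_author: str,
    catalog: dict[str, Profile],
    authors: dict[str, str],
    pen_names: dict[str, str],
    k: int = 10,
) -> int:
    """Count top-k content matches written by the query's real author.

    Ranking uses only the text-derived profiles; pen_names is consulted
    afterwards, purely to score the test.
    """
    def real(name: str) -> str:
        return pen_names.get(name, name)

    ranked = sorted(catalog, key=lambda title: cosine(query, catalog[title]), reverse=True)
    return sum(1 for title in ranked[:k] if real(authors[title]) == real(query_author))


# Hypothetical profiles; the query stands in for a book published under a pen name.
query = {"horror": 0.6, "small town": 0.3, "curse": 0.1}
catalog = {
    "King title A": {"horror": 0.5, "small town": 0.4},
    "King title B": {"horror": 0.7, "curse": 0.2},
    "Romance title X": {"romance": 0.8, "small town": 0.2},
}
authors = {"King title A": "Stephen King", "King title B": "Stephen King",
           "Romance title X": "Another Author"}
pen_names = {"Richard Bachman": "Stephen King"}
print(same_real_author_hits(query, "Richard Bachman", catalog, authors, pen_names, k=2))  # 2
```

Because the ranker never sees author names, a high hit count can only come from the texts themselves, which is the property the backlist argument depends on.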
So one of the tests was whether or not our system would find those connections, and one of the big wins for us came after we built the initial system. We had at that point about 70,000 titles in the system, and I looked at Thinner by Stephen King, slash Richard Bachman, and was very pleased to see that I think four of the top 10 books that came back, out of about 70,000, were Stephen King books. That means that even though Richard Bachman was not as popular an author as Stephen King, on the day that book reached the market, or even two weeks before its release, we could have let our Stephen King fans know that there was a new author in the market who might appeal to them. It didn't make a difference whether or not Richard Bachman, as a new author, had $100,000 in the marketing budget, and it didn't make a difference whether or not he had a lot of metadata surrounding his own name. It was a way to pull something out of the backlist that would fundamentally appeal to somebody who also liked the other product. So that's why we talk about it being very neutral and very friendly to the backlist; it isn't biased in that way. Yeah, that's a great story. If you pull up Stephen King, will it recommend Richard Bachman the other way around? I haven't, actually. This is a double-edged story: Stephen King is actually an extremely diverse writer. There are writers who are not diverse; a romance author may be very, very consistent in the way she writes. But if you look at Stephen King's catalog, some books will be very similar to others and some books will be very, very different. It or The Stand is a very, very different book from Dolores Claiborne or Cujo. So yes, you will find that his books recommend each other, but they will cluster in different ways. You won't find all the Stephen King books recommending all the other Stephen King books, but you
actually will find them recommending each other within those clusters. Sure. If you're pulling that up, I noticed this morning that there was a metadata error in the system for some reason: the author attribution for Stephen King is a little funky, and we'll have to fix that, so I'd better get on that. So now, maybe switching a little bit toward how librarians could use this, as opposed to publishers: looking at the charts you have up here now, the majority of it is genres in fiction, obviously, but then you have things like biography and autobiography. The first part of the question is, is your database skewed heavily toward fiction or not? The second part is, do you find it harder or easier to deal with fiction than nonfiction, or the other way around, in this sort of system? No. Actually, I'll reference some other things as well, but let me answer your question first. No, although it was originally built from a fiction standpoint, and the reason is pretty straightforward. Nonfiction is very well suited to keyword-style analysis. If I'm trying to find a book on a programming language, a keyword search is probably going to help me find it. The problem is that keyword-style approaches are not very well suited to fiction. Using the Stephen King example, if I'm trying to find a book like Stephen King's, what keyword do I search for? It's a tough question. Monster? What you end up doing is searching by genre more than by the specifics of the book. So stylistic and subtle-theme analysis for book discovery was initially intended for fiction, because that's where it's most valuable. That said, we found it worked very well for nonfiction as well, to the extent that if you look at cookbooks on French cooking, what you'll get back are other French cookbooks, because the common references of
thematic ingredients that tend to be used are very similar, as are the preparation methodologies. If you look at a cookbook that intermixes stories with the recipes, which is fairly common, you will find other books that intermix stories and recipes, because the additional themes that tend to show up in the story elements will draw those two books closer together than two books that don't have stories, right? So no, it's not intended for one or the other. In fact, one of my favorite things to do is the switch where you can say I want to see only nonfiction or only fiction. That's the only division, by the way, that we make in the actual display. When you say I like this book, find me one like it, we do not pay attention to the official BISAC code or genre at all. We simply say that if two books contain an equal amount of magic and an equal amount of vampires, we put them together, because that's probably more relevant to what you're looking for, even if it's not officially labeled as the same genre you traditionally read. That said, what we do separate out is fiction versus nonfiction: if you search with a fiction book, it will return fiction books, but you can turn that off. For example, I remember reading a fictional book about combat in space. There are very few nonfiction books about combat in space, but it had military themes and so forth, so I was curious. I said, okay, I'd like to find all the nonfiction books that have themes like this fiction book that I read. What came back were nonfiction books about military training; it was about special forces teams, so it came back with current-day special forces books. So it's like, I read this fictional piece, and I wonder if there's information out there from a nonfiction
perspective on the same topics. So you can kind of bounce around from one genre to the other. Great. So this chart that you're showing right now is the one I remember from your talk at CIL. Explain this, mainly because I love this chart. And do you have this for every book? That's kind of the follow-up. We do, right. For every single book we analyze, we generate, I think I've mentioned, 32,000-plus data points, and what that means is that we measure a whole bunch of metrics for every single scene in a book. It's one thing to look at the high-level thematic metrics of a book and say, okay, I can see that it's got Catholic institutions and religious hierarchy in it. But the other thing to realize is that with a computer, you can know as much information about the very first chapter of a book as about the second chapter, the third chapter, the 27th chapter. So there are two things you can do. One is stylistic: you can measure writing styles. For example, what you're looking at here is Jurassic Park, from my own reading, and it's also one we use a lot as an example because people tend to be familiar with either the book or the movie. This graph is actually on the back of my business card; I love it. Jurassic Park, when I was younger, used to be one of my all-time favorite books, and you can tell very quickly my bias toward science fiction, fantasy, and business books in my reading habits. When I was younger I used to tell my friends, you have to read Jurassic Park, you have to read at least halfway through, because somewhere in there it becomes one of the most action-packed books you've ever read. So when we were building the system, the question was whether we'd see that transition. What you're seeing here are the density and pacing graphs for Jurassic Park from the beginning to the end of the book, and what you're
seeing is that at the beginning, when he's talking about the science of cloning and the security systems of the island, there's a lot of setup. This is a fairly typical science fiction profile, meaning that the density level is fairly high, the pacing level is fairly low, and so forth. Then the dinosaurs escape, the fences get turned off, the dinosaurs start eating people, and all that jazz, and what you see is that Michael Crichton's writing actually shifts there: the density falls, so the language structure becomes less complex, and the pacing goes up, and it stays that way throughout. This is a very typical action-adventure profile. In a way, what Michael Crichton did is take a science fiction profile at the beginning and merge it with an action-adventure profile, and you can see how a plot event in the story influenced the architecture of the story. A few other things: this is very Michael Crichton-esque. You will see this kind of pattern in other books, but to have it last for 45% of the book is a long time; Michael Crichton gets away with doing that much more than most suspense authors. The other thing is that you see the same pattern with dialogue: dialogue falls off here. The funny way of saying that is that people stop talking as much after they get eaten by a dinosaur. That's probably not actually accurate. What it really is, is that dialogue is often used as a pacing device: it moves the plot along very quickly. So here there is probably a lot of dialogue setting the scene and explaining things, and after the action-adventure characteristics begin, the structure of the story needs dialogue less, and so it falls off as well. So now I'm wondering what the pacing looks like for a cookbook. I don't have the ability to pull that up. I mean, we do
have this data, but I don't have a graphing tool built for it. But I will show you something else that's interesting. The other thing you can do, because you can measure on a scene-by-scene basis, is look at interacting themes, so you can say I want to know where the werewolf is versus where the vampire is versus where the werewolf and the vampire exist together, which gets me moving toward something else we're working on that I want to talk about today. What you're seeing here is a sexual content graph for Fifty Shades of Grey, which I'll assume most everybody is familiar with. What this does is go through the book from the beginning in 1,000-word increments and score each increment for its amount of sexual content. Fifty Shades of Grey actually goes for a long time with very, very little sexual content, and you can graph it out and see exactly where the content takes place. For comparison, this is a sexual content graph of a Penthouse book, which, I mean, speaks for itself. What's more interesting to me is this one right here, which is technically an erotica book, but it was published in 1906, so the definition of erotica in 1906 was very different from the definition of erotica in modern times. You can see the change. Now, just so you know, this kind of information is useful for a number of things, but primarily what we use it for is metadata verification. Say a book has metadata somewhere that says it's a juvenile fiction book, yet it looks like Fifty Shades of Grey: there's a possibility this book has an error in its metadata, right? You definitely don't want to be recommending this book to a child, because that content is probably not appropriate. Maybe we just got bad data somewhere, so you check whether it lines up. So, I'm afraid to ask, but on these last three charts some of the areas have little stars in them. What do the stars mean? The stars are for when you're a quality control person and you're
trying to figure out whether the metadata of this book lines up with the genre. So let's pretend that it's not Fifty Shades of Grey, which we all know. Let's pretend that it's not a children's novel, where any basic sexual content would probably get flagged for review. Let's say it's a mainstream book that has a profile that looks very much like it matches erotica. So you're saying, okay, I don't know if we accidentally screwed up the BISAC label on this coming in. What you want to be able to do is check the three most prominent sexual content scenes in the book, basically its extremes, and that's what the stars mark. In our internal systems you can click there and pull up that particular scene. Now, what's also interesting about this, when we think about other ways you can use it that are broader, is that we can do this sort of analysis on any of our themes. So if you want to see where the dragons show up in the book, or where the vampires show up, or where the cooking shows up, you can graph it out with a similar methodology, and then you can put the stars where you have vampires who are fighting, right, or space exploration that has combat, or whatever it might be. My references tend to be genre-specific, and I don't want to leave you with the impression that this is only a genre-based tool; it's just that my reading habits and my frame of reference tend to come from a genre background. Robert Jordan and authors like that are the people I grew up reading. Gotcha. So, one more question here about BookLamp, and then we'll move on to your new project. There are other systems that do the "if you've read this, you might want to read that" thing, and we won't name names. Why would a librarian maybe want to use your system instead? Well, first off, we don't claim to be perfect. I think it's really important to realize that what we're talking about is a different way of doing discovery that is complementary, not
competitive with social recommendations. There are things social recommendation engines do very, very well when there's enough information, and there are things social recommendations do very poorly when there's not enough information. So I see the future of recommendations being a hybrid, combining the strengths of one system with the strengths of the other. Case in point: if you look at Goodreads, which I really like, I very much enjoy Goodreads, it's a fantastic site, or at Amazon's "people who bought this bought that" (I don't see a reason not to name names; I think they're very interesting systems), what you find is that books that are popular are popular, and we also forget how large the social void is. Again, the social void, by my definition, is any book or item that doesn't have enough social data around it to be recommended. So if you look at Goodreads recommendations, and I've done this with my own profile, and I think Goodreads is fantastic, so I don't want to single it out. I use it as an example just because it represents one of the largest dedicated book systems out there, not because I think it has major flaws, but because it represents the problems that do exist in social discovery. A great way to test and get an idea of how large the social void is, is to look at your own recommendations on Goodreads. I don't have any reason to think my experience is different from that of other typical readers. I've gone in and put lots of books on Goodreads, and I went back and looked at my recommendations. The piece of each recommendation I paid attention to was the number of votes each book in my recommendation list had. I was curious, so I went down about 800 recommendations and counted exactly how many of them had at least 100 votes, and what I found is that roughly 95 to 96% of the books being recommended had 100 votes or more, which means that there was a lot of
metadata. What that says to me is that the recommendation engine Goodreads is using is probably saying, let's only give a recommendation if we have at least enough information, represented somewhat comparably by 100 votes or more. They might not be using that exact metric, but if your book has zero votes, they probably have a very difficult time recommending it, because they just don't know enough about it. It could be a good book or a bad book; they don't know. In comparison, if you actually go look at the content on Goodreads, depending on how you estimate it, it's likely that somewhere in the range of 95 to 99% of the books with records in their system have fewer than 100 votes, which means that those books are, and I'm speculating, but I believe those books are largely invisible. Whether those numbers are exactly on the spot or somewhat lower than that, it's in there somewhere. So at some point a book just does not have enough metadata, and that's a much larger category of books than people really realize. So the question is: when you have 500,000 votes and reviews on Harry Potter, you have plenty of data to figure out whether or not somebody likes it.
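The social-void argument above boils down to a threshold count: a purely social recommender can only rank books that have enough ratings, so everything under the cutoff drops out of view. The Python sketch below illustrates that idea under stated assumptions. The 100-vote cutoff is Aaron's rough heuristic, not a documented Goodreads rule. The catalog and its vote counts are invented for illustration and are not Goodreads data. The function names `social_void_share` and `recommendable` are hypothetical.

```python
# Minimal sketch of the "social void" idea: books with too few votes
# are invisible to a purely social (vote-based) recommender.
# ASSUMPTIONS: the 100-vote cutoff is a rough heuristic, not any site's
# documented rule, and the vote counts below are made up.

VOTE_THRESHOLD = 100  # hypothetical cutoff for "enough social data"


def social_void_share(vote_counts: list[int], threshold: int = VOTE_THRESHOLD) -> float:
    """Return the fraction of books with fewer than `threshold` votes."""
    if not vote_counts:
        raise ValueError("vote_counts must not be empty")
    in_void = sum(1 for votes in vote_counts if votes < threshold)
    return in_void / len(vote_counts)


def recommendable(catalog: dict[str, int], threshold: int = VOTE_THRESHOLD) -> list[str]:
    """Titles a threshold-based social recommender could still consider."""
    return [title for title, votes in catalog.items() if votes >= threshold]


if __name__ == "__main__":
    # Hypothetical catalog: two popular books and several little-known ones.
    catalog = {
        "Harry Potter": 500_000,
        "Popular thriller": 12_000,
        "Midlist fantasy": 85,
        "Early pseudonymous novel": 3,
        "Small-press debut": 0,
    }
    share = social_void_share(list(catalog.values()))
    print(f"{share:.0%} of this catalog is in the social void")  # 60%
    print("Recommendable:", recommendable(catalog))
```

With these invented numbers, three of the five titles fall below the cutoff, so a threshold-based recommender can only ever surface the two popular ones, however good the others may be. A content-based approach like the one Aaron describes avoids this particular gap because it scores the text itself rather than waiting for votes.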
If another book has 500,000 votes, that's plenty of data to be able to cross-compare and say, would somebody who likes Harry Potter like this new book? What's lacking is, if you have 100,000 votes about Harry Potter and zero votes about another book, would that other book be a good match for a reader who likes Harry Potter? We just have no basis for a judgment call; we have no way of pulling it out. So at that point, what can we do to help make those authors discoverable? If you are Richard Bachman, how do we make you discoverable to fans of Stephen King? Because Richard Bachman would have been a low-metadata area; especially his early books would not have had 100 votes on Goodreads. So it's exploratory. There are also a number of things you can do when you look at a book and say, okay, this reader tends to read books with these specific sub-genres or sub-themes inside them; it gives you a very interesting, unique profile of that reader. So the strength, and the weakness, of BookLamp is that as long as we have the book, the system is equally effective whether the book has a single reader or 100,000. We don't have a cold-start problem where we need to gather a whole bunch of data about every single book, and in fact, once we have a text file, we can gather data about a book much, much faster than social recommendations can. So really you have to look at the pros and cons of our system. In some areas we do very well, and in some areas we do very poorly. If we don't have the book in our system, we do very poorly; we cannot recommend a book if we have no data on it. Sometimes we do fantastically, like the Richard Bachman to Stephen King match or the Dan Simmons Curian matches, but sometimes we also get really weird ones. I mean, there are areas where I look at the thematics and I don't know what the engine is doing; it's just a strange pairing of books, and I look at it like, you know, that just doesn't make sense, this book and that one aren't
really very close. Another example that I think is interesting: I don't know if it's still this way, but it used to be that if you looked at the shared-bookshelf methodology on LibraryThing and looked at the books that were similar to Harry Potter, or no, to Memoirs of a Geisha, what came back as comparable, meaning what was shared on a lot of people's shelves, was The Da Vinci Code and a Harry Potter title. At first when you see that, you're like, why? It doesn't make any sense; those aren't anything alike. Except that all three of them were released in movie theaters the same summer, and so the marketing budgets of the movie releases drove people to buy those three books together that same summer, and so they show up on a lot of people's shelves together. It's not necessarily a bad recommendation, but it's a pretty clear example of marketing, something outside of the quality of the book's content, driving why these are put together on the shelf. Wow. The moment you said those three books, I thought: movies. So that was good. We just want to remind everybody that we welcome questions from the audience; just go ahead and type them into the Q&A area, or say hey and turn on your mic. So, when I first asked Aaron to do this show, this is what I was thinking we would talk about, but then he said, hey, I've got this other project that's coming up very shortly, called the Game of Books. I wonder where you got that from. So why don't you tell us about that? What's going on there? Sure. First I want to make an appeal, because it's easy to listen to me talk about the data behind books and assume that what I see when I see a book is data. That's not quite true, because I very much grew up in a reading environment. This all came from when I was younger, about 16 years old, I believe. I wanted to be a writer, and what I would do was write short
stories and give them to my father for editing, and I would require him, and this gives you an indication of how weird a child I was, to score every page on a scale of 1 to 10 for how interested he was in that page. The idea was that I'd graph it, and if a page fell below a 7 out of 10, that was a page I had to edit; that's where the person gets up and goes and makes a sandwich. So very much my interest in the things that we do comes from the fact that I lived and breathed books growing up. My very first job was in a library when I was 12 years old. I used to clean the library, and they gave me a key, and I'd go in at 9 o'clock at night, and I'd read for an hour, clean for an hour, read for an hour, clean for an hour. I grew up with this, and so when I look around and try to say what we can do with this sort of data, are there authors we can help who are having trouble finding a market, finding people to read their books? One of the areas that I haven't seen much of, and want to try to work in, is libraries. I have a very high opinion of libraries. The second story that shows how unpopular and undateable I was as a child is that on my 16th birthday, my birthday party was my parents driving me from the small town that I lived in to the larger city nearby so I could spend all day in a bigger library. That was my 16th birthday party. And that is spectacular. Yes, well, you can imagine what my high school life was like from that story alone, right? So this idea of libraries is a mission that I really, really deeply care about, and we're always looking for ways of helping people understand how our data can be useful, helpful, or informative, a way of thinking about books in a slightly different way. We're trying to figure out how to do this, and what we came up with is very preliminary, these graphics and whatnot are preliminary, but we're calling it the
Game of Books. And the appeal I want to make: I wrote it up here, and you may not be able to see it, but my Twitter name is Aaron Stanton; @AaronStanton is my Twitter, and we're on Facebook.com. We are very much interested in connecting with the library community. I have a fairly good relationship with publishers and authors, but we don't have as much of a connection with librarians, which is sad to me, and it also matters because the Game of Books is specifically intended to help libraries engage readers. Now, the basic premise is very similar to the summer reading programs libraries already run, where students come in and have to read a certain number of books, and at the end of the summer they might earn a reward. What we wanted to do was figure out a way to expand on that a little bit. The way it got started is very simple, actually, and kind of cheesy. We were sitting around the office playing with graphs, just like the sexual content graphs, and we were like, well, let's put little crossed swords at this spot in the graph for combat scenes, and little hearts for romance. It was kind of fun but not very useful, and there's also a danger of giving away elements of the book with these graphs, which is why we don't really publish them on the website very much. But what we ended up with was something along the lines of, you know what would be fun? If a book could earn a badge for having a really unique combination of themes. The first one we came up with was called the Clunky but Cruising badge. It's a book that has a suit of armor and a modern-day car together in the same scene, which is just great. It's totally irrelevant, but it's a rare combination; it takes a weird set of events for that to show up in a book. And so then we just kind of went haywire. This badge right here is Tough Love, which is a romance novel with a density score higher than 75%. I think this is the Armchair Detective badge, which is
I'm trying to remember, the police detective badge: books that have a high police detective theme plus some other combination. And so the idea, the first part of the idea, is very simple: if you read a book that contains a unique badge, you as a reader should be able to earn that badge, and how many points you earn for that badge is determined by how rare the badge is. For example, the rarest badge is called the Nerdy Vampires badge, which is a combination of vampires with a high presence of science, technology, and astronomy. There are only about four books in the entire corpus that earn it, and one of them is a book about vampires on a NASA space station somewhere, which I know about solely because it shows up there. The idea is that it provides a reason to discover a book that's different from what you typically read. So it was built around these journey concepts: you might have a science fiction journey or a romance journey, and you complete a journey by earning a set number of badges. It might be that you have to earn the Underwater Cities badge and the Time Travel badge, or the Space Exploration badge, and so forth, but the point is that you complete journeys, earn levels, and earn points. Finally, the part that I think is really, really interesting is this: because we want every single book in the system to be able to play, but we want badges to be fairly rare and special, we also have what we call reading experience, or reader XP. If you read a book that has a specific theme, it's just based on how much that theme is present. So if you read a book that's 15% vampires, you might earn 15 vampire points, right? You add these together, and you might find you become a level 5 vampire reader, because those are themes you read over and over again across multiple books. For example, I've read most of Tom Clancy's books; if you just put those in, I think
I'm like a level 7 terrorism and security reader and a level 5 military conflict reader. The idea is twofold. One, it allows me to very quickly compare with my friends what level of reader we are from various standpoints. That's very interesting, because I can say, listen, I like fantasy, but then I can show you my profile, and you very quickly get a sense of whether this is a sorcery fantasy reader or a magical-creatures-and-unicorns fantasy reader, you know, what category of books this person tends to gravitate toward, and also what books in the system have those themes. And so it's kind of a discovery method. Now, the very final component is that one of the tools for playing it, which I think is fun, is going to be an iPhone or Android app. Basically, you sign up for a journey, and then you can scan the barcode on the back of a book, or search for it, and the app will show you the book's game cards: what badges it would earn and what points you would earn if you were to read it. It creates almost a scavenger hunt, where you both interact with the digital side and have a reason to go in and interact with the physical library stacks as well. There's a whole bunch of questions about how this sort of thing could integrate at the catalog level and so forth, but we're very much interested in collecting feedback on whether this would be useful for readers, and specifically for parents, teachers, or librarians who are championing the reading mission, if that makes sense. So, one question based on something you just said: do you foresee some way of integrating this into a library's catalog, or did I mishear you? Well, that's a little bit speculative. I think talking about libraries without talking about the catalog would leave a hole in the conversation. But that said, what
we're doing, and the reason we're trying to connect to the library community, is that we're trying to put together basically an advisory committee of librarians, teachers, maybe even a game designer or two, to help us make sure the game is appealing to people. We want to make sure the game meets a number of criteria, the primary one being that it's useful and actually fits into the way a librarian or a patron interacts with the library. The other component is the library that I grew up with in Cascade, Idaho, which had a population of 1,000 people. It was very small: two large rooms, lots of books, and a computer in the corner, and that was as grand as it got back then, so it probably has more computers now. But it did not have a lot of financial resources, so it's very important to me that whatever we put together can be played regardless of the library's financial resources. At that Cascade library, as long as there was access to the internet, preferably an iPod that could access the internet and scan barcodes, or some way to look up information about a book, there shouldn't be a lot of recurring costs. It's really a framework, a way of creating a universal framework that you can plug into. But at the same time, if some library really wanted to get into it, and patrons really responded well, you could also support a much larger game and create reading engagement across the community, if that's the way it's supposed to go. So the exact design at this point is deliberately not nailed down, because we really want input from the library and teaching communities. Oh, great, thanks, Aaron. We do have a question from the audience. Theodora wants to know if you cover YA, young adult fiction, and I assume she means in this... Can you repeat that, actually? It was a little quiet. That's okay. A question from the audience: Theodora wants to know, do you cover YA, young adult fiction? So certainly the game we're
building, we want it to have broad appeal, but I think it's very difficult to talk about gaming and redefining the reading discovery experience without talking about young adult fiction. I think that's probably a place most people will gravitate towards; certainly teachers and parents will gravitate towards young adult. So yes, that'll be a high priority. In fact, to explain what we're going to do: we have the construction framework, we have the books we've analyzed, and we know which books have which badges and so forth. What we're now trying to find out is what kind of response we're going to get from people who might or might not actually use it. So we're going to have a, I'm not sure if you're familiar with kickstarter.com, but it's a place where you can propose a project idea, and people in the community can support it by pitching in $10 to $15, or nothing, just kind of voting. What we'd like to do, if it gets traction there, is tailor the game a bit toward whoever responds. So if that ends up being young adults, we'll make sure that we have very good, comprehensive coverage of young adult fiction and nonfiction, so that there's a very good chance that when young adults are in the library trying to find out what kind of books to read, the vast majority of the books there are already participants; they already have game cards in the game. So yes, that'll probably be the starting point, but I would be disappointed with myself if we built a game that only appeals to young adults. I would very much want a game that can also be played by people my age who read books, who might be involved in book clubs where the reading list is probably partly derived from it, and you can tell the book club, you've earned this number of points, and things like that. All right, so I think you partially answered this question by mentioning the forthcoming Kickstarter campaign, but okay, I'll be honest, I
want in. You know, I'll beta test, I'll do whatever, as soon as you've got an Android app. So if somebody is interested in helping, playing, doing whatever, what should they do, and when should they do it? Well, first off, oddly enough for a company and a team that's about to launch a Kickstarter project, I feel very uncomfortable directly soliciting support, but what I am comfortable soliciting is connections, right? So first off, if you want, you can follow us on Facebook, like us on Facebook; that will connect you in, so when we post any updates on Facebook about what we're doing, you'll be able to see them. Follow me on Twitter; certainly I'll be vocal on Twitter when the Kickstarter launches, and I'll actively go out and introduce myself to people, strangers on the street if I have to, and say, hey, would you check this out? The other thing is that I'm quickly discovering there's a community that I've neglected a little bit, to my own aggravation: the library community, right? So I'm going out and trying to introduce myself to different librarians, getting to know people who are connected to this. I love, by the way, I just recently discovered the superhero persona thing going on in libraries: the Librarian in Black, the Unquiet Librarian, these secret identities. This is a fanta...
I love this. This is such a wonderful, library-esque thing to me. And what I'm discovering is that I'm really very disconnected from these communities, and it's weird, because I'm trying to be connected. So, you know, let people know that we exist. If you think what we're doing is interesting, connect with me: send me an email, and I'm happy to put my email address somewhere it's available, or follow me and send me a Twitter message or whatnot. Anything we can do, in a non-annoying, friendly, and good-hearted way, to get hold of and connect with as many people as possible who might be interested in the project, that's our goal. All right, fair enough. Another question or comment? No, I was just going to say, just so everyone knows, as we do with all of our shows here, we put links for any URLs that are mentioned into the Library Commission's Delicious account. So I've added the booklamp.org site, but I'm also adding direct links to the Twitter and the Facebook page as well, so people can quickly jump to and connect with you guys. And I just liked you on Facebook, so there we go, although I don't live on Facebook all that much. So, do we have any other questions or comments from the audience at the moment? If you have any questions, type them into the questions section and we'll get them answered. So, Aaron, I guess I'll just give you an opportunity here, while we wait to see if we have any other questions from the audience, to add anything else you'd like, anything maybe I didn't ask you about that you wanted to plug or tell people about. Not really anything to plug or tell people about. I'm looking at my graphics as well; if you're interested, I can show a bit more of the interesting stuff that we do. Sure. Our time is almost out, so do you want me to go ahead and continue, or...? Go ahead, this is great. So, again reflecting my fantasy reading behavior, one of the books was the Wheel of Time series by Robert Jordan. Are you familiar with it, by any chance? Yes, of
course, and the fact that you said you read it while you were growing up made us feel old. Well, the truth is the series has been going on for 14 years, so everybody has read it while they were growing up; that's just the nature of it. Even if you came to it after school, you still grew up while it was being written. I was working when the first one came out, so that's why you made me feel old. Well, I apologize, that was not intentional. That's all right, I'll just leave it at that. Sorry. One of the things that I think is very interesting about the Wheel of Time series, and I think people who have read the series have probably noticed this, or at least can see where it comes from, is that as you read the series, the books change over time. When a series is being written over that long a period, authors tend to evolve, and Robert Jordan, as he did, got more complex, and his writing changed. We also track things like what percentage of scenes each individual character appears in, and so forth. So this is just a graph of the book-by-book metrics, in sequential order, for the first 10 books of the Wheel of Time series. What you can see is that at the very beginning the books start in the 45 percent range, and then they get consistently more complex. Now, in density specifically, a 5-point move on the scale is noticeable. If you pick up a book that's a 20 and then pick up one that's a 30, it's a big jump; even 20 to 25 is noticeable. So going from a 45 percent density to a 75 density is a tremendous change in style for a writer. I found this very interesting because, well, it reflected my experience. But the other question we get a lot of times is, okay, writing style is interesting, but does writing style really matter to the reader? And this is a great example of saying yes, because another thing that happens is, if you look at the average Amazon star rating for the books, the series starts out fairly highly rated and then drops. The lowest rating, I think, was 1.7 out of 5 stars, a very, very low
rating for a really successful book. What you're seeing here is a statistically significant correlation between the pacing level of what Robert Jordan wrote and the scores readers gave when they said, we like this, we don't like this. So what I'm saying is that Robert Jordan's fans were very, very clearly reacting to what they read. Things like pacing, density, and language structure tend to get overlooked in marketing campaigns and in discovery campaigns, because the thematic makeup is what connects two books, say, they're both about spaceships. But at the same time, that's not the only entry way. I'm not sure if you're familiar with Nancy Pearl, but Nancy Pearl, out of Seattle, has a philosophy that I like, which is that there are multiple gateways into a book, and every reader, when they say they like a book, can be referring to any of several different reasons they like it: it can be the language, it can be the setting or plot, it can be the characters. And we all kind of fill that in with our own internal bias. When somebody says, I like Ender's Game by Orson Scott Card, I don't think it's for the same reason I do; I don't think it's the same truth. So really, the follow-up question has to be, what did you like about the book? We try to eke that out by saying, well, here are three books you like that you think are similar, and then we can look at what the consistencies are: you think these are similar, and the themes are similar, but you also tend to drift toward the pacing level of this range and the density level of this one. So can we use that information as well to help find books for you? It was just another interesting example, I think, of how these things can interplay. Great, yeah. When you brought up the Wheel of Time charts there, Krista and I started asking about tracking deaths in A Song of Ice and Fire by George R. R.
Martin and things like that, because characters just die. But anyway, we're not seeing any other questions or comments from the audience. Aaron, I want to thank you very much for doing this for us and with us, especially on less than two weeks' notice. We know you're busy running a company and things like that, so we appreciate it, and Krista and I will definitely be looking out for that Game of Books. I'm going to get involved in it too, and we'll share information about it when it's live. You've got at least two people here who want to play, so figure out what you're going to do for everybody else who might be listening. I'm serious, I really would like feedback on it. We are babes in the woods in this environment; we're just out there trying to do fun things that we would enjoy, and it's always hard to convey our enthusiasm, our good nature and intent, and then how we try to translate that into execution. So if you have thoughts or perspectives on it, please do send me an e-mail. I think one of my earlier slides had an e-mail address, but it's astanton.com; fire me an e-mail if you feel like it. I would appreciate it, even if it's just a one-liner saying I hate this, or I love it, or whatever. Please. All right, great. Aaron, once again, thanks so very much. We're going to go ahead and take control back for just a few minutes; I've got a few news items that we want to cover here, so give us just one sec to switch over and set this up. Okay, I just have two links I want to talk about. Those of you who are regular listeners to Tech Talk know I tend to talk about security issues a bit. Well, it's news; I mean, that's really what it comes down to. One thing I won't get into too much: over the last couple of weeks there have been some significant security issues with Java, not JavaScript, Java, and I've even gotten questions about that at our recent tech planning summer camps. Basically, the big question at the moment is
should I uninstall Java from my computer? The general recommendation at the moment is, unless you know you need it, yeah, you should probably uninstall it. Okay, that being said, some of us do need it. How do you know if you need it? Well, if you don't know you need it, you probably don't. Yeah, it's a hard one. I specifically know I have a couple of programs that rely on Java, and I use them every single day, and I know without Java they won't work. I know here at the Library Commission they attempted that, and it lasted approximately 20 minutes, maybe 17, before they realized, oh wait, sorry, never mind, we'll put it back, because people did rely on it for certain things. So obviously it's something you'll know pretty quickly. On your public machines, maybe what I would suggest is uninstalling it from one or two and seeing if people freak out; you can always reinstall it, that sort of thing. However, if you don't want to uninstall it but want to make sure you've solved some of the immediate problems, you can go to isjavaexploitable.com, which I'm showing here and which we'll put in the show notes. Basically it will test the version of Java on your machine to see if you are up to date and whether some of the known holes are plugged. The problem is, people are saying there are holes that haven't been plugged yet even if you are up to date, and this is why we're leaning towards the uninstall-unless-you-need-it sort of situation. You can search Google for Java exploits and things like that, and you'll find tons of news and a little more technical detail if you're interested. The other thing, which I just heard about yesterday, is sort of security related, but it's very different. Amazon, as many of our listeners may know, has been doing cloud storage for quite a while, and it's reasonably priced, but they've started a new service called Amazon Glacier, and it's designed for people who need or want to store stuff in the cloud but
don't need immediate access to it. So say you have a complete and total backup image of your computer, and it's a big giant file; you store it there. What they're saying is that the retrieval time on what you store there is about three to five hours. You can't just log into a website and say "give me my files back"; you have to log in and say "I want my files back," and they'll get back to you eventually. But the point being, if you look right here: as little as one cent per gigabyte per month storage charge, which is a phenomenally low dollar amount. So I just want to put that out there as a possible option. You know, long-term storage is something libraries and archives do, so this might be a cost-effective place to put your archival content. I'm just throwing it out there; I literally learned about this yesterday, so it's something I might actually start considering for my own long-term backups and things like that. Just an interesting option I want to point people to. So that's my news for the month. I've been on the road a lot, so you kind of live in a news vacuum at that point. Krista wants to come back to the live page, so I will hand it back to you. Thank you very much, Michael and Aaron, for joining us this week. The show was recorded, so it will be available later today, maybe tomorrow, or by the end of the week, for you to watch or to share with any of your colleagues who weren't able to join us this morning. Let me go back to the main page. Oh, I'm on the archive page, I'm sorry. Yeah, so this is the website; you can go there and see our archived sessions and everything else we have. Also, please do like us on Facebook. We have a Facebook page where we post announcements of our shows: when they are going to be happening and when the recordings are up will all be posted there, so you can keep in touch with us that way. And I hope you'll join us next week, when our topic is the
2012 One Book One Nebraska title, which is I Am a Man: Chief Standing Bear's Journey for Justice. That's the book we're reading statewide in Nebraska for One Book One Nebraska, like those one-book-one-town programs, and we'll be talking about it next week. The author of the book, Joe Starita, will be here with us, so that's exciting; we'll actually have him here on Encompass Live. We'll also be talking about the finalists for next year's One Book One Nebraska, which you can see here; the selection will be announced on November 3rd at the Celebration of Nebraska Books here in Lincoln. So we'll be talking about those ahead of time, so you can think ahead to what might be the book for next time. I hope you'll join us for that next week. Other than that, we are good to go. I don't see any final questions or urgent issues, so we'll wrap it up and say thank you very much, and we'll see you next time. Bye-bye.
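For anyone facing the "do I even have Java?" question from the news segment, here is a minimal Python sketch, not something shown on the show, that looks for a `java` command on the PATH and reads the version it reports. The function names are hypothetical, and it assumes `java -version` prints a line such as `java version "1.7.0_06"`. It only checks for a command-line Java runtime; it does not test the browser plug-in the way isjavaexploitable.com does, so treat it as a first check rather than a security test.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_version(output: str) -> Optional[str]:
    """Pull the quoted version string out of `java -version` output (hypothetical helper)."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_java_version() -> Optional[str]:
    """Return the Java version on the PATH, or None if no `java` command is found."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` traditionally writes to stderr, so check both streams.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    version = installed_java_version()
    if version is None:
        print("No Java runtime found on the PATH.")
    else:
        print(f"Java {version} found; compare it against the latest release.")
```

On a public machine where this reports no Java, the "uninstall unless you need it" advice is already satisfied for the command line, though a browser plug-in may still need checking separately.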
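To put the Amazon Glacier figure quoted in the news segment ("as little as one cent per gigabyte per month") in concrete terms, here is a back-of-the-envelope Python sketch. The 500 GB backup size is a hypothetical example. The sketch assumes flat per-gigabyte pricing and covers only the storage charge quoted on the show, leaving out any request, retrieval, or transfer fees, and current pricing may differ.

```python
# Storage price quoted on the show: "as little as one cent per gigabyte per month".
# Assumption: flat per-GB pricing; request, retrieval, and transfer fees are ignored.
PRICE_PER_GB_MONTH_USD = 0.01


def monthly_storage_cost(size_gb: float, price: float = PRICE_PER_GB_MONTH_USD) -> float:
    """Storage-only cost in USD for keeping size_gb archived for one month."""
    return size_gb * price


def yearly_storage_cost(size_gb: float, price: float = PRICE_PER_GB_MONTH_USD) -> float:
    """Storage-only cost in USD for twelve months at the same rate."""
    return 12 * monthly_storage_cost(size_gb, price)


if __name__ == "__main__":
    backup_gb = 500  # hypothetical size of a full backup image
    print(f"${monthly_storage_cost(backup_gb):.2f} per month")
    print(f"${yearly_storage_cost(backup_gb):.2f} per year")
```

At that rate, a 500 GB backup image works out to about $5 a month, or $60 a year, before any retrieval costs.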