Well, welcome. This is John MacArthur, here at Wikibon headquarters, and I'm the moderator for today's Peer Insight. We're here today with Brigham Hyde, who's an adjunct professor at Tufts University and now managing director of Relay Technology Management. Brigham has worked in drug development and investment banking before moving to Relay. We're also joined by Sid Probstein, CTO of Attivio, and Jeff Kelly, Wikibon's big data analyst. And we're here today to talk about combining unstructured and structured data to deliver big data business value. So, Brigham, let me kick it off with you. You have an interesting background: a doctorate in pharmacology, you're working as an adjunct professor, you've done investment banking, and now you're at Relay. Tell us about that journey and a little bit about what you're doing.

The idea for Relay came about in 2008, with myself and my co-founder, David Greenwald, who's a PhD in genetics, looking at drug development and the challenges of data overload and analysis that the field faced. We were both frustrated scientific entrepreneurs trying to get ideas out of the lab, trying to understand a bit more about how the marketplace worked for those decisions, and I think we realized very quickly that the people making those high-risk decisions were very underserved in terms of access to data and meaningful insight from that data. And we set out to create Relay on that principle. My experience in banking in the meantime confirmed a lot of that from the other side. Having to analyze those companies, those drug development assets, I realized it was a highly qualitative process in some ways, with a lot of information to pore over, and it was largely being done on a manual basis. I mean, I lived in the spreadsheet world, with manually curated data, for a long time. So we really wanted to get at those issues, and we stepped into big data to do it.

What are some of the kinds of data that the scientists didn't have access to, that they need access to in order to make more informed decisions?

Well, just to give you an example: let's say I'm looking at a Phase II drug development asset, and you think about the attributes of that asset that are important. Certainly commercial aspects, such as transactional information about it, how much was paid for it, or how big the market is. That's all out there. I think the interesting thing is to then connect that to the scientific and clinical information. It's one thing to say I have an asset for colorectal cancer; it's another to understand the underlying science of that asset and the information out there that could be mined to determine: is it more or less likely to work? Is the clinical data there to support the asset, and how will the regulatory agencies respond to it? The connection of that data is really one of the crucial things Relay has done. It's enabled under the covers by the ability to unify many different types of data, structured and unstructured, and we'll talk about that. But it's simply finding a way to get all of this into one place and then ask it directed questions that make sense to users.

Do you know what questions you want to ask? Or do the investors know what questions they want to ask?
You know, it's interesting. We have a lot of engineering talent at Relay. We have the former CTO of Elsevier, Marc Krellenstein, who founded Northern Light 15 years ago, one of the early search companies. So we have a lot of engineering talent. But I think what makes Relay unique is that we also have folks like myself, people with experience in biology and pharmacology and also in business, who understand what question we're trying to ask of the data. And I think that's why we've taken the approach of developing a SaaS product: we actually know some of the questions and can put together those linked concepts. That's maybe where we differ from some pure technology plays, and why we're a bit unique. The users we interact with are business development folks, maybe an MBA or a PhD at a high business level. They might know the information they want, but they don't really know where to get it, and they're largely living in a world of being served by databases where you download a CSV and then have to turn it into information. They may go to 20 independent data sources, internally and externally, to get information. So there's no unified process and no way to connect the dots between those efforts. I mean, our product competes with manual data curation. There's no question, which boggles my mind sometimes, but that's the situation we find ourselves in.

Brigham, it sounds like we're talking about more than just structured data, of course. This is a lot of unstructured content: documents. Talk a little bit about the types of data assets you're trying to connect here, because it's one thing to bring together disparate data sets if they're structured data. It's a much different thing if it's unstructured or multi-structured content. So talk about that.

Yeah, and it's maybe worth talking about how we started technologically. We started on a SQL database at one point; that was the basis of Relay in the early days. And I think we immediately realized that we were leaving things out and were unable to really handle lots of big data sets, for instance the scientific literature. Handling that from a text mining and natural language perspective in a relational database doesn't really work. It's totally unscalable. Not to mention, if I try to connect that to something like SEC documents, there's no natural connection there. So we looked at it as: okay, what's important here? What's important is ontologies, ontologic search first, and then exploratory and creative search, free-text search and things like that. And we wanted to find ways to actually connect the dots between those. We do it with our ontologies, but also by partnering with Attivio, which under the covers enables us to make those connections really seamlessly and scalably. I mean, when we thought about moving off a relational database, we wanted to build for the problem we had today, but also say: we know data's going up and to the right, transparency's going up and to the right, and we're going to need to connect these dots long-term and put the pieces together.

For those who aren't scientists and drug discovery folks, describe what you mean by ontology.

Yeah. In this world of life science in particular, ontologies are massively important, and I'll give you an example. If I'm talking about a disease, say lung cancer, it sounds like one thing. It's not one thing.
There's small cell lung cancer, there's non-small cell, there are different stages. You would also call certain types of lung cancer solid tumors, because of the tumor type. So understanding ontologically that those things are connected, and being able to relate them across relational databases and document sets into one common entity, is really the crucial piece. We spend a lot of our time on this; we have nine custom ontologies internally right now, ranging from diseases to genes to drugs to research topics to people, through journals and into business terms. Because I think the other big thing for our users is being able to leverage scientific information to make business decisions. So, understanding the risk associated with a given disease pathway and factoring that into the commercial decision: hey, here are these M&A opportunities in front of you. We try to connect those dots using ontologies.

Well, maybe we could dig into the technology a little bit. Take us through your journey from those relational database days to where you are today, and some of the underlying technologies you're using to connect all these unstructured pieces of content.

Yeah. In the relational database world, the "ye olde databases" as we call them, we were constantly fighting the ontology problem. Also, any time we tried to add a new database we were adding complexity, and there would always be some gain and some loss every time we did something. We wanted to solve the immediate problem first, which was: let's flatten it out, let's get everything in there, and have the common connection be the ontologies sitting on top. That led us toward more of an index-based system. But we didn't want to lose the ability to ask relational questions of structured data when it was appropriate, either for purpose or for speed. So we wanted to play in both worlds. And we were lucky enough to get connected with Sid and the guys over at Attivio. I remember our first meeting, Sid drawing on a whiteboard and me going, "You can do that?" It was an eye-opener for me. We examined a couple of technology companies, but ultimately it was clear that Attivio could solve our immediate problem with the databases we had, without us losing anything. There were no real trade-offs at the get-go. And then, long-term, creating a scalable data operation was the obvious need. Scalable in two ways. First, in the size of the data: we constantly push on the size of the database we can handle and keep advancing that, and we expect more data in the future. When you talk long-term, about medical records or any of this stuff, which we're not in yet but may be someday, you have to have scale in size and performance. But also scale in terms of data types: the ability in the future to connect to, say, Oracle databases, whatever it might be long-term. Those things needed to be part of the roadmap. So we definitely considered that when evaluating, and we ultimately chose Attivio for a lot of those exact reasons.
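To make the ontology idea above concrete, here is a minimal sketch of ontology-driven query expansion: a term like "lung cancer" fans out to its subtypes and tumor classification before hitting an index. The terms and structure below are illustrative assumptions, not Relay's actual ontologies.

```python
# A minimal sketch of ontology-driven query expansion (illustrative only;
# not Relay's real ontologies). Each concept lists narrower terms and
# cross-cutting categories, so a search for "lung cancer" also reaches
# documents tagged with a subtype or with the "solid tumor" classification.

DISEASE_ONTOLOGY = {
    "lung cancer": {
        "narrower": ["small cell lung cancer", "non-small cell lung cancer"],
        "categories": ["solid tumor"],
    },
    "small cell lung cancer": {"narrower": [], "categories": ["solid tumor"]},
    "non-small cell lung cancer": {"narrower": [], "categories": ["solid tumor"]},
}

def expand_term(term):
    """Return the term itself plus all narrower terms and shared categories."""
    entry = DISEASE_ONTOLOGY.get(term, {"narrower": [], "categories": []})
    expanded = {term} | set(entry["categories"])
    for child in entry["narrower"]:
        expanded |= expand_term(child)  # recurse into subtypes
    return expanded

if __name__ == "__main__":
    # A search layer would OR these terms together so structured records and
    # free-text documents tagged at different levels all match one query.
    print(sorted(expand_term("lung cancer")))
```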
So, Sid Probstein, CTO of Attivio: tell us about that first meeting and what you brought to the table when you met with Brigham.

Well, I'll tell you, it was a great experience, because they were part of a big start-up boot camp here in Massachusetts, with a lot of money flowing into pharmaceuticals. MassChallenge, that was it. And I'm sure the reason they were a finalist is that when you look at what they did, it's so incredibly exciting. It's the convergence of the things that I believe are going to drive our economy. I know that's kind of a big statement, but I really do believe it will drive the economy to no small degree for years, decades, it's hard to say. Look at what they put together, right? An investment background, so they understand the process of funding these things. They obviously have PhDs in the actual science, so they understand the technology, the capabilities, all the different aspects of it. But they also thought about the decision maker who's trying to get from point A to point D, right? It's a longer journey in pharma. How are they going to get there? Well, in the old days, they would have said, we want to look at this drug, and we would evaluate this Phase II asset versus another one in Phase II. And the scientists would get together and do a couple of things. One, they would have a spreadsheet, right? They would also typically have a taxonomy on disk, probably a bunch of folders that they hand-built, with interesting descriptive names: this one describes the mechanism, this one describes the interactions, and so on. And they would pile PDFs into those folders. Then a scientist or an MD would make a decision, right? That's what would drive the decision. The problem is that that model worked great, but now we're awash in data. There's far too much data to easily consume in that model, more PDFs than we could possibly organize into our hand-built taxonomy. Plus you have these huge data sets, right? All of this massive amount of observation data, sensor data, being recorded. Any one silo is interesting, but it's putting them together that gives you the spark that leads to that kind of amazing return. And essentially, when I walked in the first time I met with these guys, they had all of the pieces except the information access layer. They had a database with these different kinds of data, hand-curated and put together. The problem was, it wasn't a great demo, right? You had to start with a big pull-down list and navigate down. Again, typical database stuff. And they said, look, we need something that works much more the way information works on the web. We want a search box, but it's not just search, because we also have to show this aggregate information. The whole point is to take some concept, whatever that Phase II clinical trial asset is, and say: I want to understand the value of this in context with all other things, across all the other silos. And we did that for them. That was the key thing. We were able to preserve the structured data and the relationships within it, do full-text search, but then also support SQL, so they can use typical tools like Spotfire to do incredible visualizations. And that brought the data to life.
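For a rough feel of what "preserve the structured data, do full-text search, but also support SQL" can look like in practice, here is a toy sketch using SQLite's FTS5 module as a stand-in for a real engine; the tables, columns, and sample rows are invented for the example and are not Attivio's API.

```python
# A toy sketch of unifying structured records with full-text search in one
# queryable layer. SQLite's FTS5 module (bundled with modern Python builds)
# stands in for a real engine; all names and data here are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (id INTEGER PRIMARY KEY, name TEXT, "
             "phase TEXT, indication TEXT)")
# Full-text index over the literature, keyed back to the structured asset.
conn.execute("CREATE VIRTUAL TABLE papers USING fts5(asset_id, body)")

conn.execute("INSERT INTO assets VALUES "
             "(1, 'ASSET-001', 'Phase II', 'non-small cell lung cancer')")
conn.execute("INSERT INTO papers VALUES "
             "('1', 'Inhibition of PI3 kinase slowed tumor growth ...')")

# One query spanning both worlds: a structured filter (phase) joined with a
# free-text match over the documents.
rows = conn.execute("""
    SELECT a.name, a.phase, a.indication
    FROM assets AS a
    JOIN papers ON papers.asset_id = CAST(a.id AS TEXT)
    WHERE a.phase = 'Phase II'
      AND papers MATCH 'PI3 kinase'
""").fetchall()
print(rows)  # -> [('ASSET-001', 'Phase II', 'non-small cell lung cancer')]
```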
So, we're here to talk about delivering business value, so give me the business value angle on this. Are we making better decisions? What's the return to the investor? What's the return to the drug company?

Yeah, so there are a couple of things Relay can uniquely do here, and I'll give you three brief cases, because I think it's useful to describe them. A common job for somebody in BD, who's our main client right now, is to get tasked: okay, we need to be in Alzheimer's disease. You need to go find what's coming up in Alzheimer's disease, make an acquisition, make the business case for that acquisition, but also understand where we stand on the scientific case. They'll go out, and there are databases out there from which they can download spreadsheets of "these are the companies with these assets," but they're not getting the information on, for instance, mechanism. So right now you might say a hot target is PI3 kinase, or in Alzheimer's maybe Parkin or something like that; these are genes, by the way, that I'm mentioning. We can detect the historical trends in the underlying data that could have told you last year that this was going to be the hot thing next year. And we actually take the next step on the analytics and algorithm side and factor that into the evaluation of an asset. So we have this thing called RVI, the Relative Value Index, which is our attempt at making a stock market for drug development assets. By being able to compare toothpaste to drugs, so to speak, I can factor in the underlying information and the trends behind it and understand that this one's maybe a bit better than that one, or this one's rising faster. You can ask several questions of it, but it's engaging the raw data to give you back a quantitative piece to measure on. And at a more sophisticated level, we actually factor it into valuation. We can write a model that says, based on this RVI, the asset is worth X more dollars or X fewer dollars, which is a real, tangible thing for people. I think the other part of it is that you're engaging with live information. If something changes, you actually see that change and get alerted when, say, some big paper is published, or somebody presents at a conference something that totally changes the game for your world, and now you can understand it. It's that currency that I think is really valuable to folks. Right now this is a very episodic thing. Think of the Alzheimer's case: I'm probably going to put three of my younger analysts on it for two months. They're going to spend a bunch of time churning data. We might get to an answer, and the day you get that answer, it's stale. You lose the value, and you'll just have to do it again. I think that's a big component. One other big piece we've focused on, just to give you a use case, is around KOLs. And one of the things that struck me...

KOLs?

Sorry, key opinion leaders. In science, there will be a top researcher who's a major influencer in the field. And if you're in biopharma, there are a couple of angles here. You may want to partner with them, because they're inventing the next therapy for whatever disease. You may want to fund some of their research, because you just want to be with the smart guys. They may also be an influencer, both at the FDA and in clinical adoption. So identifying who those people are is really, really important. And we can actually track individuals, measure things about them that infer value, ask specific quantitative data questions about them, and identify who should be on that list.
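As a back-of-the-envelope illustration of the RVI idea Brigham describes above, turning trends in underlying signals into one comparable number that a valuation model can consume, here is a toy sketch. Relay's actual RVI methodology is not public, so the formula, weights, and numbers below are purely hypothetical.

```python
# A purely hypothetical "relative value index" toy: score an asset by the
# trend in its underlying signals (say, yearly publication counts for its
# mechanism) relative to peers, then let a valuation model use the score.

def trend_slope(counts):
    """Least-squares slope of yearly counts: is interest rising or falling?"""
    n = len(counts)
    x_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(counts))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def relative_value_index(asset_counts, peer_counts):
    """Asset's trend expressed relative to the mean trend of its peers."""
    peer_mean = sum(trend_slope(p) for p in peer_counts) / len(peer_counts)
    return trend_slope(asset_counts) / peer_mean if peer_mean else 0.0

asset = [12, 18, 30, 55]                      # accelerating interest (made up)
peers = [[10, 11, 12, 13], [20, 19, 21, 22]]  # steady peers (made up)

rvi = relative_value_index(asset, peers)
base_valuation_m = 100.0  # hypothetical base valuation, $M
adjusted = base_valuation_m * (1 + 0.1 * (rvi - 1.0))  # toy adjustment rule
print(f"RVI = {rvi:.2f}, adjusted valuation = ${adjusted:.1f}M")
```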
So let's boil this down to real business value. Is it making better decisions? Is it making more decisions, faster decisions, more accurate decisions? What really are the main benefits here?

The main benefit, on the asset side, is that you can make a better decision. You're getting the trend information in a quantitative way. People may understand intuitively that it's there, but nobody's ever said, yep, that's the number that correlates to the thing you're thinking about. By having that information at your fingertips, you can make the decision earlier. It provides an evidence base that enables you to leap over the wall of "yes, let's do this," as opposed to waiting for consensus to get there and then being too late. In pharma, by the way, venture tends to be on the cutting edge, and pharma wants to be venture-like, at the cutting edge of what's going on, which I think is really attractive to them, because then they can pay a little less for an asset earlier rather than waiting and paying God knows what later. So there's definitely value there. I think the other benefit is in what they spend their time doing. If the analysis is already done and updating at their desktop, they spend more time on the narrowed group of information, the narrowed group of assets, targets, and mechanisms that are of interest, instead of spending their time dragging data down and manually curating it. You have the smart, experienced people focused on making a good decision, as opposed to just getting to the answer. And we're trying to make that leap.

For the people who just joined us online or on the call, a reminder: we're here with Brigham Hyde, who's an adjunct professor at Tufts University and a managing director at Relay Technology Management, with Sid Probstein, CTO of Attivio, and Jeff Kelly, Wikibon's big data analyst. We're talking about how to combine structured and unstructured data to deliver business value, and we've been talking a lot about the pharma space. I want to open it up to the audience to see if there are any questions. We've got quite a large number of people online, so if someone has a question, let me pause for a second.

Good morning, this is David. I've got a question that I'd like your opinion on. How do you see the provision of data going forward? What are the sources of that data? Is it government data? Is it data collected by data providers, by Google? What are the sources of data?

Sid, why don't you talk about that, because you're in more areas than pharma, obviously.

I think, as I was alluding to earlier, the business value is created not so much by creating intelligence inside one silo, but by creating intelligence across silos, right? And that's what Relay really does: they let you look at so many sources. So I believe companies will find more and more innovative ways to create value, or opportunities for themselves, by bringing together more and more data. And it's going to be everything. You're going to see licensing of sources. There are already multiple efforts to create exchanges around data, right? To create markets for data. So that's all going to feed into it.
And I think the lesson, the pattern you can follow from what Relay is doing, is that the insight comes from going across the data. And how powerful is that? Look back: what was that discussion like five years ago inside big pharma, right? A bunch of doctors, each with their own viewpoints and experiences, taking in as much data as they could consume and trying to make some kind of decision. And of course, that brings you quickly to opinions and roundtables and delay, exactly as Brigham said. So if you can put a number around that stuff, and actually have that number be meaningful and trusted, and be able to show, hey, it maps to all these different data sources (some internal, which, yes, we can agree are biased, but with external sources that validate it), you create that linkage. So the answer is: data is going to come from everywhere. We're awash in data today; we talk about big data, but wait a couple of years. Think of the number of sensors that are putting out observations right now. When that stuff starts to get spooled up and stored, we're never going to see the end of it. And scalability is not a nice-to-have, it's a right-to-play. If you can't handle these volumes and make these connections, I think the future passes you by.

Can I just follow that up a little bit? You're implying there's a huge amount of data, and I agree with you entirely. It'll be impossible to bring all of that together in one place. Just from the point of view of the physics of sending that data, how do you see that being dealt with in the future? Are you going to extract from different locations? Do we need to analyze in place and then pull?

Probably. The scale of things will, in time, put real pressure on all the different parts of the computing infrastructure that's needed to do this. But that's why the entire cloud model has emerged: the idea that, hey, I can rent time, I can spin up a large number of servers, maybe a huge number, but I only need them for a few days to crunch through some massive data set and get insight, and then I can shrink back down to a more normal footprint. I think the distribution of the problem is definitely the future. Today, for example, Attivio's Active Intelligence Engine is essentially a sharded, distributed repository. You can put any kind of information in it, and you can distribute it across lots and lots of servers. These guys have been using Amazon servers for a while; many different cloud configurations are possible, and you can spin up new ones as you need them to bring in more data. And that's not unique to Attivio. What I'm telling you is that the answer to the question is really the distribution of the problem. The interconnection of everything, the ability to access and federate: those things are happening now, and as more and more providers get into that world, it'll become easier to do.
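Here is a minimal sketch of the sharded-repository idea Sid describes: documents hashed across shards, with a query fanned out to every shard and the hits merged (scatter-gather). Real engines add replication, rebalancing, and relevance ranking; the class and method names below are invented for illustration.

```python
# A minimal sketch of a sharded document repository with scatter-gather
# search (illustrative only; not Attivio's actual architecture or API).
import hashlib

class Shard:
    def __init__(self):
        self.docs = {}

    def index(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        return [d for d, t in self.docs.items() if term.lower() in t.lower()]

class ShardedIndex:
    def __init__(self, n_shards):
        # In production each shard would live on its own server; adding
        # capacity means spinning up more shards.
        self.shards = [Shard() for _ in range(n_shards)]

    def _shard_for(self, doc_id):
        h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
        return self.shards[h % len(self.shards)]

    def index(self, doc_id, text):
        self._shard_for(doc_id).index(doc_id, text)

    def search(self, term):
        # Scatter the query to every shard, then gather and merge the hits.
        return sorted(hit for s in self.shards for hit in s.search(term))

idx = ShardedIndex(n_shards=4)
idx.index("paper-1", "PI3 kinase inhibition in solid tumors")
idx.index("10-K-2012", "Risk factors for the colorectal cancer program")
print(idx.search("kinase"))  # -> ['paper-1']
```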
Let me answer that question from both sides, from my admittedly limited perspective. I'm a data buyer and an analysis creator, or a metadata creator; that's essentially what Relay does. I spend about a third of my time just shopping for data sets. Your first question, which I think is a really important one, is: where is the data? Government is some of it. But ultimately, I think there's going to be a secondary market of trading analysis. Look at what's happening with Thomson Reuters right now, or LexisNexis. They sell raw data, sure; you can download a stream of their data. But they're beginning to sell metadata and analysis of it as well: tagging. I can see a world in which people take their own internal data sets and maybe buy a specific analysis of them. I want to know, temporally: give me the tags for each company and when they announced a certain type of thing. I might buy that instead of downloading the entire LexisNexis data set and asking it a question. And by the way, as a data seller, that's potentially big for us. I know that people in my marketplace, in SaaS, are beginning to be asked for APIs to their analysis of a data set. You might have somebody who, like us, is trying to answer a specific question: taking certain data sets, unifying them, making them live and updating, and then selling the answer to a certain question off of that data set. So I think there are going to be a couple of different roles to play, and add to that the companies that have their own internal data they all need to deal with. I think it will emerge that way.

I'm interested in your perspective on the question of how fast someone has to react to new information, particularly in the world of unstructured data. We had someone on theCUBE recently who was saying that if they analyze the data half an hour after it was created, they've lost a quarter of a million dollars. That was in the casino world, right? So how do you see that impacting the customer set you're serving?

Well, to be honest, a lot of the advantages people are creating for themselves in the market now come through speed, right? Take some analysis that I do. I used to do it every week, or every month. The volume of shopping has gone up; I have more of an e-channel now to let people in, so I have more data coming in. And now I find that if I can process the data faster, I can create a window during which I monetize whatever insight I'm getting a little bit better. That's a very real-world phenomenon. Obviously, in the financial markets, one of my banking clients said something like: if I can get 1% more insight, that's enough to trade on, even for one second, because I can make a trade on something like that. But it's all part of a continuum. There are many questions that can be answered slowly, and often those are the answers you then combine with other answers, other parts of the puzzle, that you need to refresh more frequently. The entire equation changes and becomes interesting when one of those changes dramatically, right? Suddenly it skews everything. And that's actually the point of timeliness and incremental updates. Think of those MDs sitting around the table. They make the decision, they make the right decision, and the company marches on. But they missed two small updates that would have changed everything, because those updates were unstructured: press releases, or something buried in a journal somewhere, and they didn't pick up on them. Systems like Relay Technology Management's are going to solve that for them: say, hey, come back, revisit this decision. Most companies are not so good at doing that. But I think that's going to be an emerging skill, right?
Remembering why we made that decision, understanding what we now need to adjust, and being able to track every decision and say: are we still on point for this? Versus the old model, which is: well, we put X million dollars and Y people into it and discovered it was the wrong approach, but we did three of those, so effectively we arbitraged it, right? Now maybe you can start two ventures instead, based on that much more detailed and much more thorough analysis.

Let me jump in. John, this is John Furrier. Hey, John. Hey, I have a question for Sid and the folks on the panel there. You know, obviously we love Hadoop, but there's been a lot of conversation around a blog post on GigaOM arguing that Hadoop's days are numbered. Have you guys been following Google's Dremel and Percolator? The thesis was that Hadoop is too hard to use and might not stay around much longer. Give me an opinion on that: how do you see the whole Hadoop ecosystem evolving, given Google's recent public disclosure of Dremel and Percolator?

Well, I'm not sure I would count on Google as an authority on data analysis, to be honest. What they do is pretty unusual, in the sense that they focus really heavily on web pages and the public web, and it's a very interesting application. I think Hadoop has a role to play. It has become pretty much synonymous with big data, but that's a little bit of an error, because it frames big data as being all about volume, and it's not; there's more to it. You can look at it as volume and variety and velocity, which we were talking about earlier, right? So you have many different aspects. Hadoop is great at dealing with the volume aspect, where, let's say, I track every click on my website and this produces billions of observations every day. Individually, those observations aren't exactly worthless, but they aren't interesting on their own; you wouldn't care that, you know, my logo image was served from the web server. It's not relevant. What you want to know is: okay, which parts of my site did Sid visit? And maybe Sid alone isn't that interesting, but I want to know, across all the people, or maybe one segment of my audience, which sections they went to. So you take that low-item-value data and you feed it through a system like Hadoop; there are many others too, but Hadoop is the one that seems to be very popular. After the analysis, those billions of records become a handful of records that tell me which of my website sections or properties were most popular, and I can even segment that further by audience. That's very valuable insight. But the truth of the matter is, we've been doing that kind of analytics for a long time, decades. For one thing, we didn't have the huge volumes that e-commerce systems can now produce, so we didn't have to deal with that much data; but even more than that, the insight was kind of there. We were able to take it far enough. And it's still within one silo: massive volume in a silo we already understand. Hadoop alone, once it produces that data, is still a silo. And again, the beauty of something like Relay Technology Management is that you take that output, and it's one piece of...
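To ground Sid's clickstream example, here is a small map/reduce-style sketch in plain Python: per-click records are mapped to key/value pairs and reduced to per-segment, per-section counts. On a real Hadoop cluster the same two functions would run over billions of records; the sample data below is invented.

```python
# A map/reduce-style sketch of the clickstream aggregation described above
# (plain Python stand-in for a Hadoop job; all data is invented).
from collections import defaultdict

clicks = [  # (audience_segment, site_section) per click
    ("researcher", "/pipeline"), ("investor", "/valuations"),
    ("researcher", "/pipeline"), ("researcher", "/journals"),
    ("investor", "/valuations"), ("investor", "/pipeline"),
]

def map_click(record):
    segment, section = record
    return ((segment, section), 1)      # emit one key/value pair per click

def reduce_counts(pairs):
    totals = defaultdict(int)
    for key, count in pairs:            # sum the values for each key
        totals[key] += count
    return totals

# Billions of low-value records in, a handful of high-value rows out.
popularity = reduce_counts(map(map_click, clicks))
for (segment, section), n in sorted(popularity.items(), key=lambda kv: -kv[1]):
    print(f"{segment:>10}  {section:<12} {n}")
```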