This conference is being recorded.

Well, welcome. This is John MacArthur, here at Wikibon headquarters, and I'm the moderator for today's Peer Insight. We're here today with Brigham Hyde, an adjunct professor at Tufts University who is now managing director of Relay Technology Management. Brigham has worked in drug development and investment banking. We're also joined by Sid Probstein, CTO of Attivio, and Jeff Kelly, who is Wikibon's big data analyst. We're here today to talk about combining unstructured and structured data to deliver big data business value.

So Brigham, let me kick it off with you. You have an interesting background: you have your doctorate in pharmacology, you're working as an adjunct professor, you've done investment banking, and now you're at Relay. Tell us about that journey and a little bit about what you're doing.

The idea of Relay came about in 2008. Myself and the other co-founder, David Greenwald, who has a PhD in genetics, were looking at drug development and the challenges of data overload and analysis that people in the field faced. We were both frustrated scientific entrepreneurs trying to get ideas out of the lab, trying to understand a bit more about how the marketplace worked for those decisions, and I think we realized very quickly that the people making those high-risk decisions were very underserved in terms of access to data and meaningful insight from that data. We set out to create Relay on that principle. My experience in banking in the meantime confirmed a lot of that from the other side: having to analyze those companies and those drug development assets, and realizing that it was a highly qualitative process in some ways, with a lot of information to pore over, which was largely worked through on a manual basis. I mean, I lived in spreadsheet world, and in manually curated data, for a long time. So we really wanted to get at those issues, and we stepped into big data to do it.

What are some of the kinds of data that the scientists didn't have access to, that they needed in order to make more informed decisions?

Well, just to give you an example: let's say I'm looking at a phase two drug development asset, and you think about the attributes of that asset that are important. There are certainly commercial aspects, such as transactional information about it, how much was paid for it, or how big the market is; that's all out there. I think the interesting thing is to then connect that to the scientific and clinical information. While it's great to say I have an asset for colorectal cancer, it's another thing to understand the underlying science of that asset and the information that's out there that could be mined to determine: is it more or less likely to work? Is the clinical data there to support approval of the asset, and how will the regulatory agencies respond to it? The connection of that data is, I think, really one of the crucial things Relay has done, enabled under the covers by the ability to unify many different types of data. So structured and unstructured, and we'll talk about that, but simply finding a way to get all of this into one place and then ask it directed questions that make sense to users.

Did you know what questions you wanted to ask, or did the investors know what questions they wanted to ask?

You know, it's interesting. We have a lot of engineering talent at Relay. Our CTO is Marc Krellenstein, former CTO of Elsevier, the guy who founded Northern Light, one of the early search companies, fifteen years ago. So we have a lot of engineering talent. But I think what makes Relay unique is that we have folks like myself, people with experience in biology and pharmacology and also in business, who understand what question we're trying to ask of the data. And I think that's why we've taken the approach of developing a SaaS product: because we actually know some of the questions and can put together those linked concepts. That's maybe where we differ from some pure technology plays, and why we're a bit unique.

The users themselves that we interact with are business development folks. So this is maybe an MD, an MBA, or a PhD at a high business level. They might know the information they want, but they don't really know where to get it, and they're largely living in a world of being served by databases where you download a CSV and then have to turn it into information. They may go to 20 independent data sources, internal and external, to get information. So there's no unified process, and no way to connect the dots between those efforts. And, I mean, our product competes with manual data curation. There's no question. Which boggles my mind sometimes, but that's the situation we find ourselves in.

Brigham, it sounds like we're talking about more than just structured data, of course. There's a lot of unstructured content here, such as documents. Talk a little bit about the types of data assets we're trying to connect, because it's one thing to bring together disparate datasets if they're structured. It's a much different thing if it's unstructured or multi-structured content.

Yeah, and it's maybe worth talking about how we started technologically. We started in a SQL database; at one point that was the basis of Relay, in the early days. And I think we immediately realized that we were leaving out, and unable to really handle, lots of big data sets, for instance the scientific literature. Handling that from a text mining and natural language perspective in a relational database doesn't really work; it's totally unscalable, not to mention that if I try to connect it to something like SEC documents, there's no natural join there. So we looked at it as: okay, what's important here? What's important is ontologies, ontological search first, and then exploratory and creative search, free-text search, and things like that. We wanted to find ways to actually connect the dots between those. We do it with our ontologies, but also by partnering with Attivio under the covers, which enables us to make those connections really seamlessly and at scale. When we thought about moving off of a relational database, what we wanted to do was build for the problem we had today, but also say: we know data is going up and to the right, transparency is going up and to the right, and we're going to need to connect these dots long term and put the pieces together.

For those who aren't scientists and drug discovery folks, describe what you mean by ontology.

Yeah. In this world of life science in particular, ontologies are massively important. I'll give you an example. If I'm talking about a disease and I say lung cancer, that sounds like one thing. It's not one thing. There's small cell lung cancer, there's non-small cell, there are different stages. You would also call certain types of lung cancer solid tumors, because of the tumor type. So understanding ontologically that those things are connected, and being able to relate them, across relational databases and document sets, into one common entity, is really the crucial piece. We spend a lot of our time on this; right now we have nine custom ontologies internally, ranging from diseases to genes to drugs to research topics to people, throughout journals, and into business terms. Because I think the other big thing for our users is being able to leverage scientific information to make business decisions: understanding the risk associated with a given disease pathway and factoring that into a commercial decision about, hey, there are these M&A opportunities in front of you. So we try to connect those dots using ontologies.

Maybe we could dig into the technology a little bit. Take us through your journey from those relational database days to where you are today, and some of the underlying technologies you're using to actually connect all these unstructured pieces of content.

Yeah. In the relational database world, or "ye olde database" as we call it, we were constantly fighting the ontology problem. Also, any time we tried to add a database, we were adding complexity, and there would always be some gain and some loss every time we did something. And I think we wanted to solve the immediate problem first, which was: let's flatten it out.
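The lung cancer example captures the core mechanics of an ontology: synonyms map to one canonical entity, and entities roll up an is-a hierarchy. A minimal sketch of that idea, with invented entries rather than Relay's actual ontologies:

```python
# Toy disease ontology: each node has a canonical name, synonyms, and a parent
# ("is-a" link). Illustrative entries only; a real life-science ontology
# has tens of thousands of nodes.
ONTOLOGY = {
    "neoplasm":            {"synonyms": {"tumor", "cancer"}, "parent": None},
    "solid tumor":         {"synonyms": set(), "parent": "neoplasm"},
    "lung cancer":         {"synonyms": {"lung carcinoma"}, "parent": "solid tumor"},
    "small cell lung cancer":     {"synonyms": {"sclc"}, "parent": "lung cancer"},
    "non-small cell lung cancer": {"synonyms": {"nsclc"}, "parent": "lung cancer"},
}

# Build a lookup from every surface form (canonical name or synonym)
# to its canonical entity.
SURFACE_TO_ENTITY = {}
for ent, node in ONTOLOGY.items():
    SURFACE_TO_ENTITY[ent] = ent
    for syn in node["synonyms"]:
        SURFACE_TO_ENTITY[syn] = ent

def canonicalize(mention: str):
    """Map a raw text mention to its canonical ontology entity (or None)."""
    return SURFACE_TO_ENTITY.get(mention.lower().strip())

def ancestors(entity: str):
    """Walk the is-a chain so 'SCLC' also matches queries for 'solid tumor'."""
    chain = []
    node = ONTOLOGY[entity]["parent"]
    while node is not None:
        chain.append(node)
        node = ONTOLOGY[node]["parent"]
    return chain

# A document tagged "SCLC" should surface for a "solid tumor" query:
entity = canonicalize("SCLC")                       # "small cell lung cancer"
matches_query = "solid tumor" in ancestors(entity)  # True
```

The same pattern works whether the mention comes from a relational column or from free text, which is what lets one ontology tie both kinds of sources to a single entity.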
Let's get everything in there, and have the common connection be that ontology sitting on top. So that led us toward more of an index-based system. But we didn't want to lose the ability to ask relational questions of structured data when that was appropriate, either for purpose or for speed. So we wanted to play in both worlds, and we were lucky enough to get connected with Sid and the guys over at Attivio. I remember our first meeting, Sid drawing on a whiteboard and me going, "You can do that?" It was an eye-opener for me. We examined a couple of technology companies, but ultimately it was clear that Attivio could solve our immediate problem with the databases we had, without us losing anything; there were no real trade-offs at the get-go. And long term, creating a scalable data operation was an obvious need. Scalable in two ways. First, in terms of the size of data: we constantly push on the size of the database we can handle, we're constantly advancing that, and we expect more data in the future. If you talk about longitudinal medical records or any of that, we're not in that world yet, but we may be someday, so you have to have scale in size and performance. But also scale in terms of data types: the ability in the future to connect to, say, Oracle databases, or whatever it might be long term. Those things needed to be part of the roadmap. So we definitely considered that when evaluating, and ultimately chose Attivio for a lot of those exact reasons.

So Sid, let's bring in the CTO of Attivio. Tell us about that first meeting and what you brought to the table when you met with Brigham.

Well, I'll tell you, it was a great experience, because they were part of a big Massachusetts high-tech startup bootcamp; a lot of money has flowed into pharmaceuticals here through MassChallenge.

I've got to give credit to MassChallenge; we were a first-year finalist.

Oh, okay. And I'm sure the reason they were a finalist is that when you look at what they did, it's so incredibly exciting. It's the convergence of the things that I believe are going to drive our economy. I know that's a big statement, but I actually believe it will drive the economy, to no small degree, for years, decades, it's hard to say. Look at what they put together. Investment backgrounds: they understand the process of funding these things. They obviously have PhDs in the actual science, so they understand the technology, the capabilities, all the different aspects of it. But they also thought about what a decision-maker who's trying to get from point A to point B needs, and it's a longer journey in pharma. How are they going to get there? Well, in the old days, they would have said, we want to look at this drug or that drug, evaluate this asset from one phase two program versus another, and the scientists would get together and do a couple of things. They would have a spreadsheet, and they would also typically have a taxonomy on disk, probably a bunch of hand-built folders with interesting descriptive names: this one describes the mechanism, this one describes the interactions. And then they would pile PDFs into those folders, and then, as scientists and eminent MDs, make a decision. That's what would drive the decision. Now, the problem is that model worked great, but now we're awash in data. There's far too much data for us to easily consume in that model: more PDFs than we could possibly organize into our hand-built taxonomy, plus you have these huge data sets, all of this massive amount of observation data and sensor data being recorded. Any one silo is interesting, but it's putting them together that gives you the spark that leads to that kind of amazing return.

Essentially, when I walked in the first time I met with these guys, they had all of the pieces except the information access layer. They had a database with these different kinds of data, hand-curated and put together. The problem was it wasn't a great demo. You had to start with a big pull-down list and navigate down: again, typical database stuff. And they said, look, we need something that works much more the way information works on the web. We want a search box, but it's not just search, because we also have to show aggregate information. The whole point is to take some concept, whatever the elements of that phase two clinical asset are, and say: I want to understand the value of this in context with all other things, across all the other silos. So that's what we did for them. That was the key thing: we were able to take that, preserve the structured data and the relationships within it, do full-text search, but then also support SQL, so they can use TIBCO Spotfire to do incredible visualizations. And that brought the data to life.

So we're here to talk about delivering business value. Give me the business value angle on this. Are we making better decisions? What's the return to the investor? What's the return to the drug company?

Yeah. There are a couple of unique things Relay can do, and I'll give you three brief cases, because I think it's useful to describe a common job of somebody in BD, business development, which is our main client base right now. They get tasked: okay, we need to be in Alzheimer's disease.
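The capability described here, one store answering keyword search, relational-style filters, and aggregates together, can be sketched as a toy in-memory version. The records, field names, and scoring below are invented for illustration, not Attivio's actual engine or API:

```python
# Toy unified index: each record carries structured fields *and* free text,
# so one query can mix a keyword search with a relational-style filter and
# return facet counts. Invented stand-in data; not a real search engine.
from collections import Counter

ASSETS = [
    {"name": "Drug A", "phase": 2, "indication": "colorectal cancer",
     "text": "inhibitor showing tumor regression in phase 2 colorectal trials"},
    {"name": "Drug B", "phase": 2, "indication": "alzheimer's disease",
     "text": "amyloid-targeting antibody with mixed phase 2 cognition data"},
    {"name": "Drug C", "phase": 3, "indication": "colorectal cancer",
     "text": "combination therapy for metastatic colorectal cancer"},
]

def search(keywords, **filters):
    """Full-text match on `text` plus exact-match structured filters,
    scored by how many keywords each record's text contains."""
    hits = []
    for rec in ASSETS:
        if any(rec[k] != v for k, v in filters.items()):
            continue                       # structured filter (SQL-style WHERE)
        score = sum(1 for kw in keywords if kw in rec["text"])
        if score:
            hits.append((score, rec))      # full-text relevance
    hits.sort(key=lambda h: -h[0])
    facets = Counter(rec["indication"] for _, rec in hits)
    return [rec["name"] for _, rec in hits], dict(facets)

names, facets = search(["colorectal", "tumor"], phase=2)
# names -> ["Drug A"]; facets -> {"colorectal cancer": 1}
```

The point of the sketch is the shape of the query, not the scoring: one call combines a WHERE-style predicate, ranked text relevance, and an aggregate over the results.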
You need to go find what's coming up in Alzheimer's disease, make an acquisition, and make the business case for that acquisition, but also understand where we stand on the scientific case. They'll go out, and there are databases out there from which they can download spreadsheets of, here are the companies with these assets, but they're not getting the information on, for instance, mechanisms. So right now you might say a hot target is PI3 kinase, or in Alzheimer's it might be parkin or something like that; these are genes, by the way, that I'm mentioning. We can detect the historical trends in the underlying data that could have told you last year that this was going to be the hot thing this year. And then we actually take the next step on the analytics and algorithm side and factor that into the valuation of an asset. So we have this thing called RVI, the Relative Value Index, which is our attempt at making a stock market for drug development assets. If I'm comparing two phase two drugs, I can factor in the underlying information and the trends behind it, and understand that this one's maybe a bit better than that one, or this one's rising faster. You can ask several questions of it, but it's engaging the raw data to give you something quantitative to measure against, and on a more sophisticated level we actually factor it into valuation.
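Relay's actual RVI methodology isn't public, so purely as an illustration of the idea, an index that rewards both the level and the trend of activity signals around an asset might look like this (the signals and weights are invented):

```python
# Illustrative "relative value index": score an asset from the level and
# trend of activity signals around it (publication counts, trial starts,
# etc.). Weights and signals invented; Relay's real RVI is proprietary.

def trend(series):
    """Average period-over-period change of a signal."""
    return sum(b - a for a, b in zip(series, series[1:])) / (len(series) - 1)

def rvi(signals, weights):
    """Weighted sum of each signal's latest level plus its trend, so an
    asset that is both active *and* accelerating scores highest."""
    return sum(w * (s[-1] + trend(s)) for s, w in zip(signals, weights))

# Two hypothetical phase-two assets: yearly publication and trial-start counts.
asset_a = [[40, 55, 75], [3, 4, 6]]   # rising interest
asset_b = [[90, 85, 80], [7, 6, 6]]   # high but declining interest
weights = [0.1, 1.0]

score_a = rvi(asset_a, weights)  # the accelerating asset scores higher
score_b = rvi(asset_b, weights)
```

Folding a trend term into the score is what lets the index flag "this one's rising faster" even when the absolute activity is still lower.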
So we can actually build a model that says, based on this RVI, the asset is worth X dollars more or X dollars less, which is a real, tangible thing for people. I think the other part of it is that you're engaging with live information. If something changes, you can actually see that change and be alerted when, say, some big paper is published or somebody presents at a conference, something that totally changes the game for your world, and now you can understand it. It's that currency that I think is really valuable to folks. Right now this is a very episodic process. Think of the Alzheimer's case: I'm probably going to put my three junior analysts on it for two months. They're going to spend a bunch of time churning data. We might get to an answer, and the day you get that answer, it's stale. You lose the value you just created, and you'll have to do it again. I think that's a big component.

One other big piece we focus on, just to give you a use case, is around KOLs, that is, key opinion leaders. In science there will be a top researcher who's a major influencer in a field, and if you're in biopharma, there are a couple of angles here. You may want to partner with them, because they're inventing the next therapy for whatever disease. You may want to fund some of their research, because you just want to be with the smart people. They may also be an influencer, either at the FDA or in clinical adoption. So identifying who those people are is really, really important, and we can actually track individuals, measure things about them that infer value, ask specific quantitative questions of the data about them, and identify who should be on that list.

So let's boil this down to the real business value. Is it making better decisions? Is it making more decisions, faster decisions, more accurate decisions?
What really are the main benefits here?

The main benefit, on the asset side, is that you can make a better decision. These folks don't have access to the trend information in a quantitative way at all. They may understand intuitively that it's there, but nobody's ever said, yes, that's the number that correlates to the thing you're thinking about. With that information at your fingertips, you can make the decision earlier. It provides an evidence base that lets you leap over the wall to "yes, let's do this," as opposed to waiting for consensus to get there and then being too late. In pharma, by the way, venture tends to be on the cutting edge; this now enables the pharmas to be venture-like, in that they're at the cutting edge of what's going on, which I think is really attractive to them, because then they can pay a little less for an asset earlier than they would have paid, God knows what, later. So there's definitely value there. I think the other benefit is in how they spend their time. If the analysis is already done and updating at the desktop, they spend more time on a narrowed group of information, a narrow set of assets, targets, and mechanisms that are of interest, instead of spending their time dragging data down and manually curating it. You have the smart, experienced people really focused on making a good decision, as opposed to just grinding toward the answer. We're trying to make that leap for people.

For those who just joined us online or on the call, a reminder: we're here with Brigham Hyde, adjunct professor at Tufts University and managing director at Relay Technology Management, with Sid Probstein, CTO of Attivio, and Jeff Kelly, Wikibon's big data analyst. We're talking about how to combine structured and unstructured data to deliver business value, and we've been talking a lot about the pharma space. I want to open it up to the audience to see if there are any questions. We've got quite a large number of people online, so if someone has a question, let me pause for a second.

Good morning, this is David. I've got a question I'd like your opinion on. How do you see the provision of data going forward? What are the sources of that data? Is it government data? Is it data collected by data providers such as Google? What are the sources of data?

Sid, why don't you talk about that, since you're in more areas than pharma?

Sure. As I was alluding to earlier, the business value comes not so much from creating intelligence inside one silo, but from creating intelligence across silos, and that's what Relay really does: it lets you look at so many sources. So I believe companies will find more and more innovative ways to create value or opportunities for themselves by bringing together more and more data, and it's going to be everything. It's going to be public sources; you're going to see licensed sources. I think there are already multiple efforts to create exchanges around data, and to create markets for data, so that's all going to feed into it. Essentially, the lesson, and the pattern you can follow from what Relay is doing, is that the insight lies across the data. Think about how powerful that is. Look back: what was that discussion like five years ago inside big pharma? A bunch of doctors, each with their own viewpoints and experiences, taking in as much data as they could consume and trying to reach some kind of decision. And of course that brings you quickly to opinions and roundtables and delay, exactly as Brigham said. So if you can put a number around that stuff, actually have that number be meaningful and trusted, and be able to show, hey, it maps to all these different data sources, some internal (yes, we can agree those are biased), but with external sources that validate it, you create that linkage. So the answer is: data from everywhere. We talk about big data today; wait a couple of years. Think of the number of sensors putting out observations right now. When that stuff starts to get spooled up and stored, we're never going to see the end of it. And scalability is not a nice-to-have; it's the price of entry. If you can't handle these volumes and make these connections, I think the future leaves you behind.

Let me follow up on Sid's comments a little. You're implying huge amounts of data, and I agree with you entirely. It will be impossible to bring all of that together in one place. Just from the point of view of the physics of moving that data, how do you see it being dealt with in the future? Are you going to extract from different locations into an analyzing place and then pull?

Probably. The scale of things will, in time, put real pressure on all the different parts of the computing infrastructure needed to do this. But that's why the cloud, this entire cloud computing model, has emerged: the idea that, hey, I can rent time, I can spin up a large number of servers. Maybe I really need a huge number, but I only need it for a few days, to crunch through some massive data set to get insight, and then I can shrink back down to a more normal footprint.
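One common shape for that kind of elastic, distributed data engine is sharding: hash each document to a shard, scatter every query to all shards, and merge the partial results. A toy in-process sketch of the pattern, not any particular product's implementation:

```python
# Sketch of the sharded, scatter-gather pattern: documents are hashed to
# shards, a query is fanned out to every shard, and partial results are
# merged. Real engines add replication and network transport; this toy
# keeps everything in-process.
import hashlib

N_SHARDS = 4
shards = [{} for _ in range(N_SHARDS)]  # doc_id -> text, per shard

def shard_for(doc_id):
    """Stable hash so the same document always lands on the same shard."""
    return int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % N_SHARDS

def index(doc_id, text):
    shards[shard_for(doc_id)][doc_id] = text

def query(term):
    """Scatter the query to all shards, gather and merge the hits."""
    hits = []
    for shard in shards:                                       # scatter
        hits.extend(d for d, t in shard.items() if term in t)  # local search
    return sorted(hits)                                        # gather + merge

index("doc1", "phase 2 colorectal cancer trial")
index("doc2", "alzheimer's biomarker study")
index("doc3", "colorectal cancer incidence data")
# query("colorectal") -> ["doc1", "doc3"], regardless of shard placement
```

Because shard assignment depends only on the document ID, adding capacity is conceptually just adding shards and rebalancing, which is what makes the spin-up-then-shrink-down cloud model workable.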
I think the distribution of the problem is definitely the future. Today, for example, Attivio's Active Intelligence Engine is essentially a sharded, distributed repository, or engine: you can put any kind of information in and distribute it across lots and lots of servers. These guys were running on Amazon servers for a while; many different cloud configurations are possible, and you can spin up a couple of new servers as you need them to bring in more data. And it's not unique to Attivio. What I'm telling you is that the answer to that question is really the distribution of the problem, the interconnection of everything, and the ability to access and federate. Those things are happening now, and as more and more providers get into that world, it'll become easier to do.

Yeah, let me answer this question from both sides of it, from my limited perspective. I'm a data buyer and an analysis creator, or a metadata creator; that's essentially what Relay does, and I spend about a third of my time just shopping for data sets. On your first question, which I think is a really important one: where is the data?
Government is some of it. I think ultimately there's going to be a kind of secondary market for trading analysis. Look at what's happening with Thomson Reuters right now, or with LexisNexis. They're selling raw data, sure; you can download a stream of the data. But they're also beginning to sell metadata and analysis of that data, tagging and so on. So I could see a world in which people take their own internal data sets and maybe buy a specific analysis of them. Say I want to know, over time, the tags for each company and when they announced a certain type of thing; I might buy that instead of downloading the entire LexisNexis data set and asking the question myself. And by the way, as a data seller, I see that as potentially a big model for us. I know that people in my marketplace, in SaaS, are beginning to be asked for APIs to their analyses of a data set. So you might have somebody like us trying to answer a specific question, taking certain data sets, unifying them, scaling them, making them live and updating, and then selling the answer to a certain question off of that data set. I think there are going to be a couple of different roles to play. And add to that the companies that have their own internal data they all need to deal with.

So Sid, I'm interested in your perspective on the question of how fast someone has to react to new information, particularly in the world of unstructured data. We had someone on theCUBE recently, and we were discussing how, if I analyze the data half an hour after it was created, I've lost a quarter of a million dollars. That was in the casino world. So how do you see that impacting the customer set you're serving?

Well, to be honest, a lot of the advantages people are creating for themselves in the market are now achieved through speed. I can take some analysis I used to do every week or every month, and it was fine. But the volume of shopping has gone up; I have more of an e-channel now letting people in, so I have more data coming in. And now I find that if I can process the data faster, I can create a window during which I monetize whatever insight I'm getting a little bit better. That's a very real-world phenomenon, obviously, in financial markets. One of my banking clients says something like: if we can get 1% more insight, that's enough to trade on, for one second, because I can make a trade in a hundred milliseconds or so. So speed matters, but it's all part of a continuum. There are many questions that can be answered slowly, and often those answers are the ones you then combine with other answers, other parts of the puzzle, that you need to refresh more frequently. The entire equation changes and becomes interesting when one of those changes dramatically; suddenly it skews everything.

That's actually the point about timeliness and incremental updates: I think it's much more powerful with unstructured data, because we're not used to it. Think of those MDs sitting around the table. They make the decision, they make the right decision, and the company marches on. But they missed two small updates that would have changed everything, because those updates were unstructured: press releases, or items buried in a journal somewhere, and they didn't pick up on them. Systems like Relay Technology Management are going to solve that for them and say, hey, come back, revisit this decision. Most companies are not very good at doing that, but I think it's going to be an emerging skill: remembering why we made a decision and what data it was based on, then understanding what we need to adjust, and being able to track every decision and ask, are we still on point for this? Versus the old model, which is: we put X million dollars and Y people into it, discovered it was the wrong approach, and, well, we did three of those, so we effectively arbitraged it. Now maybe you could start ventures based on a much more detailed and much more thorough analysis.

Hey John, I have a question for Sid and the folks on the panel. Obviously there have been a lot of conversations around a blog post on GigaOM arguing that Hadoop's days are numbered. Have you guys been following Google's Dremel and Percolator? The thesis was that Hadoop is too hard to use and might not stay around much longer. What's your opinion on that, and how do you see the whole Hadoop ecosystem evolving, given Google's recent public disclosure of Dremel and Percolator?

Well, I'm not sure I would count on Google as an authority on data analysis, to be honest. What they do is pretty unusual, in the sense that they focus really heavily on web pages on the public web, and it's a very interesting application. I think Hadoop has a role to play. It's pretty much synonymous with big data, but that's a little bit of an error: people think of big data as being all about volume, and it's not; there's more to it. You can look at it as volume, variety, and velocity, which we were talking about earlier. So there are many different aspects, and Hadoop is great at dealing with the volume aspect. Say I track every click on my website, and this produces billions of observations every day. Individually these observations are, well, not worthless exactly, but you wouldn't hold a company meeting over the fact that Sid Probstein requested a size-50 logo image from the web server. It's not relevant. What you want to know is: which parts of my site did Sid visit? And maybe Sid isn't interesting on his own, but I want to know, across all the people, or maybe one segment of my audience, which sections they went to. So you take that low-value item data, you feed it through a system like Hadoop (there are many others too, but Hadoop is the one that seems to be very popular), and you produce high-value summary records. That's an analysis: those billions of records become a handful of records that tell me which of my website sections or properties were most popular, and I can even segment further by audience. That's very valuable insight. But the truth of the matter is, we've been doing that kind of analytics for a long time, decades. For one thing, we haven't had the huge volumes that e-commerce systems can now produce, so we haven't had to deal with that much data. But even more than that, the insight was kind of there.
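The clickstream boil-down just described is the canonical map/shuffle/reduce pattern that Hadoop runs at scale. In miniature, with invented log records:

```python
# The clickstream example as a miniature map/shuffle/reduce: many low-value
# click records boil down to a handful of high-value summary records per
# site section. Hadoop applies the same pattern across billions of records
# on many machines; the data here is invented.
from collections import defaultdict

clicks = [  # (user, site_section): stand-in for raw web server logs
    ("sid", "/products"), ("sid", "/products"), ("sid", "/pricing"),
    ("ana", "/blog"), ("ana", "/products"), ("bob", "/blog"),
]

# Map: emit (section, 1) per click.  Shuffle: group by section.
grouped = defaultdict(list)
for user, section in clicks:
    grouped[section].append(1)

# Reduce: sum each group into one summary record.
summary = {section: sum(ones) for section, ones in grouped.items()}
# summary -> {"/products": 3, "/pricing": 1, "/blog": 2}

most_popular = max(summary, key=summary.get)  # "/products"
```

The key property is that the map and reduce steps are independent per record and per group, which is exactly what lets Hadoop spread them across a cluster.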
We were able to basically take it far enough But it's still within one silo Hadoop solves the problem of massive volume in a silo we already understand But Hadoop alone once it produces that data It's still a silo and the again the beauty of something like a relay tech management is you take that output It's one piece of the puzzle and the value is when I put the data all together and look across it Yes, I want to understand the website popularity But I also want the internal view the internal data set the public company survey the public end-user survey I'm all of the analyst reports all the email back and forth between our two companies Etc etc etc and the more of those and the more you can boil that into a score. Wow Can you also you share with me your Your angle on all the developments around machine learning with that conference Just yesterday in San Francisco graph lab and it's very complicated around all the different graph architectures and the data having a graph format But machine learning seems to be at the heart of that. So, you know, you mentioned ontologies earlier machine learning has been around for a while as well It's another great question, you know, I think machine learning brings a little bit of the machine intelligence to it So earlier I talked about how you know relay kind of curated the set of data One of the things that brilliant people with a lot of domain experience do when they look at data, right? Is they they they realize things they understand things and they may want to tag the data, right? 
They may identify separations or divisions or outcomes in the data that they think are strong. The problem, very often, is then to take that and generalize it. Technologists who are listening on the phone: how long have you spent trying to tune or tweak an algorithm to deal with having far too little data to really reflect the problem? Frankly, with structured data that's much less of a problem. When you start looking at unstructured data analysis, you need a lot of data, a lot of it, to get an algorithm to do something like identify the concept of any document on the internet. That takes a lot of data, and it's an advantage, frankly, that on the web all that data exists. Machine learning is the tool you use to do that. So when Brigham can say, hey, connecting dots A, B, C, this is a good pattern, a machine learning engine can identify the features and patterns present in that example, make decisions based on them, and then find all the other examples: find me all the other ABCs, not this one but the ones like it. That's what machine learning is great for, and I can tell you that at Attivio we've been long-time believers in machine learning.
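Sid's "find me all the other ABCs" pattern can be illustrated with a minimal similarity search. This bag-of-words cosine ranking is a hypothetical stand-in, not Attivio's actual machine-learning engine, and the seed text and corpus are invented:

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts -- a stand-in for real feature extraction."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_similar(tagged_example, corpus, k=2):
    """Given one example a domain expert has tagged as a good pattern,
    rank the rest of the corpus by similarity to it."""
    seed = vectorize(tagged_example)
    scored = sorted(((cosine(seed, vectorize(d)), d) for d in corpus),
                    reverse=True)
    return [doc for _, doc in scored[:k]]

corpus = [
    "novel kinase inhibitor for colorectal cancer",
    "quarterly earnings report",
    "cancer kinase pathway study",
]
top = find_similar("kinase inhibitor colorectal cancer", corpus)
```

The expert supplies one tagged example; the system generalizes it to rank everything else, which is the "find the ones that are like it" step described above.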
We've brought all the different approaches together, whether it's language modeling to do things like key-phrase extraction, which gives you the really good concepts or terms inside a document, whether it's our machine-learning classifier, or whether it's our sentiment analyzer. These tools let you pick an outcome and then find all the others, and that's why this matters: it lets you find an example in the data and then find all the others like it.

Yeah, and we use it today. I mentioned the RBI score before, the index; that is fed by machine learning. Now, I will make the point that in my world, or at the edge I'm at, there are hypothesis-driven and non-hypothesis-driven approaches. The non-hypothesis-driven side is the purist machine learning, and we have tools out of the box in Attivio we can use to do that: just draw connections across these data sets and use them to feed certain answers, and we do explore those parts of it. But a lot of the time we're asking an open question, and for high-end users, people who really need business intelligence, you need a very targeted question. You go in with some concept or framework of the question you're going to ask, and we create a lot of these secondary variables and secondary outcomes to measure, which we then machine-learn specific things against.

So the difference would be between asking "what should I invest in?" versus "should I invest in this?"

Well, let me give you an example. What if we run machine learning and it determines that the cure for cancer is vitamin A?
That's not really an answer that's meaningful to our users. What would be meaningful is if I can constrain certain parts of it: consider only compounds that have not been FDA approved and are not generic, consider mechanisms that are related to this one, consider patent profiles that look like this. By training on that set and targeting it, you open up the power of machine learning and can actually drive to a meaningful answer, as opposed to throwing data against the wall to see what sticks. This is where you get into things like causal inference modeling, Markov chains, and everything else. I think a lot of sophistication will emerge there, particularly in life science and, quite frankly, big data health care, in the next five to ten years. That's where it's going: you have to have that skill set both on the broad data set and in a targeted way to get to real meaning, I think.

So I wonder if we could switch gears a little to some of the cultural considerations. We talked before the call about some of the people in your industry not having much quantifiable data, and taking a long time to develop their decision-making process. Your tool comes in and lets them do it much quicker, but it's different; it's a new way of doing business. How do you approach getting people to adopt a new way of doing business? And in particular, how is it changing their jobs if all of a sudden they don't have to take weeks or months to come to these decisions but can do it in a day or less? How is that changing the industry?
It's been really interesting, I think, and we come from the folks who understand it. Scientists in particular are cynical about data; they're cynical about trends and algorithmically driven things. The way we've gotten around that in our first product, BD Live, is that we give them quant, but we're one click away from the document that's linked to that quantitative measure. That's why having Attivio underneath is great, because it's all search-driven directly: you can go through the ontologies and actually return the reference. So they get to join the qualitative experience of "I read scientific papers, I know what's going on" with this quantitative thing that measures their intuition.

On the cultural experience, there's a great example. We were with a client, we were given a number of assets they were evaluating, and we had to come up with our own evaluation; they had already done theirs. So there's the moment where I slide my data-driven answer across the table. They were looking to narrow the list to four top assets. We had two of the same picks they had, so we confirmed something for them with data, which I think is always good, and then two that were different. On one of theirs that we demoted, we picked up something in the intellectual property information they wouldn't have joined to their analysis yet; the lawyers would have found it a month later. So we picked up something that saved them a month of business-development time, because that one wasn't going to work: an example of a great downgrade. And then somebody sitting there staring kind of slams his hand on the table and goes,
"I knew this one was good!" On one of the ones we promoted, they rehashed the argument they had had before. So it was kind of like: quant is not the answer, but it is another evidence base. If you introduce it culturally that way, they can still go talk to KOLs, key opinion leaders, and talk to the experts, but then add this as a secondary component, and culturally that tends to work. Also, being transparent down to the document level is key to their understanding what the quant means. It's not a black box; it lets them drill down and essentially test out the answers and advice they're getting from Relay. That's how they're ultimately going to grow confidence in machine learning and these kinds of technical systems.

I think Brigham made a great point: you put up a quant number, but then you let them see the documents. Think about that as a transitional state. Previously you would have just surfaced the documents for the talented analyst to consume and draw the conclusion from. So putting some data, some structure, around it makes for a much smoother transition. That's a lot different from going from no data straight to only data; it's a much more subtle increase. And we've seen it a lot, actually; it's very challenging. We have some intel clients, and they've been big on machine learning for a long time; they've been looking across silos a long time, that's been a goal for them. But the challenge often comes when you suddenly start putting structure in front of analysts who are used to just having the data surfaced. Over time, though, as they see, hey, this number is actually working, or this recommendation is bringing me something that's
That's Letting me understand more about you know, how I make this decision their own mental process changes They realize, you know, I actually only look at the first four or five documents to assure myself So now using the score I can go down and see maybe a breakdown of the quartiles of documents and go and look at some Of the lower-level documents and that changes the way I use the data and now I start to trust the number more and then in time Right, we believe that you'll start to say well if the numbers above 80 I don't even need to be involved right the analyst doesn't need to be there the decision can be made automatically Do you have the ability with your technology to drill down and discover the the root source of of Incorrect information So can you kind of mine for? That bad information in these sort of diverse databases of unstructured data. Oh, yeah I had a little bit to do that How about the deliberate bad data trying to manipulate the market for example? Ahead of time or using it to get a drug approved. Well, how do you how do you see that playing out? And how do you how do you accommodate that in your analysis? So You know, are you asking how long I think the market to respond to this data that we now introduced I think he was asking how do you out the criminals? Well, I'll tell you one that I think is a focus and gets brought up by all of our clients Which is you know, here's a here's a phrase. I hear a lot. 
"We don't believe the publication literature. We think the editorial process is flawed," or "we think there's bias at point X, Y, Z." We can actually measure those things, and we develop patterns to detect when there is a scientific controversy. I'll give you a recent example: CETP inhibitors. These were supposed to be the follow-on to Lipitor. Lipitor was a huge drug for the industry; CETP was supposed to be the next drug and capture all that revenue. Well, the science isn't looking like it worked out, and there was a drug that failed, and it was interesting: there was the backlash to the failure, and then the backlash to the backlash, where the scientists said, no, no, no, it was just that drug, everything else is still good. You can actually see those waves in the data, and we look at that a lot. You mentioned velocity, Sid; we're looking at multiple derivatives of these signals, trying to detect those pops in the signal, because they mean something. It's almost like, if you see A, then B, then C, you're more confident in F, and then it's about transmitting that back to folks. So we spend a lot of time not just on the positive, affirmative signals but also on the negative things. There's a project we're discussing with the FDA: could we ferret out adverse-event profiles based on early data? Can we say what the mouse-model data looked like and what its correlation is to a potential problem, relating not just the data but the whole ecosystem, and see if we can detect those things? So there's exciting stuff there.

And what about helping users wade through all the false positives that might be out there, or insights that aren't really insights but just noise? How do you help your
clients understand when something is really relevant and when it's not?

Yeah, right now we spend a lot of time talking about what the market does: detecting signals around the controversies we've seen, what they looked like, and then looking for common signals in the future. A dream of mine, talking about scientific data, is to be able to take in graph axes and detect the statistical models within figures. Long term, you can imagine building your own model to measure that and seeking out certain aspects of it, again hypothesis versus non-hypothesis, targeting it that way. I think that will come. And alongside data transparency, as we keep pointing up and to the right, there's also going to be data complexity, and as complexity is introduced you're going to need more refined, more important signals than what you get off the basal noise. So the ability to deal with complex data will also be really important.

I think there will be a substantial change in investment there; it's like the analogy of fraud detection. There will be more data-quality work on unstructured data. People will become very interested in understanding what a misspelling means: is it deliberate? Or what it means when two entities are placed near each other a lot, which would be a typical approach to throwing some of these systems off through their biases. One thing is getting good, clean data sources; that's still important. There's a reason people pay for data: they will pay Relay because Relay's data is really high quality, with all kinds of checks that run when data is harvested. Actually, when you harvest a lot of data, quality assurance can be one of the hardest things to do. There's so much of it; how would you even go and find the issues in the data?
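The "pops in the signal" idea Brigham raises, watching the derivative of a mention-count series for jumps above its recent baseline, might be sketched roughly like this. The detection rule, window size, and toy data are assumptions for illustration, not Relay's actual model:

```python
import statistics

def detect_pops(series, window=3, threshold=2.0):
    """Flag points where the first derivative of a mention-count series
    jumps well above its recent history -- a crude stand-in for the
    'waves' of controversy described in the discussion."""
    deltas = [b - a for a, b in zip(series, series[1:])]
    pops = []
    for i in range(window, len(deltas)):
        recent = deltas[i - window:i]
        mu = statistics.mean(recent)
        sd = statistics.pstdev(recent) or 1.0  # avoid division by zero
        if (deltas[i] - mu) / sd > threshold:
            pops.append(i + 1)  # position in the original series
    return pops

# Toy weekly mention counts for a drug class: quiet, then a sudden spike
mentions = [10, 11, 10, 12, 11, 40, 41]
pops = detect_pops(mentions)  # flags the jump at position 5
```

A production system would look at multiple derivatives and many signals at once, as described above; this only shows the single-series case.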
We have a lot of different techniques to bring to bear, and some work better in different industries. We have a client, a big manufacturer, that is essentially using indexing of documents to find documents that need review. They have a whole series of special terms they look for, and if those appear in a document, they review it. What they're finding is that these documents should never have been put in the wild to begin with. "Wild" is relative, but the point is being able to say, "this is an example of a document I don't want publicly available to all my employees or partners," and then being able to train the system to find others like it. That's one example. Another is finding essentially variant but repeated paragraphs for plagiarism detection; there's been a lot of work around things like that, and students are hating it.

Well, we've got about 15 minutes left, and I want to make sure that anyone online has an opportunity to ask questions, so let me pause here. We're talking about big data and the analysis of unstructured and structured data.

I wonder if I could ask a question. This is Dave Vellante. Brigham, you mentioned the business-user experience early on in your remarks, and I'm wondering how you help your end customer, the analyst, visualize all this data.

That is a tremendously important part of this. Users don't like to get a big text or CSV file to pore over; that's not what they're looking for. In our UI we took two approaches. We've used GWT and a number of apps there that are really rapid for visualizing data off our back end, and we're always showing things temporally.
I think that's crucial. Right now everybody gets today's answer, but showing the trends is part of it. The second part, which is really big, is that we are working right now with both Tableau and Tibco Spotfire in our front end as an OEM. We've been using them for the last six months ahead of our product launch, delivering actual deliverables and consulting, with our analysts working from that end. There are a couple of really important parts to this. Number one, you can cut the data any way you want, beyond what you can do in Excel or anything like that, so it tends to blow these users away just on that basis. But then there's the aspect that again combines quant with qual. You can have a graph showing a temporal trend for a given gene, for instance, and right next to it you can show the documents related to it. So you get this experience of: oh, it's going up at a faster rate; let me zoom in on that range and see what happened there, what the key things were that might drive it. That makes sense to me from an understanding basis.

The other part of this, and here is one of the crucial things with Attivio, and we're actually announcing this week that we're a featured customer of Tibco Spotfire and doing a joint release with Attivio, is that it's one thing to build a graph that's plugged into a database, relational or otherwise; those have been around for a while. It's another thing to engage the user in an exploratory sense, to use their intuition and search to actually craft the point. I have a dashboard right now where I'm looking at thought leaders, KOLs. I can start with my ontologies, use ontologic search, and say, all right, I want the best person in anemia on these two quantitative bases, and then I can add the ability to search the documents of only that person: say, okay, constrain it by, you know, a certain
type of mouse model that is common in research. I can add a text string that then limits my people results to the documents that co-occur under that structure. So instead of me telling the user a priori what's important, it gives them the ability to explore and then return documents and related information. It hands the steering wheel to the user to drive the quant, which is great.

The last really important part, and we started to touch on it, is the ability to do statistical modeling. At the end of the day, with quant, a lot of users, particularly on a big decision like this asset versus that asset for M&A, need to know: how sure are we, and when were we sure? To do that you really need to build statistical inference into it. For us right now, we've built a layer with an open-source product called R, though there are other packages we can integrate. We have a front-to-back stack with Attivio's index on the bottom, R in between, and Spotfire visualization on the front, so we're doing statistical analysis of this quant in real time. That's great, because sometimes you get back: well, maybe we're not so sure about ophthalmology, it's not that clear based on the data, but we're really confident about another area. And I think the last leap for big data, for people to turn over the keys to something like the score I mentioned, is going to start with seeing the documents and getting used to it, but ultimately it's going to be: look, statistically this is what's likely to happen, I'm this confident about it, here's when I knew, and don't listen to this guy.

Right, exactly: this guy's wrong.
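The statistical layer Brigham describes answers "how sure are we?" on top of the quant. Here is a minimal sketch of that idea, in Python rather than the R layer he names, using an assumed normal-approximation confidence interval and invented scores:

```python
import statistics

def score_with_confidence(samples, z=1.96):
    """Turn per-document evidence scores into a point estimate plus a
    rough 95% interval (normal approximation) -- a toy stand-in for a
    statistical-inference layer, not Relay's actual model."""
    n = len(samples)
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / n ** 0.5 if n > 1 else float("inf")
    return mean, (mean - z * sem, mean + z * sem)

# Hypothetical evidence scores for one therapeutic area
mean, (lo, hi) = score_with_confidence([0.62, 0.18, 0.91, 0.40, 0.75])
# A wide interval reads as "we're not so sure about this area yet";
# a narrow one supports a confident call.
```

The point is that the interval, not just the score, is what lets a user decide when the analyst no longer needs to be in the loop.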
I think it's an amazing example of what unification does. Relay unifies information and provides intelligence across it. Even think about the user interface, the experience of the business user. Brigham has used a lot of different terms: ontologies, search, dashboards, reporting, visualization, graphs. That's what UIA, unified information access, is all about: creating intelligence from all these different sources and all these different methods. It's not about one method. Search is not enough; BI is not enough. It's the collective, and I believe this is very much the future of big data and data analysis: putting all the pieces together with that layer of brilliance and insight on top.

So when you sit down with a CIO, what do you tell them?

I say they should be looking immediately to use UIA on their strategic projects, quite honestly. That's the best way to do it. When you have a project that requires you to integrate information and provide search, BI, analytics, dashboards, all of that experience on top, and maybe even workflow, those are the apps people are building now that matter. They capture the interaction with the user at a deeper level and provide greater quant and qual insight for internal decision-makers. CIOs have to start doing that right now, because they need to gain the expertise, get the infrastructure in place, and learn what it takes, and how easy it can be, to merge a bunch of silos and build an application on top, how quickly, and how reduced the risk can be. We didn't talk about that much, but one thing about Relay is that they got to market faster because they didn't try to build the stack.
They took a stack and said: we're going to build the app, and we're going to put our effort into our uniqueness, and that's where their brilliance shines. They might be brilliant open-source developers or brilliant engineers, and they have all those folks, but that's not what it's about; it's the collection.

Actually, that's a very interesting point, because one of the constraints that's often cited is the lack of data scientists to do big data analytics. Are you saying that a less sophisticated data analyst can drive more value for the company by using a particular set of tools that make the experience easier and more understandable?

I don't know if I'd call them lesser analysts; I think they're all brilliant. Brilliant analysts and decision-makers are the audience at the end of the day. The point, and this is my take, is that Relay's expertise is in knitting together and proving out the intelligence on top of all the sources and all the technology, not in implementing the individual technologies. And that's where, yes, to a CIO I would say: rather than constructing your own UIA stack from mega-vendor one here and some open source there, knitting it together yourself, and then learning how to run those projects and keep up with the patch rates and everything else that goes into building software, maybe that's not your expertise. You've got the analysts in the business, and if somebody else can help you put the information together and surface it, and I keep using this line, surface the information for the talented analyst, if you can do that right, then the focus shifts to interpretation. It's the software-as-a-service model.

This is David again. Can I push on that a little bit? I'll ask a question of Brigham.
Yeah.

You're suggesting that the answer is to provide better tools for the analyst. What about the analysts themselves, and the education and training required, at two levels? First, what advice would you give to the government in terms of what's required to really take advantage of this? The level of knowledge of Bayesian statistics, for example, is not exactly great in the country as a whole. And second, what advice would you give a CIO to develop the skills required for the next decade?

Well, from my perspective, I can tell you I told all my younger cousins to switch majors to bioinformatics about five years ago. In my background on Wall Street I covered, among other things, the genetic-tools companies, and there's been this building technology around human genome sequencing. There's going to be a massive amount of data, and the only way to get answers out of it is ontologies and then really sophisticated quant, and that could be hugely valuable to the health care system. So from a government perspective, what should you do? First of all, I believe everybody should get some level of statistical and biological education. Maybe I'm biased toward the biology, but certainly statistics beyond what's done today.
I think you don't have to have an analyst who is a sophisticated Bayesian modeler if you have some of those tools out of the box. I'm looking at and paying attention to what could potentially happen both on Sid's end and on Tibco's end; somebody is going to build this in so it's more intuitive. Just on the visualization side, those companies have taken leap one, which is letting you build cross-tabs really easily and put them in a beautiful form. So I think there will be more out-of-the-box capability available without having to be an advanced programmer or stats guy, but from the government perspective you need an understanding of what that means and how to use it. I was lucky enough to be tortured by a stats professor in pharmacology, which is where my understanding of this originally came from, so I guess he should be teaching more folks.

On the second side, what would I tell a CIO? I've talked with these groups, because there's obvious interest in incorporating their own data into our analysis, and this is where the connection between Attivio and Relay is nice. They look at it as: well, we've got our internal silo and we want to match it to what you've got. We have the scale and ability to do that; we could create a private instance. There are all kinds of reasons, back to the Attivio choice, why that works for us. I think CIOs are trying to stay ahead of the data problems, but they're not that in touch with the actual people on the end of the phone making the decisions, and those are our people.
I mean, those are the ones we're trying to serve. And just to prove my point: you may have a CIO investing in a major internal data-architecture initiative, maybe even around BI specifically, yet of his, let's say, 50 users, 30 have SaaS subscriptions to something they don't really like, usually multiple subscriptions. Not only that, they're using Google and other search engines for specific data sources. More often than not, they may ask internal knowledge discovery for a report, but they're not using it at their desk. That's just not happening, and there are a couple of reasons for it. Number one, they don't have access to the data internally in any kind of meaningful way. This is the structured side: combining that internal data in a meaningful and accessible way is problem one. I've seen, and I'm sure you have too, some really bizarre-looking information architectures, and we know how these things evolve: they grew over time, there were mergers, there were different groups, whatever. The second reason is that they don't know the tools to really interact with this; that's more the Spotfire and Tableau side of the world. Pharma is unique because Spotfire in particular grew up in the bioinformatics world and got a big part of its start dealing with scientists at a real low assay level.
That's not the BI users; they might know what it is and might have seen an interaction with scientific data, but they haven't gotten there. Lastly, there's connecting the dots and making meaningful analysis of it, and right now the analysts don't understand the information architecture well enough, and the information-architecture people don't know the questions. So that's why, at Relay, we built this SaaS offering, because I think we can leap in there and help them out.

So it really boils down to a communication issue between business and IT, which we hear about across IT segments, and it's particularly important when you're talking about data analytics and solving business problems with data. So how does a CIO go about initiating this? Let's take a CIO who understands the power of big data and what it can do for the organization. How do they initiate that first project? Is it a technical problem? Do they need to identify the business case first? How do you practically get started? And unfortunately this is our last question, so make it a good answer.

Okay, I'll do my best. I think, number one, they've got to take the long view, just as we did about structuring data. They've got to move to something that's scalable and will long-term still be in the game. This is one of my concerns about Hadoop, and I think Sid put it very well; I get the comment.
It's: yes, it handles volume, but it doesn't handle analysis very well. When we've architected with Hadoop in mind, it's: yes, we may put certain data sets there, the big, chunky ones, but we'll deal with the summary reports in our workflow for analysis. The Hadoop story is powerful, but if it just becomes another silo, that doesn't help.

That's exactly right.

So I think they want to take that long-term view, and then, to help their users, they need to begin to consolidate and give them tools that actually deliver answers. You could give them all Spotfire subscriptions tomorrow, but I don't think that would necessarily get them there. This is one thing with Relay that I think is really exciting: it's a SaaS offering, but it has the potential to plug into their internal data. So it's almost like they can get the outside world organized by us: we give them the structure to answer the questions, and the look and feel, and then strap the in-house know-how at Pharma X into that decision through an integration with, say, an Attivio index or another data set we can plug into. So don't try to boil the ocean coming up with the answer, because you don't know what's at those analysts' desks, and they don't know the problems you're facing. In the meantime, focus on how to connect those dots, and I think there are some offerings here to do that.

Brigham, thank you so much. Sid, thank you. I hope you'll both join us back here again on a Peer Incite in the future, or at theCube, one of the many events of SiliconANGLE TV. I'm John MacArthur, Peer Incite moderator, here with Jeff Kelly, Wikibon's big data analyst, joined by Brigham Hyde, adjunct professor at Tufts University and managing director of Relay Technology Management, and Sid Probstein, CTO of Attivio. Thank you both very much for joining us today. Very helpful.
We'll have six Peer Incite research notes up on the Wikibon site in the next couple of days. Feel free to jump in, edit, contribute, enhance, or provide another perspective on any of the analysis that we do and anything that we write. Thanks again. This is John MacArthur, Peer Incite moderator. I look forward to seeing you all again soon. Thank you.