Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. Hey, welcome back everyone, live here in New York City for theCUBE's special presentation of Big Data NYC, here all week with theCUBE in conjunction with Strata Data, the event happening around the corner. I'm John Furrier, here with my co-host today, Jim Kobielus. Our next two guests: Dr. Mark Ramsey, Chief Data Officer and Senior Vice President of R&D at GSK, GlaxoSmithKline, and Bruno Aziza, the CMO at AtScale, both CUBE alumni. Welcome back. Thanks for having us. So Bruno, I want to start with you, because I think Dr. Mark has a great use case I want to dig into and go deep on with Jim. But AtScale, give us the update on the company. You guys doing well? What's happening? Obviously you had the vision of this data layer we talked about a couple of years ago. Yeah, it's working. So tell us, give us the update. A lot of things have happened since we talked last. I think you might have seen some of the news in terms of our growth, 10x growth since we started, mainly driven by the customer use cases. That's why I'm excited to hear from Mark and share his stories with the rest of the audience here. We have a presentation at Strata tomorrow with Vivint; it's a great IoT use case as well. So what we're seeing is the industry is changing in terms of how it's buying BI platforms. In the past, people would buy BI platforms vertically: they'd buy the visualization, they'd buy the semantic layer, and they'd buy the best-of-breed integration. We're now living in a world where there is a multitude of BI tools and the data platforms are not standardized either. And so what we're riding as a trend is this idea of the need for a universal semantic layer: the idea that you can have a unified set of semantics, a dictionary or an ontology, that can be shared across all types of business users and business use cases, across any data. That's really the trend that's driving our growth, and you'll see it today at this show with the use cases and the customers, and of course some of the announcements we're making. We're announcing a new offer with Cloudera and Tableau, so we're really excited about how the space and the partner ecosystem is embracing our approach. You guys really have a Switzerland kind of strategy. You've got to play a neutral role, play nicely with everybody, because you're at a different abstraction layer; you're really more on the data. That's right. The whole value proposition is that you don't want to move your data and you don't want to move your users away from the tools they already know, but you do want them to be able to take advantage of the data that you store. And this concept of a virtualized layer, a universal semantic layer, that enables the use cases to happen faster is a big value proposition for all of them. Dr. Mark Ramsey, I want to get your quick thoughts on this. You're obviously a customer, so you're not biased. You're under pressure every day; the competitive noise out there is high in this area. You're a chief data officer, you run R&D, so you've got that 20-mile stare into the future. You've got experience running data. Why AtScale? There's a lot of other potential solutions out there. What made it attractive for you? Well, it fills a need that we have around really that virtualization. We can leave the data in the format that it is on the platform and then allow the users to use, like Bruno was mentioning, a number of standardized tools to access that information.
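To make the universal semantic layer idea Bruno describes concrete, here is a minimal sketch of the concept: one shared dictionary of business terms mapped onto the physical schema, so any tool asks questions in the same vocabulary without moving the data. All table, measure, and function names here are hypothetical illustrations, not AtScale's actual implementation or API.

```python
# A toy "universal semantic layer": business terms are defined once and
# translated into SQL against the physical tables, so every BI tool
# (dashboards, notebooks, ad hoc clients) shares one set of semantics.

SEMANTIC_MODEL = {
    "measures": {
        "total_sales": "SUM(f.sales_amount)",
        "order_count": "COUNT(DISTINCT f.order_id)",
    },
    "dimensions": {
        "region": "d.region_name",
        "month": "d.month_start",
    },
    "joins": "sales_fact f JOIN date_dim d ON f.date_key = d.date_key",
}

def to_sql(measure: str, dimension: str) -> str:
    """Rewrite a business-level question into SQL against the physical schema."""
    m = SEMANTIC_MODEL["measures"][measure]
    d = SEMANTIC_MODEL["dimensions"][dimension]
    return (
        f"SELECT {d} AS {dimension}, {m} AS {measure}\n"
        f"FROM {SEMANTIC_MODEL['joins']}\n"
        f"GROUP BY {d}"
    )

if __name__ == "__main__":
    # Any consumer asking for "total_sales by region" gets the same answer,
    # generated from the same shared definitions.
    print(to_sql("total_sales", "region"))
```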
It also gives us an ability to learn how folks are consuming the data. They'll use a variety of tools, they'll interact with the data, and AtScale gives us a great capability to look under the covers, see how they're using the data, and if we need to physicalize some of that to make access easier in the long term, it gives us the ability to do that. It's really an agility model for the data. You're kind of bringing agile to the data. Yeah, so if you're using a dashboarding tool, it allows you to interact with the data, and then as you see how folks are actually consuming the information, you can physicalize it and make that readily available. It gives you those agile cycles to go through. In your use of the solution, what have you seen in terms of usage patterns? What are your users using AtScale for? Have you been surprised by how they're using it? And where do you plan to go in terms of the use cases you're addressing with this technology? Well, this technology allows us to give the users the ability to query the data. For example, we use standardized ontologies in several of the areas. Standardized ontologies are great because the data is in one format; however, that's not necessarily how the business would like to look at the data. So it gives us an ability to make the data appear the way the users would like to consume the information, and then we understand which parts of the model they're actually flexing, and then we can make the decision to physicalize that. Because again, it's a great technology, but with virtualization there is a cost, because the machines have to create the illusion of the data being in a certain shape. If you know something is going to be used day in and day out, then you can move it to a physicalized version. Is there a specific threshold, when you're looking at the metrics of usage, where you know that particular data or particular views need to be physicalized? What is that threshold, or what are those criteria? I think it's normally a combination of the number of connections that you have, so the joins of the data across the number of repositories, balanced with the volume of data. If you're dealing with thousands of rows versus billions of rows, that can lead you to make that decision faster. There isn't a defined metric that says if you have this number of rows and this many columns and this size, it will lead you down that path. But the nice thing is you can experiment, so it gives you the ability to prototype and see how folks are consuming the data before you invest the energy to make it physical.
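Mark is explicit that there is no defined metric, but the shape of the decision he describes, join fan-out and row volume weighed against how often a view is actually hit, can be sketched as a simple heuristic. Everything below, including the field names and thresholds, is a hypothetical illustration of that reasoning, not AtScale's or GSK's actual logic.

```python
from dataclasses import dataclass

@dataclass
class ViewStats:
    """Usage metrics harvested from the query log for one virtual view."""
    name: str
    queries_per_day: float   # how often users actually hit this view
    join_count: int          # fan-out across underlying repositories
    row_count: int           # volume of the underlying data

def should_physicalize(v: ViewStats) -> bool:
    # Illustrative thresholds only -- in practice you'd tune these by
    # prototyping and watching real consumption, as Mark describes.
    heavy = v.join_count >= 3 or v.row_count >= 1_000_000_000
    popular = v.queries_per_day >= 50
    return heavy and popular

if __name__ == "__main__":
    views = [
        ViewStats("sales_by_region", queries_per_day=120,
                  join_count=4, row_count=2_000_000_000),
        ViewStats("rare_adhoc_cut", queries_per_day=2,
                  join_count=5, row_count=5_000_000_000),
    ]
    for v in views:
        verdict = "physicalize" if should_physicalize(v) else "keep virtual"
        print(f"{v.name}: {verdict}")
```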
You know, federated, I can use the word federated, but semantic virtualization layers have clearly been around for quite some time. A lot of solution providers offer them, and a lot of customers have used them for disparate use cases. One of the raps traditionally against semantic virtualization is that it's simply a stopgap between chaos on the one end, where you have dozens upon dozens of databases with no unified rollup, and full centralization or migration to a big data hub on the other. Do you see semantic virtualization as your target architecture for your operational BI and so forth? Or, on some level, is it simply, like I said, a stopgap or a transitional approach on the way to some more centralized environment? Yeah, I think you're talking about two different scenarios there. On federated, I would agree: when folks attempted to use that to bring disparate data sources together to make them look consolidated, and those sources happened to be on different platforms, that was definitely a stopgap on a journey to really addressing the problem. The thing that's a little different here is that we're talking about this running on a standardized platform. It's not platform-disparate; the data is being accessed on the platform. It really gives us the flexibility to allow the consumer of the data to have a variety of views of the data without actually physicalizing each of them. So I don't know that it's on a journey, because we're never going to physicalize the data in every one of those different shapes; it's very different from 10 or 15 years ago, when folks were trying to solve disparate data sources using federation. Would it be fair to characterize what you do as agile virtualization of the data on a data lake platform? Is that essentially what it's about? Yeah, it certainly enables that. In our particular case, we use the data lake as the foundation, then we curate the data into standardized ontologies, and then the consumer access layer is where we're applying virtualization. In the creation of the environment that we have, we've integrated about a dozen different technologies, so one of the things we're focused on is trying to create an ecosystem, and AtScale is one of the components of that. It gives us flexibility so that we don't have to physicalize. Well, you don't have to stand up any cost, so you have the flexibility with AtScale, if I get this right: you get the data, and people can play with it without actually provisioning. Right. It's like, okay, save some cash, but then all of a sudden you double down on the winners that come in. The things that are winners, you check the box, you physicalize them, you provide that access. You get crowdsourcing benefits going on; you're crowdsourcing, but with your own employees. The curation, you mentioned that work. Does the curation go on inside of AtScale, or are you using a different tool, or something you wrote in-house to do that? Essentially data governance, data cleansing, harmonization. Yeah, we use a technology called Tamr, a machine-learning-based data curation tool. That's one of our fundamental tools for curation. One of the things in the life sciences industry is that you tend to have several data sources that are slightly aligned but actually different, and machine learning is an excellent application for that.
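Mark doesn't describe Tamr's internals, so the following is only a toy illustration of the general idea behind ML-assisted curation of "slightly aligned" sources: scoring record similarity across two systems so likely matches can be surfaced for harmonization. The records, field names, and threshold are all made up, and simple string similarity stands in for a learned matching model.

```python
from difflib import SequenceMatcher

# Two "slightly aligned" sources describing the same real-world entities,
# with different naming and formatting conventions.
source_a = [{"id": "A1", "name": "acetylsalicylic acid", "form": "tablet"}]
source_b = [{"id": "B7", "name": "Acetylsalicylic Acid (aspirin)", "form": "TABLET"}]

def similarity(rec1: dict, rec2: dict) -> float:
    """Average string similarity across shared fields (a stand-in for a model)."""
    fields = ["name", "form"]
    scores = [
        SequenceMatcher(None, rec1[f].lower(), rec2[f].lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)

# Pair records across sources and flag probable matches for curation review.
for a in source_a:
    for b in source_b:
        score = similarity(a, b)
        if score > 0.7:  # hypothetical threshold
            print(f"probable match: {a['id']} <-> {b['id']} (score={score:.2f})")
```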
Well, that brings up a good point. Let's get into the portfolio. Obviously, as a CDO you've got to build a holistic view; you have a tool chest of tools and platforms. How do you look at the big picture? AtScale fits in beautifully, makes a lot of sense, so a good plug for those guys, but the big picture is you've got to have a variety of things in your arsenal. How do you architect that tool shed, your platform tool shed? Well, again, you... Is everything a hammer in your tool shed? No. You've got all the things you need to build in that tool shed. You bring up a great point, because, to use your analogy, you don't want 12 lawnmowers in your tool shed. One of the challenges is that a lot of the folks in this ecosystem start with one area of focus and then try to grow into other areas of focus, which means that suddenly everybody starts to be a lawnmower. They start as a hammer and turn into a lawnmower. You can mow your lawn with a hammer, but... So it's really that portfolio of tools that all together get the job done. There's a data acquisition component, there's the curation component, there's visualization, machine learning, there's the foundational layer of the environment. Our approach has been to select the best-in-class tools in each of those areas and then make them work together, and Bruno and the team at AtScale have been part of this. We've actually held partner summits on how to bring that ecosystem together. Is your stuff mostly on-prem? Obviously in pharma there's a lot of IP, so you've got the whole patent thing, which is well documented: you don't want to open up the kimono and start the clock until it's released. So you obviously have to keep things confidential. Is it a mix of cloud and on-prem? Is it 100% on-prem? Is there some bursting to the cloud? Is there a private cloud? How do you look at the cloud piece? Yeah, the majority of what we're doing is on-prem. The profile for us is that we persist the data. In some cases, when we're doing some of the more advanced analytics, we burst to the cloud for additional processors, but the model of persisting the data means it's much more economical to have an on-prem instance of what we're doing. So it is a combination, but the majority is on-prem. Hold on, Jim, I have a question about the vendor topic. Everyone's knocking on your door to get into that account; they know you spend a lot of money, but you're pretty disciplined. It sounds like you've got a good view. You don't want people to come in and turn into something you don't want them to be, but you also run R&D, so you have to understand the headroom. How do you look at the headroom of what you'll need down the road in terms of how you interface with the suppliers that knock on your door, whether it's AtScale, currently working with you, or people just trying to get in there and sell you a hammer or a lawnmower, whatever they have? You're dealing with the vendor pressure. Right. Well, a lot of that is around what problem we're trying to solve, and we drive all of that based on the use cases and the value to the business. If we identify gaps that we need to address, some of those are specific to life sciences types of challenges, where the tools are very specialized and the population of partners is quite small. And beyond that, we're building an actual production, operational environment, not a proof of concept, so security is extremely important. We're Kerberos-enabled end to end, encrypted at rest and in flight, which breaks some of the tools, so there are criteria of things that need to be in place in order to... So you've got to think about scale big time. Scale.
Not just putting a beachhead together, but foundationally building out a platform, having tools that are general-purpose and also specialized, at scale. That's a big thing, right? Right. Well, we're also addressing what we see as three different cohorts of consumers of the data. One is guided analytics, the more traditional dashboards and reports. One is computational notebooks, more of the scientific users working in R, Python, and other languages. The third is almost at the bare-metal level: machine learning, TensorFlow, a number of tools that people interact with directly. People don't necessarily fit nicely into those three cohorts, so we're also seeing that there's a blend, and that's something we're addressing too. That's a fourth cohort. Yeah, someone's using a computational notebook, but they want to draw upon a dashboard graphic, and then they want to run a predefined TensorFlow model and pull all of that together. And what you just said teed up the question I was going to ask, so that's perfect. One of my core focuses is deep learning and AI. With semantic data virtualization in a life sciences, pharma context, you undoubtedly have a lot of image data, visual data. In terms of curating that and enabling virtualized access, to what extent are you using deep learning, TensorFlow, convolutional neural networks to surface the visual patterns that can conceivably be searched using a variety of techniques? Is that part of your overall implementation of AtScale for your particular use cases, currently, or do you plan to go there with TensorFlow down the line? We're very active in deep learning, artificial intelligence, and machine learning. Again, it depends on which problem you're trying to solve, and there are a number of components that come together when you're looking at image analytics versus using data to drive certain decisions, but we're active in all of those areas. Our ultimate goal is to transform the way R&D is done within a pharmaceutical company. Right now it takes somewhere between five and 15 years to develop a new medicine. The goal is to do a lot more analytics to shorten that time significantly, which helps the patients and gets the medicines to market faster.
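Jim's question gestures at convolutional networks for image data; GSK's actual pipeline isn't described in the conversation, so the following is only a generic sketch of the kind of small Keras CNN image classifier he's referring to. The input shape and the two output classes are placeholders, not anything from GSK's models or data.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A generic convolutional image classifier of the kind Jim alludes to.
# 128x128 grayscale input and two output classes are arbitrary choices.
model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),
    layers.Conv2D(32, 3, activation="relu"),   # learn local visual features
    layers.MaxPooling2D(),                     # downsample spatially
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),     # e.g., "effect" vs "no effect"
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```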
That's your end game. You've got to create an architecture to enable the data to add value to the business. Dr. Mark Ramsey, thanks so much for sharing the insight from your environment. Bruno, you've got something there to show us. What do you have? I do; you know I like my props. He always brings a prop on, so he's got the prop. A few years ago I think I had a tattoo on my neck or something like that, but I'm happy I brought this, because you can see how big Mark's vision is. And I think, for a lot of the CDOs listening, the reason he's getting recognized by Cloudera and the data awards and so forth is because he's got a huge vision, and it's a great opportunity for a lot of CDOs out there. The average CDO has spent something like $100 million to deploy big data solutions over the last five years, but they're not able to consume all the data they produce. I think in your case you consume close to 100% of the data you've stored, while the average in the space is being able to consume about 1% of the data. And this is essentially the analogy today, if you're in the enterprise: we've spent a lot of time putting data into large systems, but the tool set we give the chief data officers and their teams is a cocktail straw like this one to drink out of it. That's a data lake, actually, right? There's the data lake. It's a natural lake. It's a Slurpee cup. Multiple Slurpees with the same straw. Okay, hold on. I hope that's not Hudson River water in there. I can't answer that question; I think I'd have to break a few things if I did. But the idea here is that it's not very satisfying, and that's the frustration business users and business units have. What AtScale's done is build this, right? This is the straw that you want. So I would help CDOs contemplate this idea of the Slurpee cup and the cocktail straw: how much money are you spending here, and how much money are you spending there? Because the speed at which you can get the insights to the business user is what's going to make the difference. You've got to break it down so it's available everywhere. So I think that's a great innovation. And it makes me thirsty, Bruno. You know what? You can have it. Bruno, thanks for coming on theCUBE. AtScale. Dr. Mark Ramsey, good to see you again. Great to have you; come back anytime. We love to have chief data officers on. It's really a pioneering position, and it is the critical position in all organizations; it will be in the future, and it will continue to be. Thanks for sharing your insights. It's theCUBE, with more live coverage after this short break.