Live from the Wigwam in Phoenix, Arizona, it's theCUBE, covering Data Platforms 2017. Brought to you by Qubole.

Hey, welcome back, everybody. Jeff Frick here with theCUBE. Welcome back to Data Platforms 2017 at the historic Wigwam Resort just outside of Phoenix, Arizona. I'm here all day with George Gilbert from Wikibon, and we're excited to be joined by our next guest. He's Mick Bass, the CEO of 47Lining. Mick, welcome.

Welcome, thanks for having me, yes.

Absolutely. So what is 47Lining, for people that aren't familiar?

Well, you know, every cloud has a silver lining, and if you look at the periodic table, 47 is the atomic number for silver. So we're a consulting services company that helps customers build out data platforms and ongoing data processes and data machines in Amazon Web Services. And one of the primary use cases that we help customers with is to establish data lakes in Amazon Web Services to help them answer some of their most valuable business questions.

So there's always this question about own versus buy, right, with cloud and Amazon specifically. And with a data lake, the perception, right, is that it's huge, this giant cost. Clearly there are some benefits that come with putting your data lake in AWS versus having it on-prem. What are some of the things you take customers through in the scenario planning and the value planning?

Well, just a couple of the really important aspects. One is this notion of elastic and on-demand pricing. In a cloud-based data lake, you can start out with a very small infrastructure footprint that's focused on maybe just one or two business use cases. You can pay for only what you need to get your data lake bootstrapped and demonstrate the business benefit from one of those use cases, but then it's very easy to scale that up in a pay-as-you-go kind of way. The second really important benefit that customers experience in a platform that's built on AWS is the breadth of the tools and capabilities that they can bring to bear for their predictive analytics, descriptive analytics, and streaming kinds of data problems. You need Spark, you can have it. You need Hive, you can have it. You need a high-performance, close-to-the-metal data warehouse on a clustered database, you can have it. So analysts are really empowered through this approach, because they can choose the right tool for the right job and reduce the time to business benefit based on what their business owners are asking them for.
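As an illustration of the elastic, pay-as-you-go pattern Mick describes, here is a minimal sketch of how a small, transient EMR cluster with Spark and Hive might be provisioned against an S3-based data lake and torn down when its work finishes. This is not 47Lining's tooling; the bucket, roles, and instance sizes are hypothetical.

```python
# Minimal sketch: provision a small, transient EMR cluster with Spark and Hive
# against an S3-based data lake, sized for a single use case and paid for only
# while it runs. Bucket name, roles, and instance counts are hypothetical.
import boto3

emr = boto3.client("emr", region_name="us-west-2")

response = emr.run_job_flow(
    Name="data-lake-bootstrap",
    ReleaseLabel="emr-5.6.0",                      # era-appropriate EMR release
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    LogUri="s3://example-data-lake/emr-logs/",     # hypothetical bucket
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,                        # start small; scale up later
        "KeepJobFlowAliveWhenNoSteps": False,      # terminate when steps finish
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started cluster:", response["JobFlowId"])
```

Because the cluster terminates when its steps complete and the data stays in S3, the infrastructure cost tracks the specific use case it serves, which is the alignment of cost to business benefit discussed next.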
You touched on something really interesting, which was, so when a customer is on-prem and, let's say, is evaluating Cloudera, MapR, Hortonworks, there's a finite set of services or software components within that distro. Once they're on the cloud, there's a thousand times more than that; as you were saying, you could have one of 27 different data warehouse products, you could have many different NoSQL products, some of which are really delivered as services. How does the consideration of the customer's choice change when they go to the cloud?

Well, I think what they find is that it's much more tenable to take an agile, iterative approach, where they're trying to align the outgoing costs of the data lake build with the business benefits that come from it. And so if you recognize the need for a particular kind of analytics approach, but you're not going to need it until two or three quarters down the road, it's easy to get started with simple use cases and then add those incremental services as the need manifests. One of the things that I mentioned in my talk, and that I always encourage our customers to keep in mind, is that a data lake is more than just a technology construct. It's not just a set of analysis machinery; it's really a business construct. Your data lake has a profit and loss statement, and the way you interact with your business owners to identify the specific value sources you're going to make pop for your company can be made to align with the cost footprint as you build your data lake out.

So I'm curious, when you're taking customers through the journey to start thinking of the data lake in AWS, are there any specific application spaces or vertical spaces where you have pretty high confidence that you can secure an early and relatively easy win to help them move down the road?

Absolutely. For many of our customers, a very common business need is to enhance the set of information they have available for a 360-degree view of the customer. In many cases this information is available in different parts of the enterprise, but it might be siloed, and a data lake approach in AWS really helps you to pull it together in an agile fashion, based on particular quarter-by-quarter objectives or capabilities that you're trying to respond to. Another very common example is predictive analytics for things like fraud detection or mechanical failure. In e-commerce kinds of situations, by pulling together semi-structured information that might be coming from web servers or logs, like which cookies are associated with this particular user, it's very easy to build a fraud-oriented predictive analytic. And then the third area that is very common is internet of things use cases. Many enterprises are augmenting their existing data warehouse with sensor-oriented time series data, and there's really no place in the enterprise for that data currently to land.

So when you say they're augmenting the data warehouse, are they putting it in the data warehouse, or are they putting it in a sort of adjunct time series database from which they can curate aggregates and things like that to put in the data warehouse?

It's very much the latter, right? The time series data itself may come from multiple different vendors, and the input formats in which that information lands can be pretty diverse. So it's not really a good fit for a typical kind of data warehouse ingest or intake process.

So if you were to look at sort of maturity models for the different use cases, where would we be? You know, like IoT, customer 360, fraud, things like that?

I think, you know, many customers have pretty rich fraud analytics capabilities, but some of the pain points we hear are that it's difficult for them to access the most recent technologies. In some cases, the order management systems that those analytics are running on are quite old. We just finished some work with a customer where literally the order management system is running on a mainframe, even today. And those systems have the ability to accept steering from a sidecar decision-support predictive analytics system. One of the things that's really cool about the cloud is that you can build a custom API just for that fraud analytics use case, so that you can inject exactly the right information and make it super cheap and easy for the ops team that's running that mainframe to consume the fraud-improvement decision signal that you're offering.
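What such a narrow, purpose-built decision endpoint might look like is sketched below: an AWS Lambda-style handler that returns a simple accept-or-review signal a legacy order management team could consume. This is only an illustrative sketch; the table name, threshold, and field names are hypothetical, not details from the interview.

```python
# Minimal sketch of a narrow fraud-decision endpoint: a Lambda handler (behind
# API Gateway) that looks up a precomputed fraud score and returns a simple
# accept/review decision. Table, threshold, and field names are hypothetical.
import json
import boto3

TABLE = boto3.resource("dynamodb").Table("fraud-scores")   # hypothetical table
REVIEW_THRESHOLD = 0.8

def handler(event, context):
    order = json.loads(event["body"])
    # Look up the score the predictive pipeline wrote for this order, if any.
    item = TABLE.get_item(Key={"order_id": order["order_id"]}).get("Item")
    score = float(item["fraud_score"]) if item else 0.0
    decision = "review" if score >= REVIEW_THRESHOLD else "accept"
    return {
        "statusCode": 200,
        "body": json.dumps({
            "order_id": order["order_id"],
            "fraud_score": score,
            "decision": decision,
        }),
    }
```

Keeping the interface this small is what makes the signal cheap to consume: the team operating the order management system only ever sees a score and a decision, never the analytics machinery behind it.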
Interesting. And so this is diving into the weeds a little bit, but if you've got an order management system that's decades old and you're going to plug in something that has to meet some stringent performance requirements, how do you test that? It's not just the end-to-end performance once, but, you know, making sure at the 99th percentile that someone doesn't get locked out for five minutes while they're trying to finish their shopping cart.

Exactly, and I mean, I think this is what is important about the concept of building data machines in the cloud. This is not a once-and-done kind of process. You're not building an analytic that produces a printout an executive is going to look at to make a decision. You're really creating a process that runs at consumer scale, and you're going to apply all of the same kinds of percentile performance metrics that you would apply in any kind of large-scale consumer delivery system.

Do you custom build, let's say, a fraud prevention application for each customer, or is there a template and then, I guess, some additional capabilities that you'll learn by running through their training data?

Well, I think largely there are business-by-business distinctions in the approach that these customers take to fraud detection. There's also business-by-business distinction in their current state. But what we find is that the commonality is in the kinds of patterns and approaches that you tend to apply. So we may have extra data about you based on your behavior on the web and your behavior on a mobile app. The particulars of that data might be different for enterprise A versus enterprise B, but this pattern of joining up mobile data plus web data plus maybe phone-in call center data, putting those all together to increase the signal that can be made available to a fraud prevention algorithm, that's very common across all enterprises. And so one of the roles that we play is to set up the platform so that it's really easy to mobilize each of these data sources. In many cases it's the customer's data scientists who are saying, I think I know how to do a better job for my business, I just need to be unleashed to be able to access this data. And if I'm blocked, I don't want a platform where the answer I get back is, oh, you could have that in, like, the second quarter of 2019. Instead you want to be able to say, oh, we can onboard that data in an agile fashion and pay an incremental little bit of money, because you've identified a specific benefit that could be made available by having that data.

All right, Mick, well, thanks for stopping by. I'm going to send Andy Jassy a note that we found this silver lining for the cloud, so I'm excited for that. If nothing else, that made the trip well worthwhile. So thanks for taking a few minutes.

You bet. Thanks so much, guys.

All right, Mick Bass, George Gilbert, Jeff Frick. You're watching theCUBE from Data Platforms 2017. We'll be right back after this short break. Thanks for watching.
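For readers curious what the join pattern Mick describes, combining web, mobile, and call center data into a wider fraud signal, might look like in practice, here is a minimal PySpark sketch. The S3 paths and column names are invented for illustration and are not from the interview.

```python
# Minimal, hypothetical sketch of the join pattern described above: land web
# clickstream, mobile app events, and call center records in the data lake,
# then join them per customer to widen the feature set available to a fraud
# model. Paths and column names are invented for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fraud-feature-join").getOrCreate()

web = spark.read.json("s3://example-data-lake/raw/web_logs/")        # semi-structured
mobile = spark.read.json("s3://example-data-lake/raw/mobile_events/")
calls = spark.read.parquet("s3://example-data-lake/curated/call_center/")

features = (
    web.groupBy("customer_id")
       .agg(F.countDistinct("cookie_id").alias("distinct_cookies"),
            F.count("*").alias("web_events"))
       .join(mobile.groupBy("customer_id")
                   .agg(F.count("*").alias("mobile_events")),
             "customer_id", "outer")
       .join(calls.groupBy("customer_id")
                  .agg(F.count("*").alias("call_center_contacts")),
             "customer_id", "outer")
       .na.fill(0)
)

# Persist the widened feature set back to the lake for the fraud model to use.
features.write.mode("overwrite").parquet("s3://example-data-lake/features/fraud/")
```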