Live from Las Vegas, it's theCUBE! Covering AWS re:Invent 2019. Brought to you by Amazon Web Services and Intel, along with its ecosystem partners.

Well, welcome back here to Las Vegas. We are live at AWS re:Invent. Along with Justin Warren, I'm John Walls. It's day one of a jam-packed show. We had great keynotes this morning from Andy Jassy, along with representatives from Goldman Sachs and a number of other enterprises on the stage. Right now we're going to talk about data, because it's all about data, with a couple of representatives from Io Tahoe. Ajay Vohora is the CEO. Ajay, thanks for being with us.

Thank you, John.

And Lester Waters is the CISO at Io Tahoe. Lester, good afternoon to you. Thanks for being with us.

Yes, thank you for having us.

Ajay, you brought a football with you there, I see, so you've come prepared for...

Sport.

Sport, I love it, all right. That's at your booth, where you're exhibiting here, I assume. And I know you've got a big offering we're going to talk about a little later on. First, tell us a little about Io Tahoe, to inform our viewers who might not be too familiar with the company.

Sure. Well, our background was sitting with enterprise-scale data issues that were really about the complexity, the amount of data, and the different types of data. Around 2014 we were in stealth, working on our technology. A lot of the common technologies back then were Apache-based, so Hadoop. Large enterprises that we were working with, like GE and Comcast, helped us come out of stealth in 2017, and gave us a great story of solving petabyte-scale data challenges using machine learning. So there's that manual overhead: more and more, as we look at AWS services, how do we drive the automation and get the value from data? Automation's got to be the way forward.

All right, so let's jump onto that notion.
You've got this exponential growth in data, obviously: working off the edge, internet of things, all these inputs. We have so much more information at our disposal. Some of it's great, some of it's not. How do we know the difference, especially in this world where this exponential increase has happened? Lester, if you would, tackle that from a company perspective. First off, how do we ever figure out what we have that's valuable? Where do we get the value out of it? And then, how do we make sense of it? How do we put it into practice?

Yes. I think most enterprises have a problem with data sprawl. A project starts up, we get a block of data, and then all of a sudden a new project comes along, they take a copy of that data, and there's another instance of it. Then there's another instance for another project, and suddenly these different data sources become authoritative and become production. So now I have three, four, five different instances. Oh, and then there's the three or four projects that got canceled, and their data is still sitting around. As an information security professional, my challenge is to know where all of those pieces of data are, so that I can govern them and make sure that the stuff I don't need is gotten rid of, is deleted.

So using the Io Tahoe software, I'm able to catalog all of that, and I'm able to garner insights into that data using the nine patent-pending algorithms that we have, to do intelligent tagging, if you will. From my perspective, I'm very interested in making sure that I'm adhering to compliance rules. So the really cool thing about the Io Tahoe product is that we go and tag data, we look at it, and we actually tie it to lines of regulation. So you can go, you know, CCPA: this bit of text here applies to this.
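The intelligent tagging Lester describes, tying discovered values back to lines of regulation, can be sketched in miniature. This is a toy illustration, not Io Tahoe's actual algorithms; the patterns and the regulation references are illustrative assumptions:

```python
import re

# Hypothetical tag rules: a pattern, a tag, and the line of regulation it
# maps to (the regulation citations here are illustrative, not legal guidance).
TAG_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "ssn", "CCPA 1798.81.5"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "email", "GDPR Art. 4(1)"),
]

def tag_record(record: dict) -> dict:
    """Scan each field of a record; return {field: [(tag, regulation), ...]}."""
    tags = {}
    for field, value in record.items():
        hits = [(tag, reg) for pattern, tag, reg in TAG_RULES
                if pattern.search(str(value))]
        if hits:
            tags[field] = hits
    return tags

tagged = tag_record({"name": "Ann Lee",
                     "ssn": "123-45-6789",
                     "contact": "ann@example.com"})
```

A real discovery engine goes well beyond regex, using statistical and machine-learning signals, but the output shape, a field mapped to a tag plus a citation, is what makes the compliance review Lester mentions tractable.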
And that's really helpful for me as an information security professional, because I'm not necessarily versed on every line of regulation, but when I can go and look at it handily like that, it makes it easier for me to go, oh, okay, that's great, I know how to treat that in terms of controls. So that's the important bit for me. If you don't know where your data is, you can't control it, you can't monitor it, you can't govern it.

Yeah. On knowing where stuff is, I'm familiar with a framework developed at Telstra back in Australia called the Five Knows, which is about exactly that: knowing where your data is, what it is, who has access to it. Actually being able to catalog the data and know what it is that you have is a mammoth task. That was hard enough 12 years ago, let alone today, with the amount of data that's actively being created every single day. So how does your system help CISOs tackle this kind of issue? Maybe, Lester, you can start off, and then you can tell us a bit more yourself.

Yeah, I'll start off on that. We're pleased to see the feedback from our enterprise customers: as that veracity and volume of data increases, the challenge is definitely there to keep on top of governing it. So we're continually discovering newly created data. How is it different? How is it adding to the existing data? Using machine learning and the models that we create, whether it's anomaly detection or classifying the data based on certain features in the data, then allows us to tag it and load it into our catalog. So we've discovered it, and now we've made it accessible. Now any BI developer or data engineer can search for that data in the catalog and make something from it. If there were ten steps in that data mile, we'd definitely solve the first four or five, to bring that momentum to getting value from that data.
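Once data is discovered and tagged, the catalog step described here is essentially a searchable index over metadata. A minimal sketch, with made-up dataset and tag names standing in for what an automated discovery pass would produce:

```python
# A tiny in-memory metadata catalog; in practice, entries would come from
# the automated discovery pass (dataset and tag names are illustrative).
CATALOG = [
    {"dataset": "crm.customers", "column": "email", "tags": {"pii", "gdpr"}},
    {"dataset": "billing.invoices", "column": "total", "tags": {"financial"}},
    {"dataset": "hr.employees", "column": "ssn", "tags": {"pii", "ccpa"}},
]

def search(tag: str) -> list[str]:
    """What a BI developer or data engineer would run: find assets by tag."""
    return [f"{e['dataset']}.{e['column']}" for e in CATALOG if tag in e["tags"]]

pii_assets = search("pii")
```

The point of the catalog is exactly this query: anyone downstream can ask "where is all the PII?" without re-scanning the underlying sources.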
So: discovering it, cataloging it, tagging the data to make it searchable, and then it's free to pick up for whatever use case is out there, whether it's migration, security, or compliance. Security is a big one for you.

And I would also add, for the data scientist, knowing all the assets they have available to them in order to drive those business-value insights is so important these days, because a lot of companies compete on very thin margins, and having insights into their data and the way customers use their data really can make or break a company. So that's critical. And as Ajay pointed out, being able to automate that through DataOps, if you will, and drive those insights automatically is great. For example, from an information security standpoint, I want to fingerprint my data and feed it into a DLP system, so that I can really keep an eye out for whether data going out actually is my data, versus standard regex-style matching, which isn't the best technique.

So walk us through that in a bit more detail. You mentioned tagging a couple of times, so let's go into the details a little about what that actually means for customers. My understanding is that you're looking for things like a social security number that could be sitting somewhere in this data: finding out where all these social security numbers are that I may not be aware of, and whether they could be being shared with someone who shouldn't have access to them. Is that what it is, or are there other kinds of data that you're able to tag that traditional approaches wouldn't actually help us with?

Straight out of the box, you've got your PII, your personally identifiable information, the kind of data that is covered under CCPA and GDPR. So there are those standard, regulatory-driven definitions that your social security number, name, and address would fall under.
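Lester's contrast between fingerprinting and "standard regex kind of matching" is worth unpacking: a regex flags anything shaped like a social security number, while a fingerprint only flags values actually present in your own data. A toy sketch of how a DLP check might combine the two (the workflow is an assumption for illustration, not a description of any particular DLP product):

```python
import hashlib
import re

SSN_SHAPE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # matches anything SSN-shaped

def fingerprint(values):
    """Hash known sensitive values so outbound traffic can be matched exactly."""
    return {hashlib.sha256(v.encode()).hexdigest() for v in values}

KNOWN_SSNS = fingerprint(["123-45-6789", "987-65-4321"])  # illustrative values

def dlp_check(value: str) -> str:
    if hashlib.sha256(value.encode()).hexdigest() in KNOWN_SSNS:
        return "block"   # provably *our* data leaving the perimeter
    if SSN_SHAPE.search(value):
        return "review"  # looks like an SSN, but is not one we fingerprinted
    return "allow"
```

The regex branch alone produces false positives on anything SSN-shaped; feeding the catalog's fingerprints into the DLP system is what lets it say "this really is my data" with confidence.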
Beyond that, in a large enterprise you've got clever data scientists and data engineers who, through the nature of their work, can combine sets of data that could include work patterns, IDs, lots of activity, bring that together, and that suddenly falls under the sensitive umbrella. So it's being able to tag and classify data under those regulatory policies, but also by what could be an operational risk to an organization, whether it's a bank, insurance, utility, or healthcare in particular.

You work across all those verticals, do you?

Yeah, of course. We're agnostic to any vertical.

Okay, all right.

And the nature of being able to do that is having that machine learning set up a baseline around what is sensitive, and then honing that to what is particular to that organization. Lots of people will use, as we've seen here at AWS, S3, Aurora PostgreSQL, Aurora MySQL, and Redshift in lots of different ways. The underlying sources of that data, whether it's a CRM system or IoT, all have nuances that make every enterprise data landscape just slightly different. So trying to take a rules-based, one-size-fits-all approach is going to be limiting. That then creates your manual overhead. Customers like GE and Comcast have moved way beyond throwing people at the problem; that's no longer possible. So being smart about how to approach this, classifying the data using features in the data, and creating that metadata as an asset, just as a data warehouse would be, allows you to enable the rest of the organization.

So you've talked about deriving value and identifying value. With your catalog and your tagging, what does this mean to the bottom line, in terms of ROI? And how does AWS play into that? As a company, what value am I getting out of your capabilities together with AWS?

Yeah, we did a great study with Forrester. They calculated the ROI, and it's a mixture of things.
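The feature-based classification Ajay contrasts with rules-based, one-size-fits-all approaches can be illustrated with a toy classifier over column statistics. The features and thresholds below are illustrative assumptions, not Io Tahoe's models; a real system would learn such boundaries per organization, which is the "honing to what is particular to that organization" he describes:

```python
def column_features(values: list[str]) -> dict:
    """Compute simple statistical features over a column of string values."""
    total_chars = sum(len(v) for v in values) or 1
    return {
        "digit_ratio": sum(c.isdigit() for v in values for c in v) / total_chars,
        "avg_len": total_chars / max(len(values), 1),
        "uniqueness": len(set(values)) / max(len(values), 1),
    }

def classify_column(values: list[str]) -> str:
    """Bucket a column by its features (thresholds are illustrative)."""
    f = column_features(values)
    if f["digit_ratio"] > 0.8 and f["uniqueness"] > 0.9:
        return "identifier"   # e.g. account numbers, SSNs: mostly digits, unique
    if f["avg_len"] > 20:
        return "free_text"
    return "categorical"
```

The same code runs unchanged whether the column came from S3, Aurora, or Redshift, because it classifies by what is in the data rather than where it lives.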
One part is that manual personnel overhead: people locked into the pretty unpleasant, low-productivity role of wrangling with data, for want of a better word, to make something of it. They'd much rather be creating the dashboards, the BI, the insights. So moving dozens of people from that back-office manual wrangling into what's going to make a difference to your chief marketing officer and your CFO, bringing down the cost to serve your customer by getting those operational insights, is how they want to be working with that data. That automation takes out the manual overhead of the upfront task and allows that resource to be better deployed onto more interesting, productive work. So that's one part of the ROI.

The other is with AWS. What we've found here, engaging with the AWS ecosystem, is just that speed of migration to AWS. We can take months out of that by cataloging what's on-premise. Say our data science or data engineering team wants to create products on AWS for their own customers using SageMaker, Redshift, or Athena. What is the exact data that we need to push into the cloud to use those services? Is it the 20 petabytes that we've accumulated over the last 20 years? That's probably not going to be the case. So tiering that data across on-prem and cloud is really helpful to a data officer and an information architect setting themselves up to accelerate that migration to AWS.

So for people who've used this kind of system, who've run through the tagging and seen the power of the platform you've got there, what are some of the things they're now able to do once they've got this high-quality tagged data set?

It's not just tagging, either; we also do fuzzy matching. So we can find relationships in the data, or even relationships within the data in terms of duplicates. For example, somebody got married and they're really the same person, but now their surname has changed.
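The surname-change case is a classic fuzzy-matching problem: an exact join on name fails, so you match on the fields that stayed stable. A toy sketch using standard-library string similarity (the threshold and field choices are illustrative assumptions):

```python
from difflib import SequenceMatcher

def similarity(a: dict, b: dict) -> float:
    """Compare records on stable fields (first name + address), not surname."""
    key = lambda r: f"{r['first']} {r['address']}".lower()
    return SequenceMatcher(None, key(a), key(b)).ratio()

def likely_duplicates(records: list[dict], threshold: float = 0.9):
    """Return pairs of records similar enough to be the same person."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if similarity(a, b) >= threshold:
                pairs.append((a, b))
    return pairs

dupes = likely_duplicates([
    {"first": "Maria", "last": "Jones", "address": "12 Elm St"},
    {"first": "Maria", "last": "Smith", "address": "12 Elm St"},  # married: new surname
    {"first": "Tom", "last": "Ng", "address": "9 Oak Ave"},
])
```

At enterprise scale this pairwise comparison would be blocked and indexed rather than run O(n^2), but the deduplication behind the mailing-cost savings described next follows the same idea.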
We can help companies find those bits of matching. I think we had one customer where we saved them about 100,000 a year in mailing costs, because they were mailing to a Mrs. who wasn't there anymore. Or she was, but her name wasn't.

Her name wasn't. And being able to deduplicate that kind of data really helps people save money.

And that's the next phase in our journey: moving beyond tagging and classification. Our roadmap, working with AWS, is very much machine-learning-driven. So for our engineering team, what Derek gets excited about is: what's the next model, what's the next problem we can solve with AI and machine learning to throw at the large-scale data problem? We'll continually be curating and creating that metadata catalog asset, to allow it to be used as a resource enabling the rest of the data landscape.

And I think what's interesting about our product is that we really have multiple audiences for it. We've got the chief data officer, who wants to make sure that we're completely compliant, because they don't want that potential 4% fine. Being able to evidence due diligence in their data management will go a long way if there is a breach, because zero-days do happen. But if you can evidence that you've really had good discipline, then you won't get that fine, or hopefully you won't get a big fine. The second audience is the information security professionals who want to secure that perimeter. The third is the data architects who are trying to manage and create new solutions with that data. And the fourth, of course, is the data scientists trying to drive new business value.

Right. Well, before we let you all take off, I want to hear about an offering that you've launched this week, apparently with great success, and that you're pretty excited about, just from your presence here. Tell us a little bit about that before you take off.
Yeah, so we're here also sponsoring a Jam Lounge, and everybody's welcome to sign up. There are a number of challenges there to take on competitively: come into the Jam Lounge, use our products, and understand what it means to accelerate that journey onto AWS.

What can I do if I show up? Give me an idea of what to do.

Right out of the blocks, you can take some time to discover data, understand what data is or isn't there, find relationships, and intuitively, through our UI, start exploring it and joining the dots on what my data is: knowing your data, and then creating policies to drive that data into use.

Good. And maybe pick up a football along the way.

Pick up a football. Sign up. Get a football, yeah.

Gentlemen, thanks for being with us.

Thank you for having me.

Thanks for the time, and again, the Jam Lounge, right? Jam Lounge. Right here. For Justin Warren, I'm John Walls at AWS re:Invent. We are live, and you're watching this right here on theCUBE.