From Orlando, Florida, extracting a signal from the noise. It's theCUBE, covering Pentaho World 2015. Now, your hosts, Dave Vellante and George Gilbert.

Welcome back to Orlando, everybody. This is theCUBE and we're here live at Pentaho World 2015, the second Pentaho World, with about 550 customers here. Matt Cardillo is here; he's with FINRA, a really interesting regulatory agency. Matt, welcome to theCUBE.

Thank you, David. Thanks for having me.

So you guys analyze 75 billion events daily, which is just a mind-boggling number, but let's start with FINRA: what's it all about, and what's your role?

Those are exceptional days. One of the key things about FINRA is that our data fluctuates dramatically.

Sure, the base volume, obviously.

Right, so the number you just quoted would be a high-water mark, which could be about three times the average.

Okay, but still, many tens of billions per day, even on an average day. All right, so talk about FINRA and what your role is. You were telling us off camera how it came about; you've been there for quite a while, around eight years now, which kind of coincided with the whole Hadoop movement, so we want to unpack some of that. But what's your role, and tell us about FINRA?

Sure. I'm a software engineer by trade and I've been doing software for most of my career. I came to FINRA in 2007 and I've worked on a number of systems. I deliver software to help analysts, examiners, and regulatory analysts do their jobs in analyzing firm data, and to help provide that qualitative analysis on top of our surveillance program. FINRA is a regulator, the Financial Industry Regulatory Authority, and our mission is to ensure that the markets are fair. We do that through market regulation. It is a big data problem, it always has been, and the markets are changing, so we need to respond and change with them.

So you're making sure people are playing by the rules: not gaming the system, not cheating, not doing things that you don't want them to do. That's a data problem, but to solve that problem back in the early 2000s you had to shove all the data into a big box, maybe buy a Unix server and some Oracle licenses, and then, if you had any money left over, pay some people to analyze it. Is that how you did it before, and how did Hadoop and big data change that?

Where we've really shifted is that FINRA has its own data center and we're moving toward the cloud. We want the elasticity. We were at capacity, so we needed to change our approach so that we could respond to changes in market data volume and in user behavior. With analytics, it's very much what I call a burst-utilization problem, where on a given day we can have spikes in usage. When the markets are getting volatile, we see a lot more users in the system, so we get that fluctuation.

You mentioned the oversight or surveillance piece, which I assume is the insider-trading, Ivan Boesky role: when a stock spikes before a deal's announced, you've got to find out why. There was a second role or mission in there; what was that?
Well, we ensure that the markets are fair by analyzing market data through our surveillance program, where we've got essentially algorithms combing through the data, looking for things that are out of the range of what we would consider normal, and exceptions and alerts kick out. Then it's the analytics on top of that, where users go in and either help confirm that there is a problem and we have to go deeper, or determine that it's a false positive.

So a couple of things there. Are you dealing with streaming data, as close to real time as possible, or can this be done overnight? Because you're not going to catch the guy in real time and send out the Minority Report guys to snatch the perpetrator before he's actually done the trade. So is it unearthing something like fraud and then, after the fact, investigating it? What type of analytics unearths it, and then what allows the investigator to go look at it?

Yeah, primarily it is a look-back. However, things are moving more toward a near-real-time or real-time scenario, though not in terms of FINRA's mission; we do want to be able to respond as quickly as we can. A lot of our surveillance programs look back at the prior month, the prior quarter, those kinds of things. It's looking at what happened and then unearthing why something took place.

Yeah, because you guys have to be careful about false positives, as you said.

Absolutely.

And it is a look-back, but we heard from your boss that you're dramatically compressing the time it takes to look back. I think I heard 90 days or something, a fairly long time, down to 90 seconds, or minutes, or days; I'm not sure, there's a big range there.

So we have hundreds of surveillance programs running. Some of them are designed to look at a quarter's worth of data, or maybe just a month's worth. There are many different rules that we have to enforce and monitor for, so it's definitely a combination. But in terms of how we're bringing the data in and making it available, that's really where we're compressing things: bringing the data to our end users sooner, so that in some cases they can see it next day.

So let's talk about that a little bit. I've said many times that the decision support, data warehouse, BI world has historically been insights for a few: very high-value people analyzing stuff, and maybe eventually it gets acted upon. That's changing. Let's talk about how you do that. If I understand it, you've got data sources coming in, you use Pentaho to blend that data, then you use Amazon S3, EMR, Redshift, all these services, to act on that data, and then you put it in the hands of your users.

That's right.

And so it's citizen analytics, I call it. So first of all, is that the right workflow? Did I describe that right? And I'm sure you can add a lot of color to that if you would.

Sure. A lot of what we're doing now is driven by the need for that elasticity, and we're getting it by embracing cloud computing, Amazon in particular, using their S3 file store. We're using transient clusters that we can provision on the fly: we bring up clusters to do our ETL and process the data, and we can bring up query clusters for our users so that we can respond to demand for running analytics against the data in S3. Pentaho fits into the picture because it had the ability to scale with our usage scenario and our data footprint. We're able to put tremendous amounts of data in front of a user and then let them pivot and summarize on that data to illuminate what's going on very quickly, so that it leads to those next questions. A lot of times, when someone's looking for a problem, you don't always know exactly what you're looking for, and giving users the power to go in and ask questions of the data, so that it can further direct or hone where they want to look, is very enlightening and very enabling.
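For readers who want a concrete picture of the transient-cluster pattern Matt describes, here is a minimal sketch, using Python and boto3, of launching a short-lived EMR cluster that runs a single ETL step against data in S3 and then shuts itself down. It is an illustration only, not FINRA's actual tooling; the bucket paths, job script, instance types, IAM roles, and release label are hypothetical placeholders.

```python
# Minimal sketch: provision a transient EMR cluster for one ETL step, then let it terminate.
# All names (bucket, script, roles, release label) are hypothetical placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="market-etl-transient",
    ReleaseLabel="emr-4.1.0",                 # pick whatever release you actually run
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m4.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m4.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # transient: tear down once the step finishes
        "TerminationProtected": False,
    },
    Steps=[{
        "Name": "daily-market-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/market_etl.py",
                     "--input", "s3://example-bucket/raw/trades/",
                     "--output", "s3://example-bucket/curated/trades/"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Launched transient cluster:", response["JobFlowId"])
```

Because KeepJobFlowAliveWhenNoSteps is False, the cluster exists only for the lifetime of the ETL step, so compute is paid for only while the job runs; interactive query clusters for analysts would be launched the same way but kept alive for ad hoc use.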
And is it correct that the systems you're using to analyze that data comprise both these modern tools, like Pentaho and Amazon, as well as traditional data warehouse technology? Is that correct, or have you transitioned out of that?

Where we want to go is to be as elastic as possible, to have limitless scale, and to do that on commodity hardware as much as we can.

Okay, so a bunch of heterogeneous stovepipes of data doesn't scale for you, is what I'm hearing, and you want to drive as much of that into a cloud-like service orientation as possible.

That's right.

On a scale of zero to 100 percent, how far are you in that journey? Are you more than halfway?

Yeah, we're definitely more than halfway.

Really?

Yeah.

Okay, the other thing I wanted to ask you about is complexity. As a software engineer, you're knee-deep in complexity, and we've seen a lot of practitioners in the Wikibon community talk about the challenges of working with big data technologies generally and Hadoop specifically. Every day there's some new project coming out, some new open source capability with a funny name. How have you dealt with that complexity? Do you just throw bodies at it? You've got to find good people. The technology is one thing, and then you've got the people and the processes, the other really hard part.

It's challenging, and it really comes down to the people: people who are innovative and can embrace new technology. Going into this migration to the cloud, we knew that some of the tools and technologies we're leveraging today may not be the tools we'll be using on the other side of it. One thing we're very big on is open source; it's a huge enabler for us in terms of realizing the elasticity I keep talking about, things like Hadoop and Hive and Spark. There are definitely followings around these technologies, so when you talk to somebody in technology, they tend to gravitate toward a specific thing, like Spark or Hive and querying, and go really deep in it. Those are the people we're trying to attract.

But at the same time, if I infer correctly, you have to assume that some of those tools are disposable; maybe you assume all the tools are disposable. You have to create a platform that's agile, so that if MapReduce gets replaced by Spark, which gets replaced by something else, you can accommodate that.

Yeah, we definitely want a nimble architecture. We want to be able to adapt, because these things tend to leapfrog each other: what works really well one day becomes obsolete tomorrow, because some other technology has leapfrogged it and can produce an order of magnitude more performance, for example. We need to be able to respond to that, and the way we architect our solutions, we want them to be very pluggable.
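As an illustration of what that pluggable posture can look like, here is a small, hypothetical Python sketch that isolates the surrounding analytics code from any one query engine behind a thin interface, so a Hive backend could be swapped for Spark SQL, or for whatever leapfrogs it next, without touching the callers. The class, table, and column names are invented for the example; this is not FINRA's code, and the engine internals are deliberately stubbed out.

```python
# Hypothetical sketch of a pluggable query-engine seam; engine internals are stubbed out.
from abc import ABC, abstractmethod
from typing import List, Tuple


class QueryEngine(ABC):
    """Callers depend only on this interface, never on a specific engine."""

    @abstractmethod
    def run(self, sql: str) -> List[Tuple]:
        """Execute a SQL statement and return the result rows."""


class HiveEngine(QueryEngine):
    def run(self, sql: str) -> List[Tuple]:
        # In a real system this would submit the query to HiveServer2.
        raise NotImplementedError("wire up a Hive connection here")


class SparkSQLEngine(QueryEngine):
    def run(self, sql: str) -> List[Tuple]:
        # In a real system this would call spark.sql(sql).collect() on a SparkSession.
        raise NotImplementedError("wire up a SparkSession here")


def summarize_activity(engine: QueryEngine, trade_date: str) -> List[Tuple]:
    # Analytics code is written once against the seam; swapping Hive for Spark
    # (or its successor) is confined to where the engine object is constructed.
    return engine.run(
        "SELECT symbol, COUNT(*) AS events, AVG(price) AS avg_price "
        f"FROM trades WHERE trade_date = '{trade_date}' "
        "GROUP BY symbol"
    )
```

The specific classes matter less than the seam: callers invoke summarize_activity with whichever engine is current, so replacing one engine with its successor does not ripple through the analytics code.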
How did you guys get so smart? Is it just brute-force hiring of really good people? Do you have relationships with the hyperscale guys like Google and Amazon where you share information? I wonder if you could talk about that a little bit.

I would say that we're constantly learning; it just never stops. When we're trying to attract talent, we're looking for people who are ambitious, who are unafraid to try things and really get into the problem space. We're also looking at universities, and we want people who can really change the playing field in terms of bringing technology to life for our users.

And what's your background? From a technical standpoint, where did you come up through the ranks?

I'm actually an electrical and computer engineer by trade, and I graduated at a time when there weren't a lot of jobs, but there were a lot of software jobs. So I gave that a try and never really looked back.

So what about the roadmap for FINRA generally, but specifically around how organizations can, should, and in your view will be operationalizing analytics? Maybe this is your opinion, not necessarily FINRA's. Jeff Hammerbacher, from Facebook and Cloudera, is famous for saying, "The best minds of my generation are applying their skills to get people to click on ads." That's changed: you're seeing big data in healthcare, you guys obviously with things like fraud, and analytics is changing the world. What's the roadmap, the vision, for how analytics will become operationalized and effect change?

Well, I think as we make this move to the cloud and complete it, we will be a lot more agile in responding to changes in the market. We'll be able to adapt to the technology as it continues to innovate, and right now we seem to be in the explosion phase of big data. In terms of analytics, what I see the future looking like is this: we have our surveillance programs that kick out these alerts and exceptions, and we want the path from initial detection to performing the analytics on top of those alerts and exceptions, to pulling the thread, to be as seamless as possible for our users. Gone will be the days when there's a bunch of intermediary technologists they have to call and reach out to, to do a bunch of stuff and come back to them. We want to enable them as much as we possibly can, so that they can answer their own questions.

How soon will you get there? We know there's going to be more data, but you're describing a scenario where that escalation in the volume and variety of data doesn't negatively impact your elapsed time to insight and action. You're actually saying you're going to accelerate that even as data volumes increase. That's the vision you're putting forth.

Yep.

What about security? For a lot of people in financial services, cloud is kind of a bad word in a security context. You guys have embraced the cloud generally, and AWS specifically, as a practitioner.
What's your take on security in the cloud?

We believe that cloud computing technologies are secure, and we believe in what we're implementing. Security is implemented in a number of ways so that the data is protected, and we don't really see any challenges around the virtual private cloud being insecure.

I've always said that, on balance, for the vast majority of organizations, what Amazon and Google deliver is more secure than what you can deliver yourself. Now, maybe it's different in financial services; there are some hardcore people there who understand these problems. But it's interesting to get your perspective. All right, Matt, we're out of time, but I'll give you the last word. Pentaho World: what do you make of this event? Why are you here? What are you learning?

This is great. It's my first time at Pentaho World, and it's pretty interesting to see how much of a following there is in analytics and some of the innovation that's going on. It's encouraging to see all the solutions and some of the great minds here today. I'll actually be presenting in a breakout a little later today, where we're going to demonstrate some of our capability. In one of our test environments, I'm going to show a little use case and show how, at scale, we navigate some of the analytic challenges we're dealing with at FINRA.

Matt Cardillo from FINRA, thanks very much for coming on theCUBE. Really appreciate it. Really interesting use case: somebody actually putting all these insights into action. This is theCUBE. We're here live at Pentaho World 2015 in Orlando. We'll be right back.