Good morning, Pentaho World. How are we doing? Are we good? Are we good? Are we really good? Really good. Let me see. Do we have any rock stars in the house? Let me do a quick check here. We have Matt Casters, the father of Kettle. Matt, can you go ahead and stand up? Yeah. A big part of our community heritage, Kettle, became PDI, open source, very important to our future. Mark Hall, you in the house? Weka? There he is, Mark Hall. Weka, predictive analytics. Very, very important going forward. So we've got Matt Casters in the house. We can begin the session. So if you think about last year's Pentaho World, for those of you that joined us, it was all about data, right? We talked about data for breakfast, data for lunch, data for dinner, 24 by 7 data. Well, folks, that hasn't changed. The digital universe is exploding. Data is doubling every year. The unstructured component of data is doubling every three months. And enterprises view that as very valuable: 77% view unstructured data as really important to the organization. Now, if we kind of peel back the big data onion, we see that the World Economic Congress has forecasted that there will be 5 billion people connected to the Internet by 2020. 5 billion people and 50 billion things connected to the Internet by 2020. Cisco estimates that Internet of Things data generation is going to exceed 400 zettabytes. Now, how do I go look at what a zettabyte is? Anybody know what a zettabyte is? Well, I do now. Yeah, Matt Casters does, of course. So I had to go look it up. If you were to take every spoken word in the human language over time and put that in a database (I think Mike Olson would be able to tell you that you can put it in Cloudera), that would be 42 zettabytes. Cisco estimates 400 zettabytes by 2020 coming from the Internet of Things. But data capture and storage isn't everything. It's important, but it's not everything. So now we're kind of thinking about, what are the people implications of all this? Ventana Research did a survey of 250 senior IT executives, 60% coming from large enterprises. And they asked those IT executives, what are the important skills for your team for the future? What are the skills that allow your company to go forward? And the survey results show that analytics was the number one skill set, right? Analytics is all about making companies smarter. It's all about empowering business executives. It's all about allowing companies to be fast and mobile, but mobile with information. So now for a graceful segue into the movie The Graduate. You're probably going, is there a graceful segue to the movie The Graduate? So how many of us saw The Graduate, right? Very good. Remember that iconic scene when young Ben comes home from college and he's sitting by the pool, and a family friend comes up, and Ben's not sure what he wants to do? And the friend says, you know, Ben, the future is plastics. Remember that iconic moment? Well, sadly, I had one of those moments. So I've got a 20-year-old son, a junior in college. His name is Liam. He came home for the summer. We thought we'd do a little father-son bonding experience. So I took him to a San Francisco Giants game. As you know, the Giants didn't make it to the playoffs. They were losing the game, as I recall. So we had a moment really talking about what a 20-year-old should do, right? He's thinking, computer science, business, dad. Is it computer science or is it business?
And clearly, I wasn't very self-aware. So I turned to him and I looked him straight in the eye. I put my hand on his shoulder and I said, son, analytics. Say no more, analytics, just think about it. So he looked at me and he goes, Dad, you okay? And then I became self-aware. I caught myself and said I needed to walk around a little bit. And as I walked around, I said, note to self, stop taking parenting advice from Oscar-nominated movies. But you sort of get the point, right? Analytics is very, very important for the future. And so as we look at this ever-expanding universe of data, things, and people, we look for where the return on investment is, right? How do companies make money, right? And where they do that is really at the intersection of those things. It's at that intersection where companies grow revenues faster. It's at that intersection where companies reduce costs, where they drive more operating efficiencies. It's at that intersection where key initiatives like the Internet of Things, 360-degree view of the customer, better supply chain management, and better fraud control live. All those pieces need to come together. So the other dynamic: we're all technologists here. We're all building infrastructure. We're all trying to make our companies more productive, more competitive, more innovative. So there are a couple of things that we're seeing coming that I think we need to wrestle with, that we've got to get our arms around. And that is that the Internet of Things and the cloud are ready for prime time. McKinsey, right, the global research firm, has estimated that the economic value from the Internet of Things by 2025 is $6 trillion. Yes, $6 trillion. If you break that down by use case, predictive maintenance as a use case for manufacturing and sensors and operating controls is a $2 trillion economic opportunity. Smart homes and security, a $300 billion economic opportunity. Smart cities, a trillion dollar economic opportunity. Smart offices and energy efficiency, a $70 billion economic opportunity. It's real. So the other dynamic that we're seeing just in the past year, year and a half, is that big data is getting cloudy. So what do we mean by that? We're starting to see a convergence between big data and the cloud. In fact, yesterday we had a strategic advisory board session with some of Pentaho's key customers that are innovators, thought leaders, shaping our future on the edge, doing really exciting things. And as they talked about their big data deployments, almost every single member of our strategic advisory board talked about the cloud as a very critical deployment model for big data. IDC estimates that the cloud market is going to be worth $130 billion by 2025 and that 30% of all enterprise software will be delivered in a SaaS model. It's something we have to get our arms around. And so in doing that, what we're seeing is that in order to enable these next generation big data applications, the technology has to change. We have to see a shift. Particularly if you focus in on data movement: data movement in the traditional legacy world was a fairly batch process, right? Now we're really seeing that the data pipeline needs to be more real-time to enable these applications. And classic BI, which added a lot of value with an analyst sitting outside the application and looking at analytical tools, that has to change. We're really seeing analytics being embedded, being inside the application, being consumed at the point of impact.
So we see these technology shifts happening, and needing to happen, to really, again, enable putting big data to work: to enable 360-degree view of the customer and revenue optimization opportunities, to enable Internet of Things opportunities, to enable security compliance and supply chain management kinds of solutions. In addition to real-time streaming and embedded analytics, blending is really important. We can't forget about the relational world, because a lot of these big data use cases that we're involved with require blending of unstructured and structured data. Doing that in a more efficient fashion is very, very critical. Governance. Now that big data is becoming mainstream, prime time, production applications, governance has to be part of the wrapper. Security, data lineage, auditability, traceability: very, very critical to making these big data applications successful. And then predictive analytics. That's the aha in the next generation of analytics, the ability to take these large data sets and predict something: predict the propensity of a device to fail, of a customer to buy an offer. All very, very critical components, and a shift in technology direction. The other interesting impact that we're seeing is that the big data lake is quickly coming of age. We've been working with customers, trying to get them to think about the data lake and work with the data lake. In fact, one of our founders, James Dixon, coined the term data lake over five years ago. And is he in the room, Dr. Jimmy? He's probably still sleeping. He really doesn't wake up till about noon. So we'll catch him on the back end. But the big data lake is coming of age, and so I was trying to think, since we've been doing it for five years, what's a good metaphor for describing the evolution of the big data lake? One I kind of thought about is the awkward teenager. Right, the big data lake. So what do you mean by an awkward teenager? Well, you know you're on the cusp of something great. You know that being different is better. You know that being different is special, but you're not quite sure what the future holds. So how do I translate that into technology architectures? What we've seen over the last couple of years is that many companies have looked at big data opportunities as tactical projects. And we think that's a good thing. There's a line of business involved that's helping drive the initiative. And then the IT organization has to come in and build the infrastructure for the project. And we're starting to see a lot of reference architectures emerge where IT needs to find an easy way to get unstructured data into the data refinery, which could be Hadoop, and then blend data with the relational world. Do that in a seamless kind of way. And then, based on the analytical workloads, perhaps move that data into a higher performance analytical database and then deliver an analytical data set to the line of business user at the end of the data pipeline. We have about 300 production deployments that are using this kind of reference architecture. And on the continuum scale, the evolution scale, this is a really good place for companies to start. But we've seen a lot of forward thinking over the last year. And if I take that awkward teenager metaphor, we're now seeing the data lake as really the bearded hipster, in the sense that it's a little more mature, a little more business oriented, very focused on things like enterprise hardening, security, and governance, and clearly aligned arm in arm with the business unit.
So how does that translate into next generation architectures? We're starting to see companies look at the big data lake as a strategic corporate asset, in the sense that they want to make sure it's very easy to get all that unstructured data under management, find a place to easily blend that data in the refinery, in the data lake, and then enable the business units. Make it easier for the line of business to dip into the data lake, take the data they need, blend the data they need. The data to be blended may sit in a relational database behind the company's firewall. It may be in the cloud. And it may be virtualized. In a sense, we're calling that on-demand or dynamic blending. IT needs to be able to provide an infrastructure so it's easy for the line of business to access the data they need, and then to run workloads based on the latency. It could be batch, it could be interactive, it could be real-time, it could be predictive. And with a strategic corporate asset, IT wants to make sure they can take advantage of all the business intelligence tools and analytics tools that are at the disposal of the line of business. Line of business having control, line of business being very, very key. So we're starting to see this reference architecture emerge as the next generation view of the big data lake. So what's needed to make this happen? Well, that's what we're building here at Pentaho. That's what we're focused on. That's what you're going to hear a lot about over the next couple of days. Our role, and the value added we can bring, is to help our customers operationalize their data lake, and in doing that, providing an end-to-end analytical platform that automates the process and serves two masters. Master number one is IT. It's got to build out the infrastructure, find a way to get all this data under management, but then have corporate governance: have a way to track, trace, and audit all the information, making sure that when blending is getting done, the calculations are the same across the board if you're a bank or if you're in a regulated industry. So that data refinery is very, very important for IT to be able to manage. The other master that we're serving with our strategy is really the line of business, allowing it to be more self-service in getting access to the lake, to search the lake, to do the blend. So how do we translate that into features? With auto ETL driven by the line of business, not always a manual effort by IT. Auto data modeling, creating measures and metrics and reports on the fly. And then auto discovery in terms of delivering that analytical data set to a tool of choice. It could be the Pentaho BI front-end to view the data. It could be embedded in an application, which we see more of day in and day out. It could be used by another BI tool. So operationalizing the data lake is where we think we can add real value in helping our customers be more productive and more competitive. So the theme of this event is putting big data to work. So who's putting big data to work? Who's making money on big data? I'd like to share with you some great customer use cases. So, people familiar with telematics? Telematics is a system put into cars, generally driven by a dongle or a sensor that goes inside the car, that provides machine data or information about the health of that car. How is the car running? How are the components of the car? You can also track driver behavior and driver patterns.
So our customer here is IMS. They are a connected car platform. Their customers are insurance companies and fleet management organizations. I want to talk about the insurance use case because I think it's very, very interesting, very cool. What IMS does is enable their insurance companies to provide new business models, new types of insurance: usage-based insurance. Anybody familiar with what usage-based insurance is? I wasn't until a couple of days ago. What it is, is it's really allowing the insurance companies to provide better discounts to their policy members based on not a theoretical good driving record but an actual good driving record. How well do you drive the car? How safe are you in driving the car? And so you're getting a lot of consumer adoption around this usage-based model because it's all upside for the consumer. If you're looking at your driving record, the pattern, what you get from the operating controls in your car, if you're a good driver, you get better discounts. If you're not so good a driver, there's no penalty. You're not put in the penalty box. And so we're seeing that as a very, very dynamic way to deliver insurance, and our partner is doing great things in the insurance industry. So where does Pentaho play in this architecture? Again, we grab data from the sensors, from the cars. We then blend that data with relational data, the customer profile data, claims history kinds of data, put that into an analytical database, and the insurance program managers consume those analytics and get to monitor the effectiveness of the program as well as the effectiveness of the policies themselves. Another great IoT example is in the energy industry. Our customer is Opower. Opower is a cloud-based energy efficiency platform. They've generated over $400 million of savings for consumers by collecting smart meter data on energy efficiency and data from other devices in the home. They then provide personalized analytics for consumers, so consumers can see how much energy they're using and what the ways are to reduce that energy, and the aha is that they can benchmark themselves against their neighbors. So if you're in a cul-de-sac, you can see who the energy gluttons are, as a way to really drive more efficiency. Opower has over 95 utilities using the platform and 50 million consumers using the platform. So again, Pentaho's role is grabbing all that data from the smart meters, blending that with the CRM systems at Opower, and then delivering analytics through an application to both the energy utilities as well as to consumers. Really, a great way to not only drive big data applications but make the world a more efficient place to live. Another IoT example: IoT predictive analytics. Caterpillar has a software division that provides predictive analytics and analytics to the maritime industry, to the shipping industry. And you're going to hear from Jim, who's the CEO, later on this afternoon. What Caterpillar does is provide analytics for cargo ships and container ships. Those vessels have sensors on board that track all the performance and operations of the cargo vessel. Where Pentaho comes into play is we blend that data from the sensors and the onboard operating systems with GPS satellite information, so you get positioning along with operating control. And then Caterpillar's software uses our predictive analytics capability, and they can predict whether a device is going to fail or a system is going to fail on board.
Why is that critical? Because if you're in the middle of the Indian Ocean, there's no garage to take the boat to, right? You want to be able to predict that failure ahead of time. And they've done an amazing job of providing efficiency and operational advantages to their customers. So let me shift out of the IoT use cases and talk about some big, big data in the financial services industry. Our customer here is FINRA, and you're going to hear from Saman Michael Farr, one of the key executives at FINRA, in a few minutes, and he'll do a much better job than I will in telling the story, but I did want to give you a snapshot of what FINRA is doing. FINRA is an independent regulatory body that manages and looks for insider trading and fraud in the securities industry in the United States. They capture every trade that's done on all the markets in the United States. They capture 75 billion events every single day. Yes, 75 billion events every single day. And so they're great innovators, because they had to innovate in order to do their job, right? They have to capture all this data and they have to quickly analyze this data to look for fraud, right? To look for trends, to look for bad behavior, abnormalities in the trading environment. And in doing that, their data lake is Hadoop, right? It's a great place to store 75 billion events. But to scale out, this is where the work gets very, very interesting: they get elasticity because they're running in the Amazon cloud. So Pentaho is running in the Amazon cloud with them, again doing the data integration bits that we do, and then, based on the performance workloads, moving data, still in the Amazon cloud, into Redshift, a higher performance database, and then presenting those analytics and applications for the investigators and the analysts to look for fraud and compliance. It's one of the best big data use cases that's real, and quite frankly, I'm not sure they could have pulled this off if they weren't able to take advantage of this sort of next generation technology. So, staying in the financial services realm as well as in big data in the cloud: NASDAQ. We've all heard of NASDAQ, right? The very famous exchange. They do a similar kind of thing, in that they run a lot of their infrastructure in the cloud for elasticity purposes. Pentaho is running with them. We do all the data integration on the trades that come in. And what they're trying to do is really help their customers monetize the data, right? Because if you're an exchange, the faster, more reliable information you can give your customers, the more they can make money on that information. So their job, and what we help them do, is get that data into an analytical data set, into a form that their members can use as quickly as possible in order to be more effective traders, in order to make money. A great big data use case. And just kind of a shout out to both those teams. You know, we spend a lot of time with companies that are building out big data infrastructure, and those teams at NASDAQ and FINRA are some of the most talented, innovative big data leaders we've met. And so we're really lucky to have them as customers, because they're helping shape our future and take us where we need to go. I could spend three hours talking about big data use cases. There's so much going on. But we've got so many here at the event for the next couple of days. I highly recommend you attend the presentations. You're going to see a lot more big data use cases, companies putting the power of big data to use.
I recommend you go to the exhibition halls and see the booths where, again, big data applications are being deployed. There's some very, very interesting stuff going on. So what about the cool stuff, the next wave of innovation? This is good news and bad news. The good news is that in the big data, commercial open source world, the pace of innovation is dizzying. It's amazing. It's very good in the sense that there's so much innovation being fostered through the open source community. But if you're Google, if you're Yahoo, if you're Facebook, if you're Twitter, you've built some of these technologies, so it's easier for you to consume them. If you're a mere mortal company, like a lot of us in this room, it's very hard to consume these new technologies that are exciting, yet a bit immature. And so part of the role we think is really important for Pentaho is that we want to be that heat shield for you on these next generation technologies. Whether it's Spark, a very exciting, fast, in-memory processing capability and framework for large data sets; or search, things like Solr and Lucene; or Kafka, real-time messaging; or Docker, kind of next generation open source application development through container technology. What we do is take all these technologies, because we're part of the big data club, the open source club, and we bring them into Pentaho Labs. And we use these technologies, try them out, and see how they can make our products faster, easier, more effective, more performant, more reliable. We harden them. And we harden them before we roll them out to you. So we think that's a big part of our role in the big data ecosystem. So lastly, I'd like to close with our commitment to you. We really take our commitment to you as a mission, as an obligation. And we think about it in several dimensions. One is we truly believe our job is to help you future-proof your data foundation. Just think of these next generation data architectures emerging. Think of the pace of innovation. Think of what's happened. And we recognize the value of that is that you can take advantage of this new technology, but also that you can avoid vendor lock-in. The old vendor lock-in from the relational days, where you could only buy your databases and your tools from one or two vendors. Now, we want to be part of the big data fabric and let you swap out technology, allow you to interoperate. That's our role in the big data fabric: to future-proof your foundation. And we're going to continue to drive innovation. That's our DNA. That's what we've been about. We're going to continue to drive innovation in the big data ecosystem. You've got that commitment from us. The last point here, quite frankly, is something that we haven't talked a lot about at Pentaho, right? Over the last few years, we've all been about innovation, innovation. New feature, new feature. But now we've really figured out, working with all of you, that we also have to harden our products. We have to make our products enterprise-grade. We take that now as importantly as we do anything else. So what do I mean by enterprise-class platform? Simply, our products have to be easier to deploy. They have to be easier to use. They have to be easier to configure, right? They have to be easier for DevOps. You know, kind of the non-sexy stuff, but it's really, really important that we get that right.
So you'll hear from Chris Dziekan in a few minutes. We've really focused on that over the last year. That's been a big part of our 6.0 delivery. It's a big part of our future. So again, we have to serve two masters there. We're going to continue to innovate. We also want to continue to provide enterprise-class capability and harden our product. And now, being part of the Hitachi family, they're also helping to sharpen our focus in that area. That is the DNA of Hitachi Data Systems: enterprise scale, enterprise performance, hardening, making sure it works out of the box every time. That's really helpful, because we're going to infuse some of that DNA into Pentaho. And then, most importantly, our commitment to you is we want to be part of your business success. Our scorecard at the end of the day is what value did we bring to your strategic business initiatives? How did we help your companies grow revenue faster, reduce costs, drive more efficiency, implement these big data strategies? That's the scorecard we want you to measure us on year one, year two, year three, and beyond. So again, I want to thank you personally for coming here. We have over 530 people. We recognize how incredibly busy you all are. We recognize the expense to come here, but I can guarantee you that when you leave here after two days, you're going to find this a very rewarding trip and a very good return on your investment. So thank you very much and have a good Pentaho World. Hey, Pentaho World, good morning. Oh, come on. It's past 8:30. We're open for business and that means product. Product's been cooking all year and product is releasing today. So good morning. It's great to be here again. I was standing in the little entrance area this morning, super early, and the beautiful blue banners were there with the Pentaho logo, and there were some screens in the back playing clips of last year's big event. And if you recall, last year I spent a lot of time focusing on the strategy and the vision, and on day two I jumped off the stage, gave the roadmap to you and let you pass it around, if you recall. This year I want to switch gears. I want to talk about execution. What have we done with that vision? How have we made it real so that you can benefit and make it real in your lives? So that's the focus today, and yes, I will talk a little bit about the roadmap near the end, but making it real is what this year has been focused on. Quentin mentioned that careful balancing act between innovation and enterprise. The labs provide that heat shield for us to do wonderful innovation, bring it forward when it matters, after we've proven that it's worthwhile and of value. And then I'm the heat shield internally, especially this year, that tried to move the big rock in this picture. Because it's easy to do, I shouldn't say it's easy, it's technically hard, but it's easier, from an interest standpoint, to do the bright and shiny things. Look at that new cool innovation, I'll go chase it. Oh, look at this cool innovation, I'll chase it. And those are exciting for people. It's harder to convince an organization that the exciting stuff is called enterprise hardening. We need more administration, more security, more failover, more monitoring. Those are the sexy, cool features. That's the big rock. That's what we did this year. We switched gears to say, look, we've got tons of capabilities that I talked about last year, tons of capabilities.
But in an enterprise-wide deployment, where big data is now real, we have to move the dial into that category of simplicity and enterprise grade. That's the tricky rope to walk. But I'm really pleased that we walked it this year, we defended it this year, and we're announcing product today that shows that. Quentin showed you a little bit of the architecture at a very, very high level. I'm going to use this version maybe three or four times today, just to show you where we aimed the focus of each release. Because you can't do everything across the board every single release. So certain releases had a focus area. The blending of our traditional architectures with the big data architecture, focusing especially above the traditional line, hardening the big data agenda, the embedded analytics, making it easier: this was the initial focus of the first two point releases this year, 5.3 and 5.4. Now, many of you in the room are developers; you're building solutions for your customers. It's really hard to both innovate quick things and do the hard, heavy lifting work. So part of Pentaho's team was focused on these point releases, which were critical, while the other half of the Pentaho team started working on the major release. So in parallel, we had two development streams happening throughout the year. I'm really pleased, first, that 5.3 shipped on time with great quality, with great functionality, doing some initial work in the streamlined data refinery, this wonderful blend that says, how do I get self-service as a user, but how does IT have the right guardrails? And this isn't simply about self-service like a report where you filter it and change the parameters of the report. What I'm talking about here is dynamic ETL. The data is not all sitting in one beautiful data warehouse. How dynamically can we reach into the data lake, blend it with traditional data, bring that data together, publish it, model it, and put it in the hands of a user with fewer and fewer IT touchpoints? Which means faster to the user. The streamlined data refinery is that secret weapon. It applies in every single vertical, every use case in this room. Okay? So we started that journey in 5.3. We also looked forward and said, you know, customers and yourselves want to see an experience on the front end that's tailored for you. Instead of us shipping something out of the box that says, here's a product and it can do 20 things, but it looks this way, you want to twist it, turn it, add the 25th thing, color it yourself, fit it into your other application. That's embedded analytics. We need to provide the capability for you to take the front-end abilities and put them into your environments and tailor them, customize them. And so Analyzer and all of its APIs started to get exposed, so you could take Analyzer as an example and adjust it and tune it and make it part of your environment. So, 5.3. 5.4 came around really quickly thereafter, continuing that embedded analytics agenda, and doing some initial work now in the real product: we took the labs work in Spark and brought the Spark execution item into PDI jobs. Made that part of the product now. And as Quentin mentioned earlier, we moved into a cloudy space. With the streamlined data refinery, a lot of our large customers using it today are using the cloud for storage and processing at scale. So we introduced that in 5.4. Now, that meant the other half of the team was working really, really hard on the major release. The major release focus is right in this bullseye.
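As a rough illustration of the dynamic ETL idea above, here is a minimal sketch of driving a parameterized PDI transformation from Java through the embedded Kettle API; the transformation file name and both parameter names are hypothetical placeholders, and a real streamlined data refinery flow would add its own blending, publishing, and modeling steps.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class DynamicRefineryRun {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle/PDI environment (plugin registry, logging, etc.)
        KettleEnvironment.init();

        // "refine_and_publish.ktr" is a hypothetical refinery transformation
        TransMeta meta = new TransMeta("refine_and_publish.ktr");
        Trans trans = new Trans(meta);

        // Dynamic blending: the same governed transformation is reused with
        // different parameter values instead of hard-coding sources and filters.
        // Both parameter names are hypothetical.
        trans.setParameterValue("SOURCE_REGION", "EMEA");
        trans.setParameterValue("TARGET_SCHEMA", "sales_mart");

        trans.execute(null);        // start the transformation threads
        trans.waitUntilFinished();  // block until the data flow completes

        if (trans.getErrors() > 0) {
            throw new IllegalStateException("Refinery run finished with errors");
        }
    }
}
```

The point of the sketch is the shape of the workflow: IT owns one governed transformation, and a line-of-business request only supplies parameter values, which is what lets the refinery serve self-service users without a new hand-built extract every time.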
That bullseye is the enterprise grade, enterprise hardening of the Pentaho servers, where all of that enterprise requirement is met. A massive undertaking. A scary undertaking. Because we also want to be very conscious of the fact that, in order for you to take advantage of the value of the things we build, it has to upgrade really well. Because you've already invested in Pentaho. You have applications running. So upgrade and migration became a paramount tenet: even though we were taking on some big effort and big features, it had to upgrade. So version 6 focuses on that hotspot. And we're not just announcing version 6 today; it's available today. These are not just nice words. You can start downloading and using version 6 today. So the team is very proud to have the features for you this afternoon. Well, this morning still, technically. There are two big themes: making that big blend a reality, and making sure the right controls and the right governance are in place to enable that flow of data. That flow of data takes lots of shapes and forms, so we're going to talk about that today. If I drill down on each of those two themes really quickly, the first is these data flows. A data flow, think of it this way: data doesn't take one shape all the time. It takes different shapes. It could be data arriving into a warehouse. It could be taking the data out of a warehouse and putting it into a mart, making a cube, making a personal slice. Maybe it's real-time and streaming through and not actually materializing. All of those are valid workflows for data. The Pentaho platform is not prejudiced. It has to be able to fit any of those scenarios really, really well, because different use cases demand a different workload. So one of the steps that we've added in version 6 is a transformation step that creates a virtual table. So we've blended data. We may want to take it as a virtual piece of information that doesn't have a schema, that isn't yet materialized, and look at it to see what it looks like. Maybe we want to pass data on to another process. Maybe we want to enhance it, for example, by running a predictive R algorithm and doing some sort of churn score or fraud score on that data as it transfers across. So pass that virtual table across, let it be enhanced and enriched, then back it comes into Pentaho's orchestration, and we further enrich it and ultimately publish it. So a virtual table can now be processed as part of PDI steps. Second, we've got to keep moving fast. The system has to keep performing. Push down optimization: pushing the processing closest to the engines that can react really, really quickly. And loading data. Remember, Pentaho can act in many, many functions. Loading or ingesting the data into the data lake, moving the data from the lake to the cube or into a mart, it serves a lot of purposes. So the word performance touches us from all angles. For example, the SAP HANA bulk loader, to do a bulk load into HANA, optimizing the speed and performance. And data quality also matters. As we collect and put more data into the data lake, or into the data flow of that pipeline we just talked about, how do you trust it? Is it clean? Is the data lake getting murky and dirty? Can you still see through it? How do you trust it? As part of governed data delivery, obviously we need to put the right controls in place so the data is meaningful and the data is trustworthy.
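Picking up the virtual table idea from above: once a transformation publishes a blended result as a data service, a downstream process can treat it like a table and query it over JDBC. A minimal sketch follows; the driver class name, URL format, host, credentials, and the virtual table itself are assumptions for illustration and will differ in a real install.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VirtualTableClient {
    public static void main(String[] args) throws Exception {
        // Assumed thin-driver class and URL format for a Pentaho data service;
        // host, port, web app name, and credentials are placeholders.
        Class.forName("org.pentaho.di.trans.dataservice.jdbc.ThinDriver");
        String url = "jdbc:pdi://di-server.example.com:9080/kettle?webappname=pentaho-di";

        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement();
             // "customer_churn_scores" is a hypothetical virtual table published
             // by a transformation, for example after an R or Weka scoring step.
             ResultSet rs = stmt.executeQuery(
                     "SELECT customer_id, churn_score"
                     + " FROM customer_churn_scores WHERE churn_score > 0.8")) {
            while (rs.next()) {
                System.out.println(rs.getString("customer_id")
                        + " -> " + rs.getDouble("churn_score"));
            }
        }
    }
}
```

The idea is that nothing has to materialize until a query arrives; the transformation behind the virtual table runs on demand, which is what makes it a good fit for the enhance, enrich, then publish flow described above.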
Our business development team, Eddie White and the team, and product management put their heads together, and we've got great partners in this area, Melissa Data being one of them. Melissa Data has worked very hard with us this year to upgrade the steps that they have for data quality, profiling, and cleaning the data, and those steps naturally appear within PDI. So the data quality initiative can take flight. All that beautiful data you're collecting can start to become more and more trustworthy every day. So special thanks to Melissa Data for helping us achieve that next wave. And we have an ecosystem of additional partners we continue to work with in the data quality space, but absolutely, Melissa Data stepped up this year and the team did wonderful things. The second theme: when I joined Pentaho, about a year and a half ago now, there was a fundamental question raised to me. We had multiple modeling tools to configure the environment. Which modeling tool would we invest in? Because we had multiple. Would we make the easy tool have more features? Or would we take the tool that we had that was very powerful and just add interfaces on top of it and make it simpler? The answer actually was neither. You stand back and ask yourself, why am I modeling the data anyway? That's where auto modeling and inline modeling became the goal for us. In PDI, as the data is processed, it can get tagged. It can get smart. It can build the model for you. And if it doesn't get it perfectly right, or you want a highly collaborative environment where your end users contribute to the modeling experience, then they should do it at their place of work, right on the screen, right while they're using the product. So the combination of an auto model and an inline model allows us to have that kind of collaboration, where the end user can participate in modeling and more and more of it is automated. So as a user, you may be sitting in Analyzer and say, well, I want to build a new calculation. I don't want to call IT to do that. I want to build it on the screen, right in Analyzer, save it, have it persisted, and the next time the models are run, that's a shared calculation for all of us. Equally, as a Pentaho Data Integration person, I'm building these wonderful jobs and transformations, and I can tag things and put metadata about the data into the stream. So there's a record of which fields are dates and which fields are to be part of a hierarchy, and every time the job runs, the model can be built without human intervention. So we continue to chip away at that every time. IT is super important, but the more we can automate the lower end parts, the more IT can focus on the higher value parts, and the data and the models get into the user's hands much more quickly. So, auto modeling and inline modeling. Now, the job is never done. There are always new features and new enhancements. We've taken a great step here in version 6 and we're going to continue to enhance that, because there are new capabilities, especially in inline modeling, that we continually want to add. Let's talk about that second agenda as part of version 6, which was controlling the data flow. The first one is, when you're setting up a great environment, it was too hard, plain and simple: command lines, scripts, files to remember, syntax to remember, don't screw it up or the servers won't work, and try debugging it later, that was hard too. It was simply too hard to get the clusters up and running and configured with Pentaho. So in version 6, first of all, there are named clusters. Once you do configure and name a cluster, you can rinse and repeat it on different steps, as opposed to having to define it over and over and over again.
Second, a graphical interface to configure it. So the experience is not a script and searching for files and doing different things. It's a nice graphical interface, with a bit of built-in testing to say this is going to work, a little test buddy that says, yep, it looks like we're on the right path. So we want to get out of the gate on the right foot, and if the cluster was hard to configure with Pentaho, that was not a nice experience. That's cleaned up and simplified now in version 6. Security: the job is never done. We upgraded to the latest versions of Spring, Tomcat, and Java. There's a lot of underpinning work to upgrade to those technologies, but it opens up a lot of great doors for us to innovate on. The first is single sign-on and taking advantage of the security models that you deploy. There's a whole range of those, so with Spring, with the latest update, we can support a much deeper and broader security infrastructure. We also exposed some of the APIs so that user management can be simplified and automated. And again, the job is never done here. Continually embracing security, making it rock solid, is not going to stop. That's just bread and butter in an enterprise. You must have it, you must do it. So, a huge leap here in version 6. Now the system has been configured, the system has been secured, the system is now running. Monitoring was next. If this were 10 years ago, with Pentaho serving a different audience, we would have put monitoring in our box. We would have written a little interface and we would have watched the signals from the system on how it's performing. But that wouldn't make sense in today's world, where big data is in the enterprise. We need to monitor our environment, but slip it into your environment. You already have monitoring tools. You are already monitoring other parts of the equation. So we have exposed SNMP traps as a common standard, so that we can plug right into your monitoring tools. Again, a sign of fitting in well with the enterprise, playing well with that larger ecosystem around us.
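On the monitoring point above: SNMP traps are a lowest-common-denominator signal that most enterprise monitoring consoles already understand. Purely as an illustration of what receiving them looks like, here is a minimal trap listener sketch using the open source SNMP4J library (2.x-style API); SNMP4J itself, the port choice, and the plain console output are assumptions for the example, and the specific OIDs Pentaho emits would come from its own documentation, not from this sketch.

```java
import org.snmp4j.CommandResponder;
import org.snmp4j.CommandResponderEvent;
import org.snmp4j.PDU;
import org.snmp4j.Snmp;
import org.snmp4j.smi.UdpAddress;
import org.snmp4j.smi.VariableBinding;
import org.snmp4j.transport.DefaultUdpTransportMapping;

public class TrapListener {
    public static void main(String[] args) throws Exception {
        // Listen on UDP 1162; the standard trap port 162 usually needs root.
        DefaultUdpTransportMapping transport =
                new DefaultUdpTransportMapping(new UdpAddress("0.0.0.0/1162"));
        Snmp snmp = new Snmp(transport);

        snmp.addCommandResponder(new CommandResponder() {
            @Override
            public void processPdu(CommandResponderEvent event) {
                PDU pdu = event.getPDU();
                if (pdu == null) {
                    return;
                }
                // In a real deployment this would be forwarded to your
                // monitoring console rather than printed to the console.
                for (VariableBinding vb : pdu.getVariableBindings()) {
                    System.out.println(vb.getOid() + " = " + vb.getVariable());
                }
            }
        });

        snmp.listen();
        System.out.println("Waiting for SNMP traps on udp/1162 ...");
        Thread.sleep(Long.MAX_VALUE); // keep the listener alive
    }
}
```

In practice most shops would skip a hand-rolled listener entirely and point the trap destination at the monitoring tool they already run, which is exactly the fitting into your environment point made above.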
Alright, the system is running, governing well, performing well, secured. Great. Now it's called life cycle management. I want to upgrade. After I just got it all beautifully running, Chris told me today that version 6 is available and told me to go download it right away by nine o'clock. I want to upgrade. Now, an upgrade is both a software upgrade and a content upgrade: all the wonderful things you built, the models, the reports, the analysis, the visualizations. So we have to be very conscious of both of those. In the past, upgrade was too hard. For the point releases, for all of you in the room that upgraded from one point release to the next, I hope you saw a different experience: upgrades were graceful, upgrades were predictable, upgrades were working. Those were the minor releases. The same holds true for this major release. Bob Kemper, if he's in the room. Bob? Somewhere? Yeah, way back there. Bob Kemper is our head of engineering. Bob was ruthless this year about not accepting changes in the code that would break your experience. And that is super tough to do. Because on one hand we want to do something cool and creative, but on the other hand we don't want to break the wonderful things you've built. It takes a real fine line to walk in that balancing act. But Bob was absolutely, wonderfully ruthless this year in enforcing that, and we're all going to benefit from it. So thank you, Bob. So that meant we took a very careful approach on every feature. Take a Java 8 upgrade or a Spring upgrade, for example. Is it backward compatible? What happens if I don't move to Java 8 and I'm still using 7? Questions like that have to be walked through, answered, and maintained so that you have that pleasant experience. The other part was the content. Well, where are all those files I have to move? Those models and those reports and those schemas, where are they in the world of Pentaho? You'd have to go and find them. I wanted a push button that says export my current Pentaho environment, and when I go to the other place, hit the other button and import it, versus navigating folders and structures, trying to find things, and hoping you don't forget something. So again, another great leap forward in terms of ensuring a better enterprise experience. Now the systems are running again. Users are looking at the data. I'm in PDI. There's this wonderful job of transformations, and you scratch your head going, where did that data come from? It was transformed. It was twisted. It was filtered. It was calculated. It was manipulated. What's the truth and the lineage of this piece of data that I'm looking at? Because PDI is so powerful that I can twist and turn and do a lot of blending. But at the end, how do I go back and see where this came from? Data lineage. If I change something, what's the reciprocal effect of that? Super critical for the life cycle of an application. So we've introduced data lineage for Pentaho Data Integration this year in version 6. Track where things came from as a job ran. What happened to the data? How did it get moved and transformed? Again, all to raise confidence about the delivery agenda. The right governance, the right kind of trust, if you want to call it one of the Vs of big data, the veracity. How do I trust this? How do I know where it came from? That's what the data lineage feature is now unfolding in version 6. Now, again, sitting well within the enterprise, Pentaho is not the only metadata in town. You've got metadata in other places of your environment. So first, we're exposing it as a REST API. You can take any of the PDI lineage information and expose it into whatever interface you want: GraphML, a chart of your choice, or maybe you're just passing the information along to something else in your environment. There's also a metadata bridge by a partner, Meta Integration. They're building bridges using this; we're sharing the technology. A bridge means they know how to read and ingest the lineage from Pentaho. They also ingest lineage from a whole bunch of other analytic tools, whether it's Tableau or a big data provider like Cloudera. So you have a chance now to bring together metadata from your larger environment, pull that together, and have a full perspective. So we had to expose our parts and let them play well within a larger bridge.
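Since the lineage information above is exposed over REST, pulling it into an external metadata catalog can be as simple as an HTTP GET and a file write. The sketch below uses the standard Java 11 HTTP client; the endpoint path, the transformation name, and the assumption that the server returns a GraphML representation are all hypothetical, so check the documentation for your Pentaho version for the real URLs, formats, and authentication.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineageExport {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical lineage endpoint and transformation name; the real REST
        // paths depend on your server version and configuration.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://di-server.example.com:9080/pentaho-di/api/lineage/refine_and_publish"))
                .header("Accept", "application/xml")
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // Persist the lineage graph so an external metadata tool can ingest it.
        Files.writeString(Path.of("refine_and_publish_lineage.graphml"), response.body());
        System.out.println("HTTP " + response.statusCode()
                + ", wrote lineage export for downstream metadata tools");
    }
}
```

That is roughly the shape of integration the Meta Integration bridge automates: read the lineage out of Pentaho and line it up with lineage ingested from the rest of the toolchain.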
So, version 6. These are again the highlights; there are lots of details through the day. But if you were to summarize where our great team spent their time and really pulled forward a great quality release: the data services, faster, powerful blending; second, the enterprise level; and third, of course, the improved self-service, that auto model, that auto publishing, removing the need to always call IT, but of course with IT having all the right enterprise management capabilities. This is an enterprise grade release. It takes that vision we talked about last year, of governed data delivery, of blending, of a tailored experience, and says, that was great. Remember last year's slides? They were in the capable category. And I said I was going to commit to moving it to the right of that chart, which was the simpler and enterprise category. That's what we've done. I hope, obviously, you'll prove that to me when you download it and start using it and echo back your feedback, but that's the commitment we've made: to produce that enterprise-class software for you. That's the good news. That means we get to breathe today, for half an hour maybe, and then we start working on the next release. Engineering has today to enjoy the wonders of your presence, but we are already on the roadmap. We are already working on the next set of releases, and I want to at least give you a glimmer as to where we're going. If we go back to the original diagram, in the blue area we're focusing on the things we do best: the combination of data integration and embedded analytics in the enterprise. Blending big data with traditional, trying to ensure the self-service balance with IT, and making sure on the glass that it's a tailored experience. That remains the strategy, and a lot of you will say, well, I heard that. I heard that last year. That's the point. Strategies should not shift. The strategy is still the right strategy. It's making a difference when we talk to all of you and see the applications you're building. Strategy shouldn't wobble. But the execution continues to accelerate. So we're still focused on the same agendas. The commitments to you: it still needs to be simplified. Yes, we can now name the cluster, and there's a graphical interface and a test buddy, but there's more and more to do in cluster deployments. On the side you can see a scrolling list of some of the things. Load balancing, failover: get it running and then leave it, and it just keeps working nicely on its own. More push down, more performance; the job is never done. So our commitment is to continue that. The second commitment, oh, and by the way, on that platform, if I go back a slide, yeah: Hadoop, or big data, is no longer a project in one department. This is now in the enterprise, deployed as an enterprise asset. So think of upgrading, for example. How frequently do you upgrade? In an enterprise you have to plan for it, because it affects tens of applications, not just your one project. That's no different in the world of big data. So when a new version of Hadoop comes out, its deployment has to be planned for. The citizens that sit on top of Hadoop, like Pentaho, have to be planned for. So one of the things we also have to get better at in our deployment configuration is multi-version. Because what happens when one department is on one version, and another department is on a second distribution? How are we going to play in that multi-version environment? So that's an area of the roadmap that we're working on as well. The second large area is that governed data delivery, that beautiful balance of IT control with that self-service and that trust. So, more and more dynamic blending. There's a really important part of Pentaho's capabilities that says, look, this isn't hard coded. This can be highly parameterized. It can look at data and dynamically adjust the transformations. There's a really interesting dynamic aspect that nobody else does. We're going to keep pushing the dial on that one, because that's a big differentiator and a huge value to clients. The second is the auto-prompting of SDR, so let me just describe that one quickly.
So, in the streamlined data refinery: remember, we started it in 5.3, we then brought it to the cloud in 5.4, and we then improved it in 6 with auto modeling and inline modeling to get that collaborative modeling activity. When you want to actually use a streamlined data refinery, some web page shows up and it has prompts that say, well, which years would you like, or which kind of data would you like. And you'll hear shortly about FINRA, a customer who uses SDR and helped us design it. But somebody has to build that web form and say, here's how I'm going to pass parameters into this SDR workflow, or data pipe. That's good for certain types of applications. But if you stand back, you go, why is this thing not just prompting me automatically? That's how my old reports used to work. I used to have a report and it would prompt me for a date range or prompt me for a region, and I didn't have to go and get somebody to code that. It just happened. How does that same phenomenon happen in SDR, so that it says, I'll just prompt you for the things that you need to answer, and behind the scenes I'm going to go and do the hard stuff, go get the data and blend it and then publish it and model it? That auto prompting will make it simpler. So we'll have both the power of the custom front-end that's tailored, and we'll have the power of simplicity at the same time. So that's a big commitment we're working on. On the platform, there's more to go. Deployments now are getting even bigger and more distributed. So Docker, Puppet, Chef: what are the best practices to take Pentaho and deploy it in those models? We're no longer in the era where we ship it on a CD and you install it. The installs need to be scriptable, customizable, used in the deployment tools that you're using, both on premise and in the cloud. So the job's not done. The content life cycle: I'm really, really happy that we did that push button of export and import. The next step is to add granularity to that. Maybe I export everything, but over on the import, I only want to import certain models, certain reports, for certain users. Give me that granularity of control. The upgrade: continuing to assure you that upgrades matter and that migrations shall not break. That's been a journey we've been on since 5.3, and 6 conforms to that. So, more and more of those administration and back office requirements, to make it simpler, so that every time we release, it's easier for you to adopt it. And then, finally, on the embedded analytics side, the front end, there's continued investment to make sure we are embeddable. Remember, this is a fundamental approach that says, look, instead of building a product on the front that can do 20 things and only 20 things, and then you ask for enhancements and we add 21 or 22, that's an out-of-the-box BI tool. What we are doing is not that. We are creating those front-end visualizations and capabilities so that you can take them and embed them. You can take them and put them into the context of your applications, put them into the look and feel of your portals, tailor them so that as you hover a sparkline occurs, or as you move here some activity occurs. That's the simple experience for a user, but it's very tailored to the use case and to your environment. That's where we are focused. So analytics has to turn into insight and hit the glass, but it had better be tailored for the exact experience you are looking for. That's this focus. So there's work we are doing there to continually make it easier, continue to open up more APIs, take advantage of more capabilities, whether it's geo mapping, whether it's
APIs so that you can continue to add your own visualizations and test them and know that they are going to be a good citizen of the platform, all those extensions. And of course, leveraging CTools, this wonderful capability to tailor that experience. We've got to keep moving forward on doing a better job of documenting CTools, making more of these capabilities simpler, because they are really powerful, but we know we need to simplify a few of them as we move forward. I wish I could have done all of that in version 6, but like you, we are constrained: constrained by time, resource, and energy. So I hope you are pleased, first, with what we did deliver in version 6 and the previous 5.3 and 5.4, the enterprise grade, and hopefully the roadmap reassures you that we are continuing that journey of enterprise grade and making the big data blend really work in a big enterprise deployment. That's our focus. Okay, you know the governed data delivery story. You've seen me use these slides before: balancing any data, the pipelines, brought to the users in a tailored experience. When I really stand back and say, okay, are there some other roadmap items of interest at a super stratospheric level, this is what I would draw. Job 1 is managing and simplifying that data flow. You can translate that into all the beautiful words I just went through. That is job 1, where Pentaho wakes up every day and focuses. However, we have some interesting friends called Hitachi, and Hitachi allows us to scale, and Hitachi allows us to start reaching some pretty new places. But the best part is, it's without distracting the Pentaho agenda. That roadmap that you saw delivered this year, and the roadmap that's coming next year, has Pentaho written all over it. There's no, oh, those were good ideas, but we put them to the side and now we're going to do 50 things for Hitachi. That's not what you're seeing. This is Pentaho's roadmap, about Pentaho. But I do want to take advantage of the greatness of Hitachi. The big data labs are working on very cool technology: first, heat shield things to work on, enterprise class capabilities, compute, storage. So when I really look at some of the hot spots for me, there are a couple that stand out. First, leveraging the compute platforms. Leveraging content, searching the lake. Searching the lake, to me, is an even broader term, covering structured and unstructured, and being able to tap all that data. Because we're getting pretty good at structured data. We're getting pretty good at unstructured text data. We're pretty good now at a lake and working with the schema-less nature of a lake. But there's a ton of other data out there we still haven't tapped in the world of big data. So, search capabilities, and how to leverage unstructured data, whether it's video and sound, or whether it's emails, or whether it's documents. How do you take that, combine it, and blend it with all the other data? And I'm not just talking about an inner join or an outer join. How do you connect these in context, the metadata about the metadata? How are things logically correlated and connected? That's the hard part. So we're focusing on some areas of research there. Predictive: predictive orchestration. We're not going to be an R, for example, or a Python or an MLlib. Our job is to execute and orchestrate great technologies like that. So when a data pipeline is going and has a transformation, and I want to add a churn score or predict fraud or predict maintenance on a large piece of equipment, how do we orchestrate to know when to run all those nice algorithms, and what to do with them when they come back with
responses, both in real time and in batch? That's a critical, important part, and again, Hitachi can help us get there. So at a super high level: the focus is on managing and simplifying the data flow, and on our partnership with Hitachi to take us into some wonderful expansion areas that make it even better in the future. The devil is in the details; please go to the product roadmap sessions. The product management team is here. They've done a wonderful job; thank them when you see them. They worked like mad this year to produce these releases. They also have details on the roadmap, and they can walk you through more specifics. So there are some very specific sessions on the board that I'd really encourage you to take a look at. Grab us in the hallways, of course, we're here. Thank you very much, and welcome back to Pentaho World for those that are returning, and for the new folks, welcome for the first time. And we'll see you off the stage. Ladies and gentlemen, please welcome Saman Michael Farr, Senior Vice President of Technology at FINRA. So I'm going to tell you a little bit about our story at FINRA, the Financial Industry Regulatory Authority, and what we do and how that relates to big data and analytics. So what do we do? We're a regulator, as the name implies. We oversee over 4,000 brokerage firms and approximately 640,000 or so registered securities representatives, and every day we're monitoring financial markets, looking for fraud, manipulation, compliance, all sorts of things that you don't want to see happening out there. And this is inherently a big data type of problem, which I'll talk about a little bit. So to give you a sense of it, in 2014 at FINRA we brought around 1,400 disciplinary actions against regulated broker dealers, 134 million dollars in fines, and approximately 32 million dollars in restitution to the harmed investors. So people get back the money that they lost. In terms of the context around this: we're monitoring these markets and we're getting feeds from them, up to 75 billion events per day. These are orders, if you're familiar at all: orders to buy, routings of orders, cancellations, trades, quotes. This is what that constitutes. So the repository is over 5 petabytes of data. Let me tell you a little bit about what's involved here, so you can get a bit of an appreciation for it. What happens is, somebody goes and buys a block of shares. They click the button online and it goes off to some sort of an exchange, and that block of shares is sold or bought, and that's basically how it works. The way it actually works is that the block of shares gets split up into pieces and routed around to various trading venues looking for the best execution price. This ensures that you get the best possible price. So you might have a block of, say, a thousand shares, and it gets routed around all over the place to various trading execution venues, and parts of it start to get satisfied. So what we do is we get the feeds from these places and then we reconstruct an order graph which represents all these various trades. So you can see here: we get the feeds from the broker dealers, we get feeds from the exchanges, and feeds from what are called TRFs, which are special types of trading venues that send us this stuff. And what we do is we integrate this data set together and then we're looking through it. We've got scanning going on, all sorts of patterns being matched against this, and regulators looking at this to see what's going on, and to make sure that, really, if there's fraud, it's caught; market manipulation, it's
So around two years ago we started an effort to really reinvent our platforms for doing this, and we had a few objectives around it that might resonate with the audience here. Remember, this is all at multi-petabyte scale, with this much data coming in. We wanted to develop tools that allow exploratory analysis of the data, because you don't necessarily know the question when you start, but you might recognize the answer when you see it. We wanted self-service analytics for user-facing applications, so that a project doesn't have to kick off every time somebody wants to drill deep into a segment of data looking for something. We wanted a modular approach, with algorithms and libraries of them, so that people who are not algorithm specialists but have more understanding of the business and the data can use these things without being bogged down in the details of how they work. And we wanted it to be extensible, so we could integrate different types of data as they come along, as our business evolves and as markets evolve. The benefits of this, I think, become clear, but what we wanted to get out of it was more confidence in what the analysts are looking at; more evidence, so that when we take things into investigation we've got better, stronger evidence; increased staff effectiveness; and reduced manual effort for complex analysis, to take off the table that underlying data manipulation that so often happens in analytics and really focus the job of the analyst on analysis. In terms of the picture we had for ourselves, we wanted our users to have a dialogue with the data, not have it be something that's over there somewhere, but something they work with every day: they come in, ask questions, turn it around, pivot it, look at it, do what's necessary. It's a feeling of having direct access to that data without it being intermediated by a technology group in the normal day-to-day path of doing business. When we were thinking about this, the way we conceptualized it was to divide analytics into a continuum. There are lots of ways of categorizing analytics; this is one that works for us. At one end of the continuum we've got ad hoc analytics, and at the other end what we call templated analytics. The ad hoc end, at the top, is discovery and research. The people here are often data scientists and statisticians, and their motivation for looking at this stuff is to find interesting things; some of what they find may later actually become analytics that sit further down the continuum. A particular aspect of discovery and research is that it's not restricted in where it can go: today people could be looking here, tomorrow it could be there, and one line of inquiry can lead to a different line of inquiry. It's ad hoc also in terms of the direction it takes. Business-oriented inquiry is similar in that sense, but with a few differences. One, the user is different: they're not necessarily a data scientist, they're a business person, so what they have is a much more visceral understanding of the business, because that's what they do, but without necessarily the data science skills. So this has guardrails.
We have applications that help guide people and keep them on track, but they can go in any direction; we want to let them go in various directions to explore the data, look at it, and get different perspectives on it. Then, further down the continuum, there's pattern recognition. In this context, what we mean by pattern recognition is the concept that we have a library of patterns of what, in our case, is bad behavior. There are various types of this: different types of market manipulation, compliance violations, and so on. We apply these patterns to the data to find other examples of that behavior going on out there. Then finally there's what we call structured data examination. These are the order graphs I was telling you about: give me all the order graphs for this particular trade, give me this, give me that. It's transaction oriented, very structured and templated, and this is at petabyte scale. As for the solutions, or the approaches, and to give you a sense of where we are at this point in time and the tool sets revolving around this: for the data scientists, a lot of SQL. We make extensive use of different SQL technologies, a lot of Hive, a lot of R, and we have a lot of private clusters that we spin up; I'll talk later about how all of this is happening in the AWS cloud. That's for the data scientists. For business-oriented inquiry we make extensive use of Pentaho; people are regularly doing pivots and filters on a couple hundred million rows of data that's been extracted into a data mart. We have Redshift backing some of these, and we have automated provisioning of those clusters. On the pattern recognition side we have many different approaches, tuned to different classes of patterns: for rule-type patterns we use SQL a lot, with Hive; for unstructured data we do classification, extraction, and cluster analysis, the kind of analysis that finds clusters of information so you can look through those clusters and find commonalities that wouldn't be evident otherwise. And then on the structured data side we do a lot with HBase-oriented data marts that give sub-second responses to these queries. To give you a sense of it: in the previous environment we had before we went down this path, our private in-house data center, the typical type of data center that I think you would see in many places, we made extensive use of data processing appliances and the like. For things in this last category, for a certain type of complicated query, the response times might have been somewhere between 20 minutes and several hours. By going with the architectures that I'm going to talk about in a moment, we brought that down to sub-seconds to 90 seconds.
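For a flavor of the rule-type patterns expressed as SQL on Hive mentioned a moment ago, here is a hypothetical sketch using the PyHive client. The table, the columns, and the rule itself (a crude cancel-to-trade ratio) are invented for illustration; any real surveillance pattern would be far more nuanced.

    # Hypothetical sketch of a rule-based pattern expressed as a Hive query and
    # run from Python via PyHive. Table, columns, and the rule are invented.
    from pyhive import hive

    RULE_SQL = """
        SELECT account_id,
               SUM(CASE WHEN event_type = 'CANCEL' THEN 1 ELSE 0 END) AS cancels,
               SUM(CASE WHEN event_type = 'TRADE'  THEN 1 ELSE 0 END) AS trades
        FROM order_events
        WHERE trade_date = '2015-10-14'
        GROUP BY account_id
        HAVING SUM(CASE WHEN event_type = 'CANCEL' THEN 1 ELSE 0 END)
               > 20 * SUM(CASE WHEN event_type = 'TRADE' THEN 1 ELSE 0 END)
    """

    conn = hive.Connection(host="hive-gateway.example.internal", port=10000)
    cursor = conn.cursor()
    cursor.execute(RULE_SQL)
    for account_id, cancels, trades in cursor.fetchall():
        # Each hit would become an alert for an analyst to review, not a conclusion.
        print(f"alert: account {account_id} cancelled {cancels} vs {trades} trades")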
So, to focus on the analytics part of it, what are we thinking here? One thing we've seen a lot in the analytics-oriented business is that one of the things that makes analytics very difficult, unnecessarily difficult, is all the other stuff it drags along with it. When you have an analytics project, pick a number, 50, 80, 90 percent of the time is often spent getting the data so you can actually query the thing: getting the data, finding the system capacity to do the analysis, making sure you've got the right version (and I see some nodding heads here), normalizing it, integrating it, making sure that the right people are looking at it. All of the stuff that's in the blue part of the pyramid is not really related to analytics; it's not the core essence of the analytics, but it's the weight that drags on analytics projects and makes them cumbersome. So our approach was to deal with this stuff in the infrastructure, in the data infrastructure, to provide an environment that allows us to rapidly develop analytics, determine what works and what doesn't, and iterate, without having each project drag all of these aspects along with it. So what we do is handle the intake and the provisioning; as I mentioned, we're using the AWS cloud, so we provision clusters automatically. Then there's the data management: the version control, the lineage, the normalization associated with the data. The integration is a big part of it, so that this whole buffet is there, with the appropriate access controls, for the analytics to feed off of. So how do we do this, and what was the approach? You can see on the left side here we have the data coming in. We developed our system on AWS for several reasons, among them the high level of security and the infinite amount of storage that's available. That was a big thing for several reasons. There's the obvious one: a lot of storage is important because you have a lot of data coming in. But by having an infinite, and uniform, storage mechanism in S3, we could have a consistent system all the way through and automate a lot of the tasks that have to happen. If you have a heterogeneous environment where data is in different pools here and there, it's near impossible to automate, because everything is somewhere, and it's always somewhere different, so rule-based access to this stuff is much more difficult or near impossible to implement. That was an important part of being able to achieve the provisioning I mentioned. Then we make extensive use of Hadoop and Spark on the validation steps and the data integration steps when the data comes in; as a best practice, we try to do the validation closest to the point of entry, where the data is read in. And then we have a data manager. We invested a lot in our data management services, and we are now open sourcing this under the name of Herd, as in herding data. The data management services were key here, so we don't end up with the hodgepodge that can easily develop in a data-intensive environment. This includes cluster management, the provisioning of clusters, job management, version management of the data, and notification of when data is available and what it is: all of that stuff that you don't really want the analytic systems to be dealing with. It's a different type of programmer, a different type of approach, and these things are encapsulated in the data management service, which runs on S3. Then, on the analytics side, the stack I was mentioning before runs off of this. It makes extensive use of Pentaho on the data integration side, in the ETL chain, and especially on the business-oriented inquiry side with the Analyzer product, so that users can do those drill-downs. In this type of environment, the things you see on the right work like this: as data comes in, clusters are kicking off. We use EMR, if people are familiar with that in AWS, to kick off clusters that feed off of this data. So we have pattern recognition clusters going off that kick out alerts, and these alerts come to the analyst.
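The clusters-kick-off-as-data-arrives part can be sketched with a single boto3 call that requests a transient EMR cluster. The instance types, roles, and bucket names below are placeholders, and a real setup (Herd, in FINRA's case) wraps registration, lineage, and notification around this rather than calling EMR directly like this.

    # Rough sketch of automated provisioning: request a transient EMR cluster.
    # Instance types, roles, and the log bucket are placeholders.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="pattern-recognition-2015-10-14",
        ReleaseLabel="emr-4.1.0",
        LogUri="s3://example-logs-bucket/emr/",
        Applications=[{"Name": "Hive"}, {"Name": "Spark"}],
        Instances={
            "MasterInstanceType": "m4.xlarge",
            "SlaveInstanceType": "m4.xlarge",
            "InstanceCount": 10,
            "KeepJobFlowAliveWhenNoSteps": False,  # transient: terminate when steps finish
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
        VisibleToAllUsers=True,
    )
    print("launched cluster", response["JobFlowId"])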
The analyst looks at these alerts and then maybe does some interactive querying, using business-oriented inquiry or structured data examination. So I might go in and hammer at it to see what's going on here, or say, look, I really want to drill deep into category 2. They'll make a request; we have Hive going against the data management systems; they'll get an extract, a Hive query that does an extract; the data is ETLed over into Redshift; a cluster is made available for the analyst; and the analyst gets a notification that says your cluster is now available. They'll go into Analyzer, they'll drill down, slice and dice, do pivots, see what's going on, and look for relationships related to that alert that was kicked out by the pattern recognition. Does that make sense? So this type of model is what we found to be key for our uses. What it does is take the data that comes in, in a data-intensive environment, and make it available for curiosity that's motivated by something suspicious, which is key. You have subject matter experts in an organization, people that really know the business and the data, who metaphorically can do a jigsaw puzzle backwards, looking at the white side of the jigsaw; they really understand this. But they need access to the information to make decisions and to see trends, and if you're going to let people do that type of activity, you can't have, we think, technologists in the pipeline between them and the data. It can't be a technology exercise; it can't be software development. You can't have a software development life cycle between a business analyst and the data: this is the data I want today, can you give me that in a data mart, can you have it for me this way. If it's that way, you're not going to get that level of interrogation of the data; you're not going to get that intuitive use of the data, which is necessary for people to have a visceral understanding of what's going on in markets and to see where the trends are, where the manipulation is, the new types of fraud and manipulation coming up every day, where it's going, what the trends are, what the behavior is out there. That's what it requires: that hands-on use by people who are not technology experts. And what it does for the technology people is put them in a different business: instead of doing that production support and provisioning, it's developing the software, fixing the bugs in it, and coming up with new features and functions, but not sitting in that main line of interrogation of the data. That's been a key aspect of this for us.
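Going back to the alert-to-Analyzer hand-off described above, here is a compressed, hypothetical sketch of the middle of that chain in Python: an extract that a Hive job has already written to S3 is copied into a Redshift mart, and the analyst is then notified that it is ready. Hosts, credentials, table names, paths, roles, and the notification topic are all placeholders.

    # Condensed sketch of the hand-off: load a Hive extract from S3 into a
    # Redshift data mart, then tell the analyst it is ready for Analyzer.
    # All connection details, names, and ARNs below are placeholders.
    import boto3
    import psycopg2

    conn = psycopg2.connect(
        host="analytics-mart.example.internal", port=5439,
        dbname="surveillance", user="etl_user", password="example-password",
    )
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("""
            COPY alert_drilldown_20151014
            FROM 's3://example-extracts/alerts/2015-10-14/'
            CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/example-copy-role'
            CSV GZIP;
        """)

    # Notify the analyst that their extract is ready (SNS is just one option).
    sns = boto3.client("sns", region_name="us-east-1")
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:analyst-notifications",
        Subject="Extract ready",
        Message="alert_drilldown_20151014 is loaded; open it in Analyzer.",
    )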
So if I had to try to generalize from this for people, I think that hands-on access by people who are not technologists would be one aspect of it. Another aspect of generalization for this audience: we looked long and hard at, remember, that pyramid with the blue pieces that we think aren't directly related to analytics, and decided that's the part we need to solve, because otherwise we're dragging it along with every single project. And for that we saw the cloud, and AWS in particular, as being the solution, because we didn't want to be putting our investment into aspects of the stack that are really not core to our business, and moreover, for the way we're using these facilities and features, it's better to get them through the cloud. That was one aspect: to get that stuff out of the way. So when we get new projects coming in, for example, that need new pools of data, what we do is look at them and ask: what is this data, how does it get integrated into the whole, where does it fit in with the other pools of data, so that the right keys are there to make sure you can navigate to it and join it with the other things that are available, rather than having it sit off to the side by itself. And that's, again, part of the cultivation of the center, which then allows the analytics to feed off of it, because those analytics are going to change as the business evolves, and we can't necessarily see what they are on day one. Some of them are fairly evident, but as it gets more sophisticated those needs will change, and as the business innovates we want to be able to have that fast turnaround at the top layer and not drag that software development life cycle along with it. So a tool, in our case the Pentaho suite, that allows us to do that helps at that level, and then there's that whole rest of the pyramid under it. That's what I would urge you to look at, if this is in your core line of business. And that's it at 10:15. Thank you.