The world has changed. Visible and virtual worlds are blending. This is digital transformation. It creates a limited opportunity, disrupting business as usual. A transportation provider who owns no cars. A hotel chain who owns no hotels. Through this disruption, winners and losers emerge. And technology can be a catalyst for success or failure. Winners will use technology to drive outcomes and focus on what is at the core of digital transformation. Data. But the world of data is changing. Today, data is connected, open and fluid. To drive measurable outcomes, you need the power to capture data from the edge. Outcomes include reducing energy consumption and optimizing maintenance, saving one customer over $10 million every two years in fuel efficiency. Providing differentiated analytics capabilities to a leader in connected car technology. Estimating tens of millions in savings from unscheduled downtime while improving train service. The journey to digital transformation requires a solution that can manage data end to end. A solution that takes full advantage of new and emerging data types and technologies without disturbing how you run your business day-to-day. A solution that allows you to move beyond big data to transformation. That solution is Pentaho.

Please welcome to the stage Pentaho Senior Vice President, Customer Success, and Chief Solution Architect, Anthony DeShazor.

Welcome to Pentaho World 2017. Are you excited to be here? Okay, you can talk back to me here. I'm a little interactive, but are you excited to be here? Come on, one more time. Are you excited to be here? Oh, welcome to all of you to Pentaho World 2017. I would like to thank our sponsors, MapR, Melissa, IT Novo, for helping us put on this wonderful event where we can collaborate and hear from you, our customers. I'd also like to thank all of you for traveling from wherever you came to wonderful Orlando. We have 29 different countries represented. We have 450 attendees.
Most of you are technical, so we can geek out together. That was a joke, you could have laughed. So, we're excited to be here today. We thank each and every one of you for coming, taking time out of your day, time out of your week, to come share your experiences with us. We're excited to hear from you, to learn from you what's needed in this next wave of innovation.

We're also here to celebrate 13 years. 13 years ago this month, Pentaho was founded. We started with this concept of having an embeddable, extendable data analytics platform. The founding principles were simply that the solution had to mold to the problem versus forcing you to mold to the solution. We needed to provide a way for you to get the value out of your data in the most impactful place possible. That's at the point of impact, in how you're working, in process. Throughout the years, we've gone through many transformations, many changes with the technology, but that core principle has been our driving and centering force.

Rumor has it, we all have urban legends, but there's an urban legend about Pentaho that our founding CEO had a wonderful conversation with some of our architects, sketched some things on the back of a napkin about something called Hadoop at the time, some four or five years ago. And within a couple of days, building upon that extendable platform, we had a working prototype of how Pentaho can integrate with Hadoop. With Pentaho 7.0, we built upon that with the adaptive execution layer, which then allows us to extend that concept to adapt, or to be ready, for whatever the next wave of computation is out there. That focus on providing the extendable solution is what brings us here.

Of those 13 years, I've been blessed to be here for 12, so I've seen a lot of things. I've seen a lot of people come and go. I've seen a lot of customers. The thing that's funny about Pentaho is I have the reputation of being the escalation guy.
So I get to meet all of you, even at your point of pain, and through each of those interactions, we've learned that the problem is no longer data. Big data is no longer a big problem. This next problem is how do you transform your company? How do you become that internal disruptor that leads the change so that your company can disrupt your entire industry? How do we help you transform? Transform the way you look at data, the way you think about data. Transform the way you do analytics so it's no longer just looking at dashboards, but we have analytics guided by artificial intelligence, machine learning, predictive models. How do we help you transform to being a data culture where everyone values the data you have and everyone has access to the right data? How do we help you transform from just processing data from traditional data sources to mastering even the unknown data sources at the edge of your computing capability? How do we do that for you? How do we help you do that? That's what today and tomorrow are all about. It's to share the thoughts we have, the plans we have to help you lead that transformation.

Are you excited? I'm gonna ask that again. Are you excited? Are you ready to take a leadership position in this transformation? Okay, I don't know. I think we may have to do that again. I'm gonna try that one more time. Are you ready to take a leadership position in this transformation? Are you ready to be disruptive? Great. So, it's with that same focus on doing something new and innovative that we've launched and introduced Hitachi Vantara. We took three entities, Hitachi Data Systems, Hitachi Insight Group, and Pentaho, and formed this new, massive innovation that will lead this next transformation. With that, let me introduce our President and Chief Operating Officer of Hitachi Vantara, Brian Householder.

Good morning. We got their coffee. Watched the World Series last night, 11 innings.
Pedro was out in the bar, so I know at least some customers were out in the bar late at night as well. So, as Anthony mentioned, my name is Brian Householder. I'm the President and COO of Hitachi Vantara. I'll give a bit more background about Hitachi and Hitachi Vantara. So, a quick question for our customers of Pentaho or potential customers. How many of you are customers of Hitachi above and beyond what you do with Pentaho? Can I see a show of hands, please? I'm going to at least cover up a few. All right. And how many, outside of what Anthony just talked about, have heard of Hitachi Vantara before? This is, again, no employees. And I see a few more. Okay, great.

So, let me just at least give a quick update on Hitachi overall. And then I'm going to spend a little bit more time on Hitachi Vantara, our strategy, what we're doing. I had a chance to talk to a number of customers yesterday. We'll talk about transformation a lot, because we at Hitachi have done a huge amount of transformation. Pentaho, and how that fits, is really key to our transformation as well.

So, Hitachi overall is obviously a $100 billion organization. We spend over $3 billion a year in R&D. We are an innovation and technology company at our core. So, to give you an example of that, we have 120,000 patents just in terms of the technology innovation that Hitachi does. I didn't get the numbers right. It's 120,000. And so really at our core we are a technology and innovation company, and that's really what we focus on. That's really what we are looking to go do in terms of how we can actually provide value for you.

Now at Hitachi we have an overall strategy and overall vision called social innovation. So building on that technology innovation that we do, we talk about social innovation. And really, how do we actually benefit the businesses that we provide outcomes for, but then also, more importantly, how do we help benefit society as well.
And so we talk about this in terms of, you know, everyone's obviously focused on helping you achieve your bottom line objectives. How do we actually grow revenues, how do we then decrease cost, optimize cost, drive the bottom line. And that's very important, and the outcomes that we really are focused on are trying to help you as a customer do that. But then we have a second mission as well: how do we actually go out there and do things that help benefit society as well. And so we call that our double bottom line: really, how do we actually benefit both business and society. So you'll hear a lot about what we talk about in terms of better outcomes, better business and better society. And it's really fun and exciting for all of us, including myself, in terms of being part of the mission of Hitachi overall. So you'll see Hitachi Vantara, the Pentaho piece of that, and how they all fit into helping Hitachi achieve its social innovation objectives.

So let me just give you a quick background, a little bit more of a history lesson around the journey we've had with Pentaho, and give you a little bit better handle on how Pentaho fits into this new Hitachi Vantara as well. So I've been with Hitachi over 14 years. I was actually brought in as more of a software person; I've done software and services in a number of other areas. A few of us were brought in to help Hitachi transform, and this was Hitachi Data Systems 14 years ago. Back then it was mainly an infrastructure company; only about 20% of our revenue came from software and services, and we were out there trying to help change that. And so we really had a strategy around how do we actually move beyond the infrastructure into becoming more of a data company. And that was something that we had been working on for the last 10, 15 years as our overall strategy.
We made a number of acquisitions. We certainly changed a lot of our road maps: a lot more software R&D, huge changes that we made. At the time, back in 2015, about 50% of our revenue came from software and services. That was great. But really, when we wanted to shift to become a data company, we knew we had a big hole in our overall strategy, and that's really where Pentaho fit into the overall mix. We had some amazing solutions that helped drive and solve content solutions, some great software, but one of the biggest problems that we did not solve at the time was: how do we actually help people with their analytics, with their big data challenges? How do we actually help them in terms of a lot of the structured data? How do you actually make sense of that? And so really, when we started talking more and more with customers, we knew we needed to actually have that core foundational piece around data and analytics.

And so certainly I was involved in this acquisition back in 2015. It's been over two years now. And one of the key things that we looked at: we looked at all of the analytics companies in the marketplace, so this gets into the desktop visualization solutions, all the different solutions out there. But really the biggest thing that we saw was the number one challenge that I run into in terms of talking to customers: how do they actually get to the data itself? And one thing that Hitachi is very good at: we're very good at large, complex problems. So if we start talking about the world that you have right now, it's not just a world of terabytes or even the world of petabytes. We have a number of customers that we're helping manage exabytes and exabytes of data. And that's the challenge that we saw. And as we start talking about where the world is going to be in the future, getting access to that data, being able to analyze the data from all of the different sources that you have, was a massive challenge, and Pentaho was really out there
solving that challenge. The other one that we really liked about the Pentaho model, and this is one thing that you'll see about us and I'll talk about in terms of our overall core beliefs, was that we like the open source aspects of Pentaho. We actually are a very open company, very open culture, and we really believe that's where customers wanted to go. They certainly wanted to leverage the open source communities, and that was a core part of that too.

So that happened in 2015. Just last month, as Anthony mentioned, we created Hitachi Vantara. And Vantara is a name that talks about advantage and vantage points, but really it's combining a few different organizations together and, frankly more importantly, how do we actually then combine a lot of the great Hitachi data innovations together. I'll talk about what's included in Hitachi Vantara here in a minute. And then really our journey overall is moving forward: how do we actually continue this digital transformation journey and then ultimately move into the world of IoT. And it's interesting, I had a chance to talk to a few customers yesterday, and it's mixed in terms of how much you're doing around IoT. But what we're looking to go do, and what we'll talk about, I think, throughout the next couple of days, is really how do we actually help you deliver what we call edge to outcomes. Regardless of where the data is created, we want to help you capture it, we want to help you analyze it, we want to ultimately help you deliver the outcome that you need for your business. And that's ultimately what the mission of Hitachi Vantara is all about.

So as Anthony mentioned, we have three organizations together, and they're included in Hitachi Vantara. But more importantly, it comes down to the capabilities that Hitachi is out here trying to provide to our marketplace, to customers like yourselves and potential customers here in the room as well. And this includes what we have around software, this includes all
of the experiences that we have around services and solutions. This gets into a lot of capabilities that we have around IoT and OT, and so a lot of the different things that Hitachi is bringing to bear in terms of operational technologies we're starting to provide to our customers as well. I'm sure you've gotten into things around machine learning, artificial intelligence, things along those lines as well. Those are all things that we are actually providing together. And so if you hear about Lumada, which is around our IoT solutions and software, if you hear about Hitachi Data Systems or Pentaho, that's all part of Hitachi Vantara. And we're super excited because this is the data arm for Hitachi. So if you start talking about better outcomes, better business, better society, this is really where it's at in terms of providing the digital innovations to help you really solve those problems that you're trying to solve for your company and ultimately for your customers as well.

So Anthony mentioned transformation, and certainly we've gone through a massive amount of transformation here within Hitachi. But as we were even learning from a number of customers yesterday, each and every one of you is going through your own set of transformations as well. Anthony mentioned it's very important: you either disrupt or you'll be disrupted. And we had Geoffrey Moore actually speak at our event here about a month ago. Another customer mentioned Geoffrey Moore yesterday. It's very interesting when you really start talking about how disruptive this market is today. And I have the opportunity to talk to customers all over the world. Regardless of sector, regardless of vertical, regardless of industry, all of these areas are getting disrupted. And so what we look at, and we'll talk about this in terms of our core beliefs within Hitachi Vantara, is the two most important assets in your organization. Certainly people is one, but your data is number two. And I think it's preaching to the choir in
terms of the folks that are here in this room, but what we're seeing here is that the customers that can actually really understand what's going on in their environment, get control over their data, and use that as a massive strategic advantage for their organization have a leadership role. You know, Amazon's an easy example. You get into all of these newer business models, so much better than anyone else in their marketplace. And I think you are at the forefront of customers really trying to push that envelope. And what will be interesting to see is how much influence you have within your own organizations to be able to expand that pie, because I think right now you may be at a lot more than the industry norm in terms of the amount of data being analyzed. But if you start looking at where customers need to go and where you need to go, you're gonna need to have more and more data sources coming in that you can analyze, understand, start predicting with, and doing amazing things with. So really those are the two key assets that we see when it comes to transformation.

Now, what's interesting, if you talk to McKinsey or any of the other ones that are out there right now, is that most companies have not made this digital transformation journey. So McKinsey will say less than 40% of companies have actually digitized. Massive amounts of studies out there say less than 5% of your company's data is actually being analyzed. Now again, for folks in the room here, that may actually be a much higher number. That's great. But at least from my kind of unofficial sampling of talking to a number of customers, it's probably not more than 50% in most cases for your organizations. I was talking to a very progressive customer yesterday, and it was 30 or 40%. Now, I'm not saying that you have to get to 100%, but again, if you get back down to people as well as the data, that data is the most critical asset that you need to make sure you're mining. And again, I think the key one here is really how do you in your
organizations make sure you have enough influence to be able to help the CIOs, the CEOs, the CFOs, the entire organization, to transform. Because really, if you start talking about the executives out there, depending on your organization, some of them are very familiar with all these technologies and all these solutions, and probably a lot of them are not that familiar with them. And so really, how do you actually go out there and start delivering the value for your executive team, so that they really start seeing how they can go out and transform? Because you will either disrupt yourself or, I guarantee, you will be disrupted. And that's really, really critical, I think, for all of us in terms of how we actually want to go out there and transform. And certainly the mission of Hitachi Vantara is to partner with you to really deliver those kinds of outcomes to your organization, to really make sure you're the one disrupting as opposed to being disrupted.

So let me spend a few minutes on our core beliefs. And again, this gets down to some key beliefs that we have that we believe are differentiated relative to others that are out there in the marketplace. And so certainly data is critical, and if you look at Pentaho and the Pentaho product set and the different things we're doing there, that is front and center to what we believe are the keys to your transformation. It's that, and the talent and people that you actually have in your organization, front and center. And the other thing that is very, very critical for us, and I would really ask that you make sure you fully understand, is our approach. Hitachi's approach is: we want you to own your data. We don't want to own your data. We want you to own your data. So if you think about things like walled gardens and all those other things that are out there, well, that's not what we're about. It's a reason why we bought Pentaho, in that it leverages the open source communities. We want you to leverage whatever communities
that you want that help drive your business forward, and that we can help add value to. And we think it's fundamentally different. There's a number of companies out there that want to work with you that want to own your data. You know, this is around the Hotel California, the clouds, the this, the that, or what have you. We actually want to make sure you have the keys to your kingdom, so that you can then decide to put your data wherever it is. You know, there were a lot of discussions yesterday around cloud and how it intersects with the big data and analytics world and where it's going to go. Different changes are going to happen, but I would really challenge you to make sure you work with companies that really have your best interest in mind, as opposed to ones that really want to completely control and own your data. Whether they say they do or not, it's really more about the actions that they actually take day in and day out.

We spend a lot of time internally talking about metadata, and we actually joke internally that metadata is the new data. And so it really gets into, if you want to start talking about how you analyze all of the data in your environment, whether that's stuff that you're doing within your Hadoop and your lakes and your warehousing environment, or that's all of the other unstructured data that has absolutely no structure to it whatsoever, across the entire situation there you want to be able to search all of that metadata and then ultimately pull the actual data itself whenever it's relevant. And so we actually see the situation where you're going to have a lot more analytics happening at the metadata layer, which is then ultimately going to start pulling the right records or the right objects, if you will, that are going to help you understand what you want to understand in your environment. And so it's really critical in terms of how these are going to change over time. Metadata is the most important piece of information out there, especially
when you start moving from the world of terabytes to petabytes, and then from the world of petabytes to hundreds of petabytes or to exabytes of data. And again, there's a number of customers that we work with that have exabytes of data, and they're still growing at 30 or 40 or 50 percent per year. This is around thousands and thousands of applications that are creating data. How do they actually get control over that? And we really believe the metadata is going to get separated. You're going to have a whole different kind of analytics tier for the metadata that then can pull back the right data at the right time for your environment.

The other key belief that we have, and I'm sure you see this in your environment as well, is that certainly the data is going to outlive the application that created it and the underlying infrastructure. So if you look at the applications, and I'm sure you have a lot of new applications that are happening in your environment, mobile apps or what have you, the average useful life of an application (outside of the custom ones that your company built 20-some years ago that you're trying to band-aid through and keep up), if you look at new applications, is 1 to 3 years. Infrastructure, maybe you can stretch it to 5 to 7 years. But most often you're going to want to keep that data forever, or at least certainly longer than the 1 to 3 years or the 5 to 7 years. And that's really, really critical in terms of how we actually look at the environment. But if you look at most of the architectures that are out there today, they are still very much siloed. So if you look at the application, data is getting created in an application, it usually gets held captive in that application, and then that strands the data on a particular infrastructure. And the only thing that's actually outside of that is what you're doing within your warehousing environments. Great, you're starting to pull some of that data out and to put that into your lakes as
well. But we believe that you still need to cover a vast majority of other parts of your environment that are still very much siloed out there. And so if you have applications that are kind of outside their useful life but they still have that data that you need, how do you start pulling all of those sources of data in there? And again, that's where Pentaho actually is really core to Hitachi Vantara's overall strategy.

And then if you had to actually summarize what we do, so what Hitachi Vantara does, our overall data value proposition is: we enable you to have your data and insight when, where and how you need it. Your data when, where and how you need it. Very simply, that's what we do. And certainly I know we have a number of customers here in the room. Pentaho has over 1,500 customers. Hitachi Vantara has over 10,000 customers. And so there's a number of great use cases, I think, that we work with, such as Caterpillar or CERN. And again, we appreciate all of the partnership that we have. And one thing that's really near and dear, and I think you've seen this with Pentaho, and hopefully you will see this in the future with Hitachi as well: we are very interested in becoming your partner. We are not interested in being your vendor. We really look for long-term partnerships. That's really near and dear to our heart. We are very much invested in helping you deliver the right solutions for your company, for better outcomes, that better business, and ultimately that double bottom line, the better society as well. It's really critical in terms of what we do.

Anthony also mentioned artificial intelligence and machine learning as well. And so what we're able to do now with Hitachi Vantara is bring to bear all of the data and digital innovations that Hitachi has to offer. This gets into the 300-plus data scientists that we now have access to, as opposed to what we had before in Pentaho. This gets into the thousands and thousands of engineers in the innovation that
we actually have and bring to bear, and this is across all of the different verticals out there. And so we're really excited about that. Certainly, hopefully over the next couple of days you'll learn a little bit more about what we're doing around AI and machine learning and all the aspects of that as well. But it's really, really critical that we are continuing to push forward with these new innovations that ultimately help deliver the outcomes that you need for your overall business.

So this is Hitachi Vantara. The key focus for our organization overall is really how do we actually help you deliver what we call these edge to outcomes. And I mentioned it earlier, but this is around wherever that data gets created, or even if you need to create that data from the physical world and then translate it into a digital world, into sensors, into virtual sensors, or what have you. We want to partner with you to deliver these edge to outcomes. That's what we do. That is our core focus. As an organization, again, 60% of our revenue comes from software and services, much more of an as-a-service type offering. I know a number of you have worked with Pentaho in the past; it's more of a subscription and as-a-service offering. We are going to continue to do that, and you'll see more and more service offerings there. We believe we have a huge differentiation when it comes to not just pairing up IT but also bringing in a lot of the operational knowledge as well. And there is no other company on the planet that has the IT and the OT capabilities that Hitachi has, and we're going to bring those more and more to bear to add value for your overall business. We have, obviously, over 10,000 customers, like I mentioned before. The big one is we do shine in the larger, more complex environments. That doesn't mean that your organization needs to be a 5 or 10 or 20 billion dollar organization. It comes down to the data and the data requirements that you have for your business. That's where we shine,
that's what we're really good at. Solving these very complex problems is really what's very, very important for us. That's really where we see our competitive advantage, and really helping deliver a lot of value for your organizations as well.

The ecosystem is really important. I know we have a number of partners here. I certainly want to thank you for all of the partnership that we have. We have over 2,000 partners, and we'll continue to evolve as these markets mature. Certainly a lot more partnerships in the open source community, and a number of other areas that we're working on that are really, really critical for our success.

And then one thing that I'm very passionate about is leadership and culture. And certainly we've made a number of changes on the Hitachi Vantara leadership team to make sure that we can continue to propel and drive our transformation as well. And transformation is a never-ending thing. We talk a lot internally about how we actually get the right mindset to make sure that our organization can transform. And then culture is the second key piece of that, which is really, really important for us as well. And really very important is how do we actually have the right culture that can sustain. It's not about any one individual. It's about how the team comes together and collectively delivers these double bottom line solutions to our customers, which are really critical. We've been very fortunate. We've had a number of culture awards over the years, but that's really more of an outcome of the leadership and the culture that we've created, which we're very thankful for.

So with that, I want to thank you very much for just a quick opportunity to talk to you about Hitachi and Hitachi Vantara as well. Thank you. So next up, I'd actually like to introduce Brian Hopkins. He's from Forrester. He's one of the first people at Forrester that started writing about big data, and he's been working on big data and cloud and a number
of other areas here. And so he's going to talk about his views in terms of what's happening in the market today.

[Music]

Hi. Wow, I love this venue. This is a great venue, isn't it? So when Pentaho came and asked me to speak, I had an internal debate with myself for a little bit about what to call this. I mean, should I talk about big data? Certainly I've written a lot about that, and you've heard a lot about that. Or is it about advanced analytics? Should I call it AI? I think you guys are pretty bombarded with that. So what I decided to call it, or to make this talk about, was digital. And I want to, in about the next half an hour, explain why I'm talking to you about that, and really leave you with three key pieces of information that I think all of you need to thrive and survive in the digital age.

Now, our data: as a research firm, we talk to lots of customers all the time, and one of the questions we ask them is, are you involved in a digital transformation of some form or kind? Who here is involved, or helping their business, or is their business doing something that has digital in the investment? Right, probably, yep, about half of you. Right, so that's what our data says: over half of companies are involved in some kind of transformation to become more digital. In fact, a lot of the companies that we talk to, that we work with, like Home Depot, Westpac Bank in Australia, and Unilever, are publicly talking about spending billions, that's with a B, of dollars to transform their business to become digital. So it's high priority. It really is a high priority thing that we see.

Here's the problem when I talk to companies that are on this digital transformation. How are they going about it? We understand how to change, right? We plan, build, run. It's linear. It takes two, three, four years. We've been changing and helping our business change for a very long time. Here's the problem, I'll drop it in, right: these nasty little digital predators
or nasty big digital predators seem to be changing the rules of business right out from under our feet. So every time we help our business, okay, this is the strategy, this is what we're going to do, we need to go to the board, we need to get a billion dollars, we need to invest it, and then three or four years later the rules have totally changed. And that's a problem. In fact, it's such a problem that often the only thing we really worry about becomes: what happens when Amazon moves into my business? Right?

So let me state this problem another way. We are thinking about applying big data, analytics, and IoT and all these things to help our companies become more digital in a linear way. It takes time. It's hard. We have to invest. We have to change. The problem is these digital disruptors aren't playing by linear rules, right? Because they're using technology, which is not changing linearly, to keep up with customers who are really driving the show, customers who are driving all this change. And these digital disruptors are using and exploiting technology, often better than other enterprises, and therefore they're able to change much faster and keep up with changing customer expectations.

Come to find out, this isn't a new phenomenon, right? A fellow by the name of Ray Kurzweil, he worked for Google for a while, he's a PhD, he's written a lot of stuff, thought leader, I think he founded a university. He wrote a paper back in 2001 called "The Law of Accelerating Returns." It's a real interesting paper. It's online. You can read it if you want. It's very detailed, long, kind of rambling and mathematical. But at the end of the day, when you wade through all the math, he says three important things. He says when systems like information technology systems evolve, when one generation of technology builds on the next generation of technology, the returns of that next generation of technology are more than they were for the previous generation. So every generation of technology accelerates over the
generation above, so you get accelerating returns: more and more benefit as technology, or as systems, evolve. And he says when this happens, what you see is exponential change in the fundamental measures of that system. For technology, it's cost and speed or power: cost goes down, speed and power go up, and they do so at consistent intervals. So really what he said is that Moore's law is not an aberration, a one-time thing that we have to deal with. It's not the exception, it's the rule. So when you look out at the environment, and at everyone that's played off of Moore's law, we see a whole lot of Moore's-law-like things going on. You can look these up if you want to, right? They're playing out to be true. So what we have to learn to do is realize that we're not living in a linear world, right? Technology is hyper-accelerating the pace of business. Let me make this a little bit more practical if we think in terms of big data. I published this in a report called Move Your Big Data to the Cloud, and one of the things that I argue in that report is this. For instance, take a look at Hadoop. Hadoop took about 10 years. In fact, I have a t-shirt that I got on Hadoop's 10th birthday; I was wearing it yesterday, as a matter of fact. Hadoop took about 10 years, so you could kind of sit back and watch it come and go, okay, I've got time to deal with this. Now, Spark went in about 3 years from a Berkeley AMPLab science project to replacing Hadoop for a lot of use cases. So it happened a little faster. So it's easy to say, well, maybe I didn't call the linear pace fast enough, maybe that line I'm trying to project ahead is a little bit faster, but I hope it's still linear. And then along come things like artificial intelligence and deep learning frameworks like TensorFlow, and those go from conception to really disruptive forces almost overnight. And so when you think about what's going on, you step back and you go, hmm, I'm not in a linear world, I'm not in a linear
system, and early generations of big data technology are accelerating the future direction, exactly what Kurzweil said. So what that really means is this. As we invest in data and analytics, and in the underlying infrastructure and technology to do those things, we as IT professionals have a lot of priorities. We want to be efficient, we want to be secure, we want to be compliant. Those are all really important things. But recognize this: in an exponentially changing system, and I think this number is right, 90% of the benefit happens in the last doubling cycle. 90% of the change. And when you go back to Kurzweil's paper, the whole point of his paper was that our brains can't get around that, because we're marching along linearly. So as things change exponentially, what that really means is that the really important objectives for investing in data and analytics and all the infrastructure are flexibility, speed, and agility. And as we march on and move faster and faster, those are going to be increasingly important, to the point where they may eclipse everything else. I think that they will; it's going to take a little time. So when we think about how this applies to digital business, what we really mean is that the pursuit of customers and profitability, and our pursuit of being digital to keep up with those customers and to be more profitable, is requiring us to think architecturally differently about how we invest in technology, and to think differently about how we put our businesses together. Truly, digital is changing the rules of business. So when we talk to companies that are on this transformation journey to become a more digital business, what we typically find is three sets of technology-infused projects or transformations going on. There's some IoT stuff going on: connected products, OT folks operating systems with device data to become efficient, IT and OT. We see a lot of mobile or digital work, or customer experience or digital customer experience
work or sales and we see a lot of enterprise work ERP supply chain employee experience all these things and they're all good things but they're all kind of here, there and everywhere and as companies these companies that are investing billions to transform and to really become digital what they recognize they need to do is they need to bring all those things together into four key competencies they need integrated cross life cycle, cross channel customer experiences they need digital operational excellence so their processes are efficient they need to be able to innovate digitally with digital technology to keep up with those disruptors and the last thing they need to do is understand that they're part of a digital ecosystem they're not standing alone they're not competing alone, they build partnerships exchanging products and services and data and insight so when you think about this future of being digital the connective tissue between today and tomorrow and the people who are going to make that happen are sitting in this room right here because it's the data and the analytics that all these things have in common but how do you do data and analytics to enable you on this digital journey with Forrester we have a label for that we call it a system of insight and what a system of insight is is it's not a BI reporting someone's got to look at some data and make a decision rather it's an application that sources data performs various analytics to find actionable knowledge or insight implements that insight and software to drive action measures the outcomes which is in turn is data which is then fed back into the pool of data so you can iterate around and around that closed loop that works really well for customer digital experience because customers are changing so fast you have to be able to understand what they're doing and keep up with them it also works pretty well in IoT when real time data and real time decision making specifically when you get outside the data 
center into the edge is becoming increasingly important so if building these systems of insight is the way that you're going to help your company achieve its ambitions for digital if you're with me so far and you need to do that to keep up with the pace of business innovation technology driven change there are really three things I want you to remember right three pieces of actually there's four I'll leave one for the variant so there's three things I want you to take away from here they are you need to change your brain thinking from needing a data lake or expanding your data lake to having a data fabric or a big data fabric and I'll explain what that is you need to optimize your architecture to take advantage of innovations in the public cloud as a real competitive advantage and the last thing is you need ways to orchestrate on multiple levels to support the scale that you need to operate while still remaining flexible so three things so let's drill into those a little bit and then I'll offer some closing advice so I want to talk a little bit about the difference between a data lake and a data fabric and this is the real story I was talking with a chief customer officer two weeks ago a major bank in Asia Pacific and I was playing this conversation back to him which is based on dozens of conversations I've had with IT professionals just like you and I was saying what I see happening is companies that have gotten ahead of the power curve and built a data lake and brought in Hadoop and so on and so forth are now coming back to me and saying I built it and the business wasn't all that crazy about it and I say well why did you build it and the general answer goes something like this my business kept coming to me and saying where's my data and I said well it's in the warehouse well that's not fresh or fast or how much will it cost to put new data in there oh it'll take six weeks, three weeks ten weeks whatever it is a couple hundred thousand and we can retest and load 
in the warehouse that's too slow so we being on top of IT and new kinds of innovations looked at Hadoop and said oh wait a minute what if we just load it all in the lake and where's my data, it's in the lake just go get it the problem is what this chief customer officer told me is his business was still building the data lake, it was over budget now they had to upgrade it and by the way the data was really stinky and he didn't like it so he was coming to me saying how do I get IT on board with what I need right and that's the problem with the data lake architecture so don't get me wrong data lakes have their place not the end all be all that we thought they were going to be so that brings me to a data fabric so what is a data fabric a data fabric can include a data lake in fact it includes several data lakes large manufacturers I'm working with right now tell me look I don't have one I have about four or five built all over the place over time so there are really two differences to data lakes number one they increasingly separate storage and compute right because as you are on your digital journey and you're capturing data you're always going to be scaling out the amount of data that you capture and depending on the format and the speed of the data there are different places you want to put and keep that data to make it available and to both efficiently store it so you need a way to scale the data out always analytic compute scales in and out up and down right so the problem with data lakes specifically ones that are based on Hadoop is they couple data and compute you just bring the compute to the data and that's not really what's needed so data fabrics increasingly separate storage and compute the next thing they do is they feature ways to orchestrate the movement and availability of data at different levels of quality in an automated way so you can have the right data where you need it whether it's redshift whether it is Hadoop or Impala whether it's a data 
warehouse or a database right you need that flexibility so they give you the ability to orchestrate these and automate the scale that you need right so those are the two differences let's talk about the cloud piece for a moment we published a study last year in 2016 estimating the market for big data cloud for big data solutions and different kinds of solutions I did not work on this study it was done by a different group that does these marketing things but when they brought it to me and put it in front of me and said to Brian here's what we found and I looked at this data and I said holy moly that matches exactly what I'm seeing good job and what I'm seeing is this massive shift and I started in 2016 from companies who have got ahead of the complex open source on-premise big data thing and now realize man I gotta upgrade Spark I gotta Hadoop 3 is coming I'm doing all this stuff I'm under delivering on the value I need a better way to do it which is why we think in 2018 50% of firms will publicly declare a public cloud first strategy they'll make every effort to do their big data analytics in the public cloud specifically because they've run into this on-premise wall and it's not meeting the business needs that is being reflected by the market data that we're collecting and seeing but the real nail in the coffin of on-premise big data is the pace of innovation so when I talk and look and I just published a wave on major cloud vendors insight and analytics capabilities and one of the things that I found is that these Google, Amazon Microsoft a lot of the smaller players as well that are operating as pure play public cloud vendors are able to throw innovations into their platforms and make those innovations available to customers with a credit card and a URL turned on so you can go and incorporate new things like serverless SQL for Amazon Athena or deep big data deep learning analytics from Google TensorFlow or this thing called quantum computing which is coming in 
a few years, those things quantum computing units will be available in the cloud so if your architecture isn't optimized for the public cloud companies that I talk to who are optimizing their architecture for the public cloud are telling me that's our competitive weapon we know how to do this better than our so we can move faster right so you have to understand how these cloud innovations are going to allow you to move faster and to some extent you've got to place your bets on which set of vendors and which innovations are going to be the right ones for you the last piece I want to talk to you about is the orchestration piece and at issue here is as you look to bring IOT out of the silos that it's in into part of your digital strategy where your connected products or your connected systems are providing information in a secure way to your marketing and sales organization and your customer experience organization really that's the end all be all is to connect it all together right it's really about what do I do with the edge which is really kind of anything that's not in your data center and cloud and so in some research we looked at the different kinds of things that are popping up in the edge including gateways and edge devices and we looked at the characteristics of those things in terms of power consumption memory compute ability to do analytics and they're very different pieces parts how are we going to cope with this it's more data it's data that has to be managed it's hyper local data management and how do we push analytics out of the data center to some of these edge things because we're going to be our data says that you're going to be increasingly bandwidth constraint well there's really three things we see happening there number one companies that are doing this and doing this well are all using a micro services based application architecture because they need that containerized way to be flexible in the applications that they produce right they need 
behind those microservices-packaged applications a set of data orchestration and federation tools that allow them to supply those applications, and the databases in those applications, with the right data at the right time, with flexibility. Kind of sounds like a data fabric. And the last thing they need underneath all that is a containerized infrastructure orchestration and management layer, right, to support pushing containers with data and analytics from the cloud and the data center out to the edge and back. And we're seeing vendors increasingly support that kind of use case. So they're all doing these three things. So, as I said, from a data and analytics perspective you really need two things, right? First off, you need a way to orchestrate what's going on with the data in your organization at an enterprise level, at scale, in the cloud, and then you need a way to separate out the delivery of data to your business in a way that lets you say, yes, I can answer that need for data and views of data within a day, not weeks. The second thing you need is ways to manage a growing number of containers at scale and deploy those containers onto an increasing variety of different hardware platforms that are popping up on the edge, using things like Kubernetes, rkt, and Docker. Those are the two things you need, right? So let me close our conversation here with an example. Logitech is a big manufacturer, and I had the privilege of talking with these guys a year ago, and they've resurfaced now as Pentaho users. When I was talking with Logitech about how they met their needs, the first thing I noticed is, number one, they did it in the cloud. 100% in the cloud. I said bravo. I was talking with another manufacturer, and I'll hide the name to protect the guilty, that did the same thing a couple of years ago: they went 100% in the cloud. But this other manufacturer went 100% in the cloud, and the first thing they did is they hired a Hadoop distributor and installed Hadoop in Amazon
and built in my opinion a fairly rigid infrastructure in the cloud because it was all based on writing spark jobs to transform data from the cloud object store into some sql hadoop way of querying data and so as new data sources came along they had to rewrite the spark jobs oh by the way the data scientists wanted a different version of spark but they were working on one cluster so they had to plan to upgrade spark to support the data scientists and rewrite their jobs and around and around that circle they went very in the cloud but still fairly rigid not cloud optimized logitech didn't go that way instead what logitech did is said well we might need hadoop someday they're open to it building a data lake but initially no hadoop what they did instead is they have a data orchestration layer that they built on pentaho that works in a data publisher data subscriber kind of way kind of similar to the way Kafka is doing but not real time more in batch because larger data sets and what this infrastructure does is it breaks up the loading of data and the consumption of data and the producer and consumer templates and because they have templates they can outsource a lot of this work to their partners and get it done very efficiently and it also allows them to continue producing and consuming data which gets stored in Amazon S3 if one side breaks so if the consumer side for redshift breaks they can keep loading or if the loader breaks they can keep consuming because they've buffered this in a pub sub model right? 
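The buffered publisher and subscriber pattern described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Logitech's or Pentaho's actual implementation: a local temporary directory stands in for the cloud object store, and the names `BatchBuffer`, `Consumer`, and the `orders` topic are hypothetical. The point it shows is that the loading side and the consuming side only ever touch the buffer, so either one can stop without blocking the other.

```python
import json
import tempfile
from pathlib import Path


class BatchBuffer:
    """Durable batch buffer sitting between producers and consumers.

    Assumption: a local directory stands in for an object store such as S3.
    """

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def publish(self, topic: str, rows: list) -> Path:
        """Write one batch as a sequentially numbered file under the topic."""
        topic_dir = self.root / topic
        topic_dir.mkdir(exist_ok=True)
        seq = len(list(topic_dir.glob("*.json")))
        path = topic_dir / f"{seq:08d}.json"
        path.write_text(json.dumps(rows))
        return path

    def pending(self, topic: str, after: int) -> list:
        """Return (sequence, rows) pairs for batches newer than `after`."""
        topic_dir = self.root / topic
        if not topic_dir.exists():
            return []
        batches = []
        for path in sorted(topic_dir.glob("*.json")):
            seq = int(path.stem)
            if seq > after:
                batches.append((seq, json.loads(path.read_text())))
        return batches


class Consumer:
    """Subscriber that remembers the last batch it loaded."""

    def __init__(self, buffer: BatchBuffer, topic: str):
        self.buffer = buffer
        self.topic = topic
        self.last_seq = -1
        self.loaded = []  # stands in for a warehouse table

    def poll(self) -> int:
        """Load every batch not yet seen; return how many were loaded."""
        batches = self.buffer.pending(self.topic, self.last_seq)
        for seq, rows in batches:
            self.loaded.extend(rows)
            self.last_seq = seq
        return len(batches)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        buffer = BatchBuffer(Path(tmp))
        warehouse = Consumer(buffer, "orders")

        buffer.publish("orders", [{"id": 1}, {"id": 2}])
        first = warehouse.poll()

        # Consumer side is "down": the producer keeps publishing anyway.
        buffer.publish("orders", [{"id": 3}])
        buffer.publish("orders", [{"id": 4}])

        # Consumer recovers and catches up from its last sequence number.
        caught_up = warehouse.poll()

        # Producer side is "down": polling finds nothing new and does not fail.
        idle = warehouse.poll()

        return {
            "first": first,
            "caught_up": caught_up,
            "idle": idle,
            "rows": len(warehouse.loaded),
        }


if __name__ == "__main__":
    print(demo())
```

Because the consumer tracks its own position, a second consumer (say, one feeding a different reporting tool) could read the same batches independently, which is one reason a template-driven setup like this is easy to hand to partners.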
One of the things that I thought was interesting about this case study was that they said, once we did this, we recognized that, because we were working with templates, with an orchestration solution, with tools in the cloud, we could automate 90% of what our DBAs were doing. So when you think about a data fabric, that's the essence of a data fabric. What they did is they did this orchestration, taking advantage of cloud where it made sense, and then they layered a data virtualization tool on top of their data orchestration layer to separate out the provisioning of data to tools like Pentaho Business Analytics and other tools like Tableau or Qlik. Many different parts of their very large, distributed business had different needs for data and analytics, and they separated that out with a data virtualization layer. Made a lot of sense. So now they're able to be agile in meeting their business demand for analytics and different kinds of data, and be both scalable and agile in the way that they orchestrate the movement of data and take advantage of things like cloud object stores and, eventually, Hadoop. I think they'll probably end up implementing some part of Hadoop when it makes sense to do so. So they really have embodied a lot of these characteristics, and what that does for them is position them very well to now increasingly pursue their IoT agenda. Pretty good example. So let me close out, summarize some of the things I've told you, and I'll give you that bonus tidbit I mentioned. The first thing we talked about here is that as you pursue the internet of things, and seek to add internet of things investments into your enterprise, as you bring those together it ceases to be about IoT; it becomes about digital. Remember: today's speed is tough, but it's the acceleration that will kill you if you can't keep up. So not only do you have to go fast, but you have to be getting faster all the time. Which means that as you consider what to invest in, and I talk to clients like you all
the time what should we do for our next generation data lake what should we do as we migrate in the cloud it's more like here's our strategy Brian what do you think about this strategy my advice to them is always very similar the most important thing that you can invest in is the ability to be agile, flexible and fast in the future so these three to five year plans if you have a three year plan I say crumple it up and throw it away the only thing I can tell you about your three year plan in three years you won't be doing what you think you're going to be doing it'll be different so you have to invest in being able to shift and adapt so therefore the three things that we talked about that you have to master to survive and thrive in this digitally accelerating world is you don't need data lakes you need several and you need to stitch things together in a fabric architecture you need to optimize yourself for taking increasing advantage of the public cloud and you need to focus on many levels of orchestration so you can keep up the scale and maintain the flexibility and here's the last thing I didn't tell you I haven't really talked about this yet I could go I could do a whole other 30 minute speech on this I don't have time and that is this I've done this and I've done it well and I always ask in interviews the very last question if you could go back and talk to yourself of two years ago what would you tell them and very consistently I get this answer we wish we'd started the data work earlier talking about the metadata the governance it's hard nobody wants to invest in it and every company that I talked to that's walked this path says we eventually got to the point where we knew we had to do it so therefore companies when you can muster the will to bake these investments into your digital transformation plans and do them in a way that makes sense and also gets that hard work done you'll be far ahead of your competition and I wish you the best of luck and I hope 
that's you. So thank you very much, and I'll be around this afternoon. Take care. Thank you so much, Brian. Wasn't that exciting? I'm going to ask that again: wasn't that exciting? Geez, you guys are making me work hard. I thought this was vacation. The key thing that I heard in the back, that I really want to focus on real quick, is that we have to be agile in the transformation. You have to invest in adapting: as the world changes, you need to change, even though you have plans. It's that innovation that you have to be careful to cultivate. With that, I would like to introduce someone who's well-renowned, world-renowned rather, in connected cars. She's known as one of the leading women in IoT. She has over 12 years of experience with data intelligence, ambient intelligence, all sorts of ways of getting new life out of data. Help me welcome the director of innovation at IMS, Ella Halal. Hi everybody. So who can tell me what this car is? Aston Martin, yep. It came in a movie. Which movie was that? James Bond, yeah. It was the 2015 James Bond movie. It's pretty, huh? But not all cars started that pretty. In the 17th, 18th, and 19th centuries there was a big fight between the different fuel sources over which one would drive the cars. Steam engines and combustion engines as we know them today were actually competing early on. In 1885 Karl Benz came out with the first practical car as we know it today. Henry Ford afterwards was able to commercialize it and make a big deployment of the Model T. Then came the seat belts, the car engines, the car keys, the airbags, the ABS, and a lot of the innovation that we enjoy today in cars. So today's automobiles are dominating machines that have undergone 100 years of evolution. Today's cars are faster, more agile, and more fuel efficient than their older counterparts. Over 1.2 billion cars are on the road today worldwide, and 18 million were sold in 2016 alone. So cars have had a profound impact on the way we live our lives. How many of you use a car on a daily basis?
more hands please yep almost everybody so if you think about it an average person roughly makes 10 trips a day between dropping your kids to school, going to work picking up your kids, maybe picking groceries or meeting some friends you do 10 trips a day you spend 4.5 years in your vehicle that's a long time actually an interesting statistics Americans were stuck in traffic 8 billion hours collectively 8 billion hours in 2015 alone 8 billion hours that's time that we can all sit and binge Netflix won't we all love that? so that came to something we need to think of driving as a utility the way we live our lives today depends essentially on driving our modern society it's the heartbeat of the modern society we commute to work we commute to visit our friends family we even drive to for enjoyment sometimes so we need to think of driving as a utility but if we're spending all of that time to drive if we're spending all of that time in the car we should enjoy it and to be able to do that we need to think of connected car and when I say connected car I don't mean connecting it to the internet because how does that really help me? 
I mean connecting it to me. If that means connecting it to the internet too, sure. But when I say connecting it to me, I mean making the car give me actionable insights, help me achieve my goals, making the drive an enjoyable experience for me, making it convenient for me. And there are a lot of drivers for this definition of connectivity. Number one, people like me are demanding convenience, so it's customer demand. We're all living this connected lifestyle, with our wearables, with our mobile phones as an extension of our arms, so we need connectivity in the vehicle too. We're living in a fast-changing environment where distances need to become smaller and shorter. I know friends that live in San Jose and actually work in San Diego, which is crazy from my end, to commute that distance every day, but they do. And then there are billions of dollars in investments, as well as a need for autonomous vehicles. All of that is driving the definition of the connected car and the need for the connected car. So how can we enable the connected car? Well, we're lucky enough to live in an era where vehicles have become sensor rich. You have about 100 million lines of code in a vehicle today. It has the processing power of about 20 personal computers, and it processes roughly 25 gigabytes per hour. That's a lot of data, and this is all on the cars today. But that's not all. We are also living in a ubiquitous computing era, where we have so many sensors all around us, including on us sometimes, and all of this can collect data that will help define mobility: how we drive and where we're going. So using all of these diverse sources we can collect information about mobility. Our mission at IMS is to enable the connected car, and we did that by building the IMS processing connected car platform. It's a platform featuring a unified data engine that enables mobility services, and the core of it is data. So my day-to-day job is to make sense of about 10 trillion data points. Sometimes overwhelming, but enjoyable, I
can guarantee you so think about it this way so what is data data is a summary of thousands of stories but they cannot speak them they rely on us to tell the stories to give them a clear voice of what they're saying so we started crafting the lifestyle style narrative trying to understand the driver behavior is he normally aggressive or today is he just having a bad day what's the habits when do you usually leave home and how can we help you have a better commute and customer preferences whatever that means so we try to understand all of this and to be able to understand that we work on all types of analytics starting from data cleansing which from my point of view is actually the hardest part because we collect data from heterogeneous platforms that are noisy with different sensors with different quality and of course how many here works with data on data databases more hands so you guys know the pains of noisy data and then we work with descriptive analytics so descriptive analytics is understanding what is happening then there's diagnostic analytics which is why this is happening so if I identify that you're swerving are you swerving to avoid something like a cat or a dog or a kid running after his ball or are you swerving because you're distracted or inattentive there's a big difference between these two then we try to look into predictive analytics try to understand what you're going to do next and like our holy grail is prescriptive analytics can we impact your opinion can we help you change your mind to do something that is less riskier can we assist you so there's a big part of all these types of analytics and these are examples of what can be done with it from something as simple as risk proxies that understanding where is your garage location and your mileage estimate of how much driving you do here is your driver intent understanding your motivation behind certain actions and harder is understanding your state of mind even I sometimes don't understand my 
state of mind so I'll give you some examples of what can be done so this is a trip plotted and as you can see here from the data it's very clear the difference between your idling driving in city and driving on a highway and better the difference between harsh acceleration which is a behavior in rest on a highway it's quite interesting huh but not only that data is more powerful aggregated so for this slide I looked at 17 million driving trips from three countries US, Canada and Europe Germany and aggregated all of them and the interesting part was that regardless where you are your average trip is roughly 15 minutes roughly 15 kilometers and you're driving on 30 km per hour it's quite interesting that regardless where you are it mainly converges to these specific numbers even if you're a longer commute you usually stop in the middle which was quite interesting but also if you look here on that side you can see that usually most driving happens in the daytime and even if you drive at night you tend to overspeed and tend to drive longer distances so that was quite interesting to see that these habits between these different countries are actually converging but not only that so we all know that distracted driving is a big big issue and the statistics say that about 25% of all crashes in the US are based on because of texting and driving so again we looked into the data data can tell lots of stories which confirmed so we looked at 53% of trips have some sort of distraction 39% have distractions while you're stationary like in a traffic light or stop sign so you pick the phone look at the text put it down and drive 36 of them had some sort of distraction while you're in the move which is bad and the average distracted driving duration is 94 seconds which is about a minute and a half so imagine you have your eyes off the road for a minute and a half it's quite interesting to see that these stats not only that we are able actually also to look into crashes so data can 
tell us exactly what happened in a crash can tell us the story so if you look here this is somebody who's reversing hit a post and bounce back isn't it powerful what data can tell us so to be able to enable all of these awesome insights this is our system we have our system and in the heart of it within our business center as well as our hands data feeds we have pentaho integrated to help enable reports fast reports and fast creation of reports for our clients but although we had great success so far this is not it this is not the end of it because we are we as humans our preferences are changing who here has a kid 16 years or older I don't but you can raise your hand 16 years so quite a number okay so let me tell you a story I have these co-ops coming for an internship they are roughly 22 years of age third and fourth year university they're coming in they're very excited happy to help they came in and the first thing I said awesome I need you guys to do some test drives for me sure okay do you have a car no okay no worries I have a car for testing you can take it you have a driver license and I hear silence like what you don't have driver licenses no and it was mind-boggling for me because I remember as soon as I turned 16 the first thing I got my dad to do is to drive me to get my driver license they don't even have a G1 which was mind-boggling so we are living in an era where Gen Z's and the younger generations are changing their mentality they are multimodal transporters they want to use an Uber a Zipcar a bus and they can use all of them in the same day so there is a great paradigm shift from vehicle ownership to vehicle usership instead of owning a car that use it only for 5% of the time and your car is idle for 90% of the time and roughly used by one or two people to a car that is used 100% of the time to its fullest capacity so there is a big paradigm shift and this paradigm shift is actually enabling the need for autonomous vehicles so autonomous vehicle 
will enhance the productivity as well as the driver convenience and experience. So how many of you know the 5 levels of autonomous vehicles? A few hands. So there are 5 levels of autonomous vehicles as defined. So level 0 is your 1970s dad's car: it does nothing but drive. Good, you enjoy it, but you as a human driver are the one in full control. Level 1 would have one thing that is autonomous, like cruise control or automatic braking, one feature that is enabled. Level 2 would have 2 or more features that can happen simultaneously. Level 3 can have a little bit of autonomy, but the human needs to be fully alert and ready to intervene at any moment. Level 4 is partially autonomous: that means it can be autonomous under very specific scenarios, so for example autonomous on highways only. And then you have level 5, that is fully autonomous: you sitting, watching your Netflix, playing Pokémon Go or whatever your heart desires. You don't need to do anything, even taking a nap. So out of all of this, where do you think we are today? Where do you think the Teslas and the Google cars are? 5?
Nope. So they are actually between level 2 and level 3, somewhere in between. You need a driver that is fully alert. The Google car will only function under very, very specific scenarios, when the mapping is very accurate and detailed. Teslas need to have a driver that is fully alert and ready to intervene at any time. So we are far away from 5, but we have made great progress. So over the past few years we actually made great strides toward autonomous vehicles and vehicle automation. However, we still have a long way to go. And think about it: the hardest part of level 5 is that you would have levels 0 to 4 coexisting on the same road, so your level 5 needs to be able to interact successfully with all others on the road. But I think the biggest challenge is not the technology. I think the biggest challenge is human perception. So the biggest challenge for the autonomous car and its adoption is: how can an autonomous vehicle provide an experience that meets human desires, that is satisfactory? So let's think about it quickly. How many of you have a friend, a partner, a family member, where this person thinks that they are the best driver ever and you think they are the craziest driver ever? Yep, raise hands. Everybody. So driving is subjective, a very, very subjective behavior, and it's not easy to sit there comfortably, trusting your life, and more importantly the lives of your kids and your loved ones, with no control. Just a quick thought exercise: think about planes. How many people are afraid of planes versus how many people are afraid of cars? And if you look at the mortality rates, in cars it's way higher than planes. A sense of control usually helps, and when you give up control, when you give up the steering wheel or the right to intervene with level 5 autonomy, human perception and human trust are the biggest challenge. So for us to be able to incorporate human trust and human desire, we need to analyze and understand the human state of mind and human intent, and to be able to do that we would need to do
data analytics. So I started my talk with the different types of fuel that power the vehicles as we know them today, and I will end it by saying that the vehicles of the future will be powered by data analytics, big data analytics. And I would leave you with this: information is the oil of the 21st century, and analytics is the combustion engine. So thank you. Thank you so much.
Love that analogy, love that statement: information is the oil of the next century, and analytics is the combustion engine that's going to power everything. Love this story with IMS, because it shows how you can take the data from the devices and actually produce an outcome: safer cars, lower insurance rates. Just as long as they don't start sending me emails and tickets because I was driving too fast, I think I'll be okay. Someone got the joke. So let's keep moving on and get the energy going again. Are you guys excited today? Try that one more time: are you excited today? All right, with that energy, let me introduce to you our chief product officer here at Pentaho, Donna Prlich.
Good morning. Good morning, everybody. Now we're going to get the energy going, you guys. Many of you who know me, we're going to get fired up now, right? We're all fired up, it's Pentaho World. So I gotta tell you, I really missed having this last year. For those of you, just give a quick clap, applause, if you were here in 2015. Woo, yeah. So it's great to have everybody back. And the question is, what's the word of the day? Anybody know? Yes, that's a good one: transformation. Yes, we have heard a lot about transformation. And when we were thinking about the theme for Pentaho World, we said, gee, in 2015 we were putting our data to work, got a lot of good outcomes. You know, the customers in this room, I know you guys have been measuring outcomes, had a lot of success, and trials and tribulations along the way, but a lot of success. So we said, gee, what's coming next? Well, we're kind of moving past that. Things are disrupting. We're going to transformation now. So it was a
great theme. We heard this morning from Brian, talking about Hitachi Vantara. It's all about transformation: how the world is changing, data is changing, our businesses are changing, business models are blowing up all over the place, and so I think a little bit differently. And then Brian talked about speed, agility, flexibility, all the things that we're going to need to be able to survive. And Ella, man, what a great story, right? That was amazing. I love listening to Ella, and as many of you maybe know, I love that story of IMS. It's such a great story of kind of the power of machine-generated data and analytics. And so we moved through that, and it's like, well, gee, let's think about that. We talked a lot about transformation, but what are some other shapes that transformation can take? Well, what if somebody handed you ten billion dollars and said, can you figure out how to get internet service to people that don't have it, or areas that don't have it? That'd be pretty transformative, right? And then you might think, well, let's think about something completely different. Let's go to a totally different industry: manufacturing. We all know that manufacturing floors have become much more automated over the last several years. So what if you could start to not just think about using a factory floor and automating it to prevent defects, but what if you could deliver a better quality product? That'd be pretty cool. And as we learned, you know, what's really going to drive a lot of our experiences with different businesses is how they make us feel. Do we feel like the product's good? Does it have good quality? So along with preventing defects and saying, okay, we prevented something from breaking, we also want that product to be great. That'd be pretty transformative. And then if you just go to the basics of life every day, we have states and governments that are trying to manage money and do things that are transformative, and they have data all
over in silos. And, you know, I have to admit that I do remember mainframes. I can say COBOL with a straight face, which probably not many of you can, but maybe you can. There's data siloed all over the place. Mainframes are still out there, right? So data is not in one simple place where it's going to be easy, where we're just going to throw it all into a big data lake and everything's going to be great. These are transformative outcomes. Now, the cool thing is, these are Pentaho customers. So when we think about this big booming word out there, transformation, this is something that we've already been doing with you. We've been on this journey with you. This is not something that's new, transforming. So USAC, they are basically the designated organization for that. The FCC has said, look, you are going to be given 10 billion dollars to bring service to areas that don't have it, so that kids in school can do their homework, so that if we have IoT applications, there's some service somewhere that we can actually connect and capture data from. We've got Hyosung, who's probably out here, I think, somewhere, a tire manufacturer that does all the nylon cords that are in tires. Imagine if you could start to instrument and capture data from your factory floor to start to improve the quality, what that would do for your business. And then the state of Louisiana: five million people, and they're saving about tens of millions of dollars every year now, because they figured out how to take a completely outdated, siloed infrastructure of data and turn it into a modern data infrastructure, where now they can get the funds to different constituencies in their state much faster and also save money. Really super transformative outcomes. These are Pentaho Excellence nominees this year too, so let's give them a big hand for what they're doing. And so, as I said, we've been on this journey. We've been on this journey with you guys, and we've had a single, kind of consistent vision for the product, whether it
was 13 years ago, as Anthony DeShazor said, when, you know, you had a bunch of great guys sitting around figuring out what we do with open source technology, it was still about data and analytics. It wasn't just about analytics, and it wasn't just about data integration. They understood that the complexity of solving data challenges has to go with the analytics. And so the vision has been that from data engineering to data prep to analytics, that has to be thought about as a single flow. If it isn't, something breaks along the way, and your business outcome isn't going to happen, because you're not looking at the data that's going to actually solve for the analytic insight. It's really important. The other piece that's hard is, if you think about that whole pipeline of data, you also have to think about all the administration. How do you keep it secure? How do you do things like monitoring? What about multi-tenancy? And so those are the really hard challenges that sit kind of across that pipeline. And so when we look at the roadmap at Pentaho, we basically think about three key areas where we need to invest. The first is in a visual data experience, and that is either being able to deliver data to a dashboard or some type of analytic view in an application, or what Pentaho's been working on since release 7.0: what if we brought that analytic experience into the data preparation, and we could actually let you prepare your data and visualize it as you go? Imagine the time that that saves from IT to the business. And we hear that over and over and over. Talking with customers, I hear it very often: it just takes too long to get the data to the business. And so if we can work to shorten that cycle, that starts to become transformative.
The second area is in big data processing. And I have to admit, in 2010 I learned about Hadoop, and I'm such a kind of nerd in that way, I was like, this is so cool, because I had known BI, and then I was like, wow, and now we're going to have all this data. So we had to pay attention to Hadoop, and then MapReduce, and then Spark, and now whatever's coming next. Lots of things on the horizon. And so those are the areas around big data processing that we have to always be on top of, so that when you're ready for those technologies, we're ready. And then lastly, that enterprise platform. As I mentioned, that's the stuff that's really hard. You know, those are the things that aren't always the most flashy on the outside, but if you're not thinking about how are we going to scale, how are we going to manage the data across this enterprise, it's hard. And so those are the things that we want to make sure we're investing in, so that even if we get from the data to the insight, if we can't tell you what happened from here to there, or we're not able to administer that, then we're not really doing our job. What's the great news? So we're Hitachi Vantara now, right? Which we're all kind of getting used to saying, like, Vantara, Vantara. But so if I think about it from a product perspective, what's exciting to me is we now are part of this bigger portfolio of products. And it starts with storage, which we all know Hitachi has, but the cool thing is, as Brian said, there's metadata, and Hitachi brings content platforms and object stores and the ability to index and search on data, and things that add to that analytic environment that we can help bring into your organization. So that's an area where you'll see a lot of improvements in the platform relative to storage and content.
The second piece is around services. So when I mentioned that data pipeline, all of the things you have to do to manage that data pipeline, whether it's monitoring or security, with Hitachi Vantara they're building out a platform of services to be able to do that, all the things that Pentaho provides plus others. And so we can leverage some of those services as part of our Pentaho platform, which will make our product more enterprise ready. And I'll talk a little bit about that in a minute when we get to the fun stuff. But edge processing to asset management, those are things around IoT that, even if you're not ready today, you will have to deal with: edge analytics, data at the edge. You're going to have to manage assets or things, devices, your phone, whatever it is. And so we'll be prepared, because that's an area where Hitachi Vantara is investing. And then of course Pentaho. You know, Brian mentioned that we're really kind of the cornerstone of this whole data-driven organization, and so the things that we've done so well for so many years, data integration, analytics, that's still going to continue to be what the Pentaho platform has to do well. It's going to take a much bigger, broader shape over the next few years, but that's an area where we bring so much value, because even if you can get that data at the edge, or you can store it, or you've got all these other capabilities, if you can't do the complex data integration and you can't deliver the insights, well, there's not a whole lot of value. And then applications. You can imagine, and many of you are customers already doing this, you're building applications on top of Pentaho. So we're going to have the ability to do more of that. So really exciting stuff from a product perspective, and you can go, wow, our whole candy store just got a lot bigger, which is great. So now the fun stuff. So who's fired up for something fun? 8.0, did you guys see it? All right. So Pentaho 8.0: exciting, a major release, but it's our first release as
Hitachi Vantara, which is kind of cool, because it's the same product and the same platform. It's a brand on the product, but we're now a new company. So exciting stuff. And when we looked at this release, and it's a great release, by the way, you can see it in the showcase. There's presentations, there's Meet the Experts. I encourage you to go and talk with all the folks, myself, there'll be roadmap sessions, and find out more. But we looked at three challenges. One, data volumes and velocity. So we all know about the whole volumes of data, blah blah blah, we've all seen the curve for the last five years, and this is the velocity, that it's moving faster. And so in this release we said we need to make sure that we're broadening the connectivity to streaming data sources. So that includes Spark Streaming, the ability to connect to a Kafka stream, as well as Knox Gateway, to be able to connect to a Knox Gateway. So those are the three things in that area. And I'm going to go real high level, but you'll be able to drill into those. The second one is all around managing resources, processing and storage resources. And we're all constrained by resources, ourselves included: time, capacity. I know I don't have enough time in my day, I don't know about you. But so what we said is, gee, we did a great job in 7.0 with the adaptive execution layer and allowing customers to scale up where they needed to and execute on the Spark engine via a Pentaho transformation. And so what we did in this release is we made that easier to use, we put some new capabilities in, and we basically said we've got to have it on Cloudera and we've got to have it on Hortonworks, and more to come. The second piece, which is really cool, is scaling out. So this concept of worker nodes, where I just want to add some more nodes, I need more capacity, can I add more? And so how we did that was we collaborated with the folks inside of Hitachi and that services platform I mentioned. There's something called Foundry. We took advantage of that,
so we said, gee, we could do this on our own, but actually these guys have done some really cool work, and it'll make our work better, and it'll make it more appropriate for the enterprises we go into. And so that was a great first step in terms of where we're collaborating on the platform as Hitachi Vantara. And then lastly, the hard problem we're always trying to solve: how do we have less time spent on preparing data? And so investing in that visual data experience to bridge that gap between IT and the business and the time that it takes to deliver data, but also to just remove some of the time it takes every day to prepare data. You have a lot of resources, some of them are expensive. Data scientists don't come cheap. You want them doing the data science, not preparing data. So we'll continue to invest there. So super excited, lots of fun stuff going on: transformation, Pentaho 8.0. As we go into this with you, as I said, you saw the customers we have; they're just a small sampling of all the cool outcomes that we see in your businesses. So as we move from data to better outcomes, we're going to keep moving with you on that journey, and as Hitachi Vantara, with the resources and who we are now as a company, we'll be able to do a lot more of that. So thank you. And now we get to move to something even more fun. So, technology. All the technologies that are driving a lot of this transformation, there's really three, and there's something we call internally the power of three: IoT, big data, predictive analytics. They all have kind of converged, and they all kind of have created the ability to do a lot of the transformative things that you're seeing, like what IMS is doing. And so what we decided to do is, hey, let's have a cool discussion. Let's bring out three experts who know a lot about these topics, one on each topic, and let's have a little conversation. So you can help me welcome them to the stage. First is a senior analyst from Wikibon, James Kobielus. Welcome, James, have a seat. Second, we're
going to bring back Ella Hilal, who is the director of innovation at IMS. So Ella, welcome back. And lastly, especially the Pentaho folks, you've got to really give it up here, come on: Pentaho senior data mining consultant, otherwise known as the machine learning guru of all time at Pentaho, Mark Hall. Probably couldn't have embarrassed Mark more. All right, so let's have a seat, guys. We had a fun time yesterday talking, didn't we? We had to rehearse this, and they were basically like, get off the stage, because you're talking too long. We said all our good stuff yesterday. You're just getting our remainder. You might have to lower your bar. It was really fun yesterday. So one of the things, you know, in this power of three is bringing these three technology areas together. I mean, when you see that, what do you think has caused the convergence of those three areas of technology? What do you think are some of the causes of those being able to be available to us today?
I think mobile devices are very important, smartphones, the fact that everybody's got them now. And smartphones are essentially the beachhead for the internet of things in our lives, and smartphones are both a massive generator of data and increasingly a massive consumer thereof. And really much of what we do with smartphones involves things that rely increasingly on predictive analytics being inline, like e-commerce and so forth, to drive, you know, targeted offers and next best actions and so forth. So I really think it's the mobility revolution that's driving the convergence of these three technologies.
Yep, I second that. I think also the big advances that you can see in the IoT space, the ubiquitous sensors that are everywhere, the availability of huge data sets, as well as the advancement in processing power, enable us to do a lot more than what was able to be done years back.
Yes, absolutely. So, you know, as Ella was saying, the vast amounts of data being generated have really enabled certain machine learning and
predictive techniques to really shine today. I mean, some of them have been around for many years, but the paucity of data, the unavailability of data, the inability to store data at a granularity that could leverage those algorithms held them back, and that potential is now being realized today.
Yeah, I was thinking about that, because, you know, as I was mentioning Hadoop from 2010, I guess that's really the big file system, right, or the big file cabinet, as we sometimes used to call it, that really allowed us to just accumulate those massive amounts of data, right? So I'm guessing that changed things.
Yeah, well, it's not just Hadoop, but of course the entire NoSQL universe of distributed file stores, key-value stores, graph databases, the proliferation of optimized data storage and processing platforms for the disparate range of data types, including machine data, that of course is the foundation for the internet of things. So yeah, the whole notion of a data lake clearly is like, we're sitting here in central Florida, which is Lakeland. You're flying in, you see the disparate lakes of different shapes and sizes and so forth. So you have to think of the whole big data universe as like Lakeland. It's like an archipelago of lakes.
But I also want to tie it back to the advancement, whether it's in the machine learning or the processing power that will enable it. Those are key, because without them we can't mine this huge data. And the availability of data itself is not enough; the techniques and the ability to mine it, understand it and pull useful insights out of it are key.
Massively parallel processing power in the cloud is critically important for this revolution, as we know, right? GPU processing, dedicated AI chips have really enabled things like deep learning to come to the forefront today. And at Wikibon we're seeing the whole deep learning chipset space is just like booming right now, in terms of VCs, in terms of chip manufacturers. You've got Nvidia, and you've got Intel, and you've got Google with their TPUs. This is like an amazingly fertile
space that we're watching day by day, because the architecture, systems on a chip involving more densely packed, you know, tensor core processing units and so forth, next year and beyond we're just going to see this explode, where every device of any sort will have embedded chipsets that are optimized for deep learning, and the algorithms are going to live on the edge, on the edge devices, to enable if not completely autonomous operations, then semi-autonomous or, you know, cooperative sort of intelligence in the cloud, like in Brian Hopkins's great presentation, in the gateway and on the edge devices, to play together as part of a unified intelligence fabric. And the chipsets are critically important.
That's a good segue. That's a good segue, thinking about using the word unified, and this great flow and everything. That all sounds great, but what do you see as kind of the complexities or some of the barriers to actually sort of the next set of outcomes we might see based on these technology areas?
So there are barriers around talent, obviously. All three of these technologies require certain skill sets, and they're advanced skill sets, so there is a shortage of people who have skills in all three areas, I would say, in order to leverage all three of these technologies. There are also sort of barriers around the organization as well, so the real need for a data culture within organizations, to allow data to be shared between different departments, to really be able to put data at the forefront so that people have access to it when they need it and so forth. I mean, these sort of things are challenges for organizations to address.
I second you, and I would like to add to that. The problem that I'm seeing from my point of view is that most of the techniques available today are designed and made for supervised learning, and when you come to work with real-world data, it's messy, it's noisy, and definitely it's unsupervised. So this puts a lot of challenge, and this can tie back right away to the lack
of the skills, because this is not a straightforward application. Now you need somebody that has an understanding of the noise models, an understanding of the complexities, as well as somebody who can work with these unsupervised techniques to pull useful insights out of them. So this is another thing that I've seen in the market as a bigger challenge.
Yeah, related to that, related back to supervised learning as the core of what machine learning and deep learning and AI are all about in terms of practical applications, one of the barriers, simply, for many developers, is: where do you acquire the training data sets that you need? How do you label them? How can you find resources, human beings and otherwise, to label the training data that you might need to be able to build and optimize and refine your machine learning models? So how do you build an end-to-end pipeline, really a DevOps pipeline, that can acquire the right data that you need, can do the labeling, can assist you with building and iterating and scoring your machine learning models, pushing them out, evaluating them? This entire release pipeline, much of it depends on the training data, and more of that is going to be auto-generated going forward.
But you know what, the question that I have is, does it have to be labeled data? Can we work on our machine learning techniques to transcend from supervised learning to semi-supervised or unsupervised learning? Because when we talk about the data sets that we're talking about, with these sizes, with like hundreds of terabytes, even petabytes now, it's very hard to label. Even if you have one million Amazon people working, clicking and labeling, it's still very hard to do. So I think we need to push the boundaries, and I think this is one of the challenges: how can we push the boundaries beyond the labeled, supervised approaches to be able to pull useful insights from unlabeled data that is real-world, noisy data? And this is how our brains work. So our brains
take this messy data and are able to process and understand it. So now, if we're pushing the boundaries of AI, of machine learning and of data analytics, I think the next challenge would be, can we get these systems to work with similar data?
The solutions are several. Like you've indicated, more of a focus increasingly on unsupervised learning, but also semi-supervised and also reinforcement learning. So more and more of what we see in terms of robotics and self-driving vehicles and consumer appliances is robots of various sorts where there's no ground truth, or there's no training data, or, more to the point, you're not minimizing a loss function, you're maximizing a reward function, and you build bots and build algorithms that enable the edge device to learn for itself what path through some complex environment maximizes whatever reward function might be relevant to its domain.
I think that's absolutely right. I think we'll see more advances in streaming mining algorithms, basically. I mean, that's a field that's been around for a little while now, but there is a lot of potential there. So especially when you've got your edge devices with limited resources, limited RAM, limited storage, right, you want to be able to filter some data at that point so that you can get salient information being transmitted back to your central servers or whatever for heavier processing. But the streaming algorithms have to operate in these limited-resource environments. So advances in unsupervised techniques, semi-supervised techniques that can operate in an anytime fashion, on the fly, without requiring storage, I think is a very interesting area.
I totally second that, but that also brings another challenge, which is the availability of bandwidth for the streaming data. Because from an IoT perspective, you can get the sensors to collect data at 100 hertz or even higher, but then the cost of streaming it back for analysis is quite high. So I second what Mark said: the combination between, like, analysis and
filtering on the edge device or on the node itself, as well as getting it back, can help with the cost a little bit. Overall, the costs are still very high for it to be practical.
I agree. So I think the bandwidth constraints out to the edge will cause more developers of AI to use federated training, where the edge devices are not sending back or streaming back every bit of data that they gather to the cloud for processing or incorporation into a data lake, but rather will send back summaries of the data that they're seeing, filtering summaries back to more of a central node that does the ongoing training and validation, whatever iteration through the models. So we're seeing in the industry various efforts towards bringing more of a federated approach into distributed training. We see a little bit of Google doing that, I think IBM is doing some of that, and so forth. I think that will expand. I think federated training is not really the way most supervised learning practices operate now. I think it has to become a more substantial focus of IoT, cognitive IoT, algorithmic training going forward.
But I think also the advances of 5G, as well as the other advances in telecommunication, will help a little bit to offset that issue too. So it would be a combination of all of these that will help a lot.
Well, it certainly sounds like we've got a lot of challenges ahead, a lot of opportunities. There always is, right? They always go together. So one of the things that might be interesting for the folks is to understand from your perspectives, if you think about 2 to 3 to 5 years from now. I know we got into this yesterday, some of the great outcomes and at what price, but I'm just curious if you guys could share some of what you see ahead in terms of those outcomes.
Something as mundane as the death of passwords through Face ID and similar initiatives. Apple's already got it in this next-generation smartphone, and that
obviously depends on face recognition, which depends on machine learning and deep learning. And Google's bringing that into Android, and it's one of those mundane things in terms of multi-factor authentication available to us as we use mobile devices in every sphere of our lives, both consumer, personal and business and so forth. I think multi-factor and biometric authentication of that nature, of a very strong multi-factor, is coming to every application very, very quickly. I think the greatest disruptions are those that simplify our lives and take things out that are unnecessary. Like, passwords are increasingly unnecessary.
I know I would do a lot less "I lost my password, please send me it again." If that was the case, it would give me a lot of time back. I'll get the new Apple products, all of them at once. Any thoughts, Mark?
I fully anticipate being made completely redundant in my household by digital assistants and personal assistants, smart devices, the advent of the smart lawnmower. There really won't be any reason for me to exist, I suspect.
You'll have to find something else to do with your time, Mark.
I think the biggest advances would be in the area of enabling convenience for the end user. This is the big demand now, this is what we're living and breathing. The computing around us would help enable this convenience. We need stuff faster, we need to be more productive, and we're actually more selective nowadays on where we spend our time. So the technologies that would be advancing, and the ones that would be adopted, are the ones that will help enable this convenience, whether it's from a password that will disappear or from a lawnmower that will cut the lawn, and hopefully something that will do the dishes and feed me.
Exactly. It kind of ties to that customer experience. A lot of it obviously started with the big data and our ability to capture lots of things about what we do in our lives, whether it's where we are, where we shop, where we go.
This is also what's driving the
autonomous vehicle. The biggest advantage of the autonomous vehicle is not just you sitting and not driving, it's you gaining that time back, the 4.5 years back to your life, whether you're doing some work, spending some time with your family, or even using your vehicle for productivity. We've all heard about the fridge that will see what's missing and send your car to pick it up, so you don't have to make that trip anymore. I know that might be far-fetched, but that's the direction most companies are working in: enabling convenience and a better user experience. I started with a mundane example of a disruption, which is the death of passwords, but something that really excites me as a human being is the growth of generative applications that depend on machine learning and deep learning. They're coming fast into every sphere of our lives. Natural language generation (NLG) is one of the beachheads for deep learning in the whole web content management space. More and more articles, like news articles, are being written by bots that are trained to be about as good a writer as a good human is. We're going to see more NLG incorporated into every application of every sort that people use to generate content in all contexts. That's a generative application that depends on deep learning (DL). But also things like the next-generation music composition workbench are DL-enabled. We're seeing more music that's being composed by DL, essentially. Whether it's good music or not is a secondary issue, depending on your point of view, but it's the new Moog synthesizer for the 21st century, essentially. There'll be new expressive possibilities. More visual art is being created by deep learning and machine learning, using style transfer techniques and so forth. Those are coming into the mainstream of creator or maker culture in every sphere, and we're going to see a
fair amount of generative applications in things like cinema and so forth take off in a major way. But that's all machine learning, deep learning and big data under the covers. I think the other thing would be moving away from narrow AI, which ties back to what you're saying. So it's creativity in AI, but it's not only creativity in AI, it's moving away from narrow AI. The AI that we have today is very, very good at doing one specific thing, the thing that you train your model for. But if we move, hopefully in the future, into wider AI, it's going to be similar to human intelligence: you can be intelligent in multiple things at the same time, whether it's natural language processing or vision. And hopefully in the future we won't even have to train our AI; the models that are built, or that even self-train, will be able to work on multiple things. And that in fact is a hot research focus in the AI community, called transfer learning: transferring knowledge from one task to the next. In fact, the OpenAI consortium has essentially a test bed and a network for developers to come and improve their techniques and approaches for transfer learning. But the problem with transfer learning today is that if it's a vision system you trained, it can be applied to different things, but it needs to stay a vision system. You cannot train a vision system and put it to work on language. The definition of wide AI is that it can do vision, or whatever you're training it for, and then it can do language and music and multiple other things. The applications are following the money. A lot of what I see in transfer learning is this: if you've aced one online multi-user game, you can transfer your knowledge of how to ace that game to other online games and so forth. I'm seeing a fair bit of that, still within the same domain, but I think wide AI as you're describing it is a bit further than two or three years out. So to that, we were talking yesterday about all
these great outcomes and fascinating things around music and art, and then we talked a little bit about the dark side, that there could be some potentially dark sides to this. Well, we need a little dramatic effect, a little bit of a dark side. I did a robot overlord costume to wear for Halloween this year. Everybody got a good one? Where are you going to be on Halloween? Probably at my home handing out candy. I won't be there. I don't want to see the robot costume, it kind of scares me a little bit. But in any event, we've got the dark side, perfect robot costume. What are some of the things we talked about the other day about who's responsible for the models? Remember, we were talking about that last night: who's actually responsible? Is it the data? Who does it end with? So Mark, I don't know if you have any thoughts on that. We were talking about this topic. You have models, you have data, you have the data scientists who produce the models from the data, and then something goes wrong down the track. Well, who do you point the finger at? That's a tricky question to answer. So yeah, we talked about it yesterday, and we were talking about the EU declaration on responsible innovation and the fact that there is global awareness of the concept of responsible innovation, which means trying to incorporate into whatever systems you're developing the concepts, the morals and the ethics of us humans. We live in a modern society and, regardless of laws, we're all governed by some morals and ethics about how humans interact with each other. And now, when you bring in, hopefully in the far future, this wide AI that can read, understand and make decisions on our behalf, whether it's autonomous vehicles or others, how will it function? How will it operate? And when it faces a moral dilemma, how will it make the decision? I think you were talking about this example of a car crashing into somebody. Yeah, that's the typical scare example, which is: if
an autonomous vehicle can't help but hit either one group of humans or another, which decision does it make, or should it make? Any decision, every possible outcome, has to be coded in some way into the full framework for development of that model. And who makes that decision, I mean how to code it, and who takes responsibility for the inevitable outcome, or for whatever outcome? But all these systems have the problem of the many hands. It's not one person who's developing this code who would be responsible, not one person who would explicitly say, "If you see two people here and one person there, hit the one person rather than the two." It's never going to be coded like that. And this is another problem in responsible innovation: how can we, as developers of the technology, as people working within technology, embed our ethics and moral responsibility into these systems? It's not going to be a what-if condition, because you can never code every combination out there, but you can code a foundational code of ethics on how it should behave. And yeah, as the speaker said this morning, your data lives forever, and it will outlive the systems upon which it was created. Likewise, your algorithms will live essentially forever. They'll outlive the actual developers who put them together and trained them initially, and they'll be retrained and so forth. The organization that builds and deploys all of that into various applications accepts responsibility on the level that they own the algorithms or they own the data and so forth. I mean, I am not a lawyer. But that's an assumption, that one organization is developing it. Now you have the problem of open source. This morning we were talking about open source; now who takes the responsibility? At least, and this is my personal recommendation, if we think about responsible innovation, if we think about the code of ethics that governs us as human
beings, and that is the core of every super-wide AI system, built on the same rules, then we might have a better chance of having systems that are morally sound. If we can't completely indemnify ourselves against artificial stupidity, what we do need, however, is tools to lay bare the algorithmic workings of whatever it is that did something horrible, so that when it ultimately gets litigated, at least there's full discovery of the exact chain of things that happened to put a particular algorithm in place at a particular time, with particular data coming in from particular sensors to drive a particular... In other words, if there's a judge or a jury in the universe that really wants to slog through all that crap, they can do so, because we give them the tools to easily roll up a narrative, though it might be the most complex narrative on earth. It certainly sounds like we have no shortage of challenges. It'll be a narrative that makes as much sense as James Joyce's Finnegans Wake does. I'm sure the car will get you to sign away before you step inside. Yeah, that's right, sign away. All right, well, this has been a fascinating discussion, and I know that all of you have sessions coming up. So James, you have one right after this? I run over to theCUBE. I'll be co-hosting theCUBE today and tomorrow. Then I have a session tomorrow morning, I believe it's at 10:30, on AI and pushing AI to the edge. All right, so if people want to learn more. I think you have one? Yeah, I have one today at 5:30. Okay, great, customer showcase. And Mark, I know you have some. Yeah, I have a talk this afternoon at half past two on Spark MLlib in Weka. Great. I know your sessions are always fascinating for our Pentaho World crowds here. And obviously we love all the customer presentations. I've known you guys for quite a while, and it's great to see how you've evolved with Hitachi Vantara. Great, thanks for having us. So thanks so much for spending the time and talking about obviously
interesting things, and we got into a very controversial topic at a certain point, so thanks so much. All right, all right, our crew is going to head off here. Great. I'll just have two. Did you enjoy that panel? I like the creative tension that was caused there. Artificial stupidity: it's the first time I've heard that term. I think I'm going to go buy that domain, artificialstupidity.com. So, all right, guys, thank you for this morning. We've gone through a lot. You've heard from a lot of great speakers. You've heard what the purpose is for this particular session. What's next? How do you take what you heard today and bring it down so you can execute on it? Well, we have breakout sessions. This is where you can learn from some of the best thinkers we have, both within Pentaho, within Hitachi Vantara, and from our partners. They can show you or talk to you about the right strategies, the right techniques and the best practices for achieving your outcomes. We have the Solution Expo, a wonderful opportunity to go and network with other customers, other partners, even with Pentaho employees, to learn what we're doing and form those right connections that can always be helpful for you in the long run. And of course we have Meet the Experts. I was backstage in the green room, just basically chilling, just relaxing because I had it all to myself, and there are people still scheduling Meet the Experts sessions. We still have time, we still have slots. If you have any issue that you want to deal with, we can absolutely help you out there. I personally am signing up to help customers on some of those. So if you have an issue, or a one-on-one conversation that you'd like to have, we have space in Meet the Experts. But the most important thing... No one responded. Are you guys hung over from last night? Is that what it is?
Never mind. The most important thing for me is to have fun. We're here not just to learn but also to celebrate, and we have an arcade tonight: great craft beers, games, music. I already called dibs on Pac-Man, so you're going to have to beat me to get me off Pac-Man. But let's have fun tonight. Let's network, let's get to know each other, so that we can partner for the transformation you need in your company. Are you ready to transform? Yes! I'll ask this again: are you ready to transform? Yes! Then let's go do it. Thank you.