Live from Munich, Germany, it's theCUBE. Covering IBM Fast Track Your Data, brought to you by IBM.

Welcome to Munich, everybody. This is a special presentation of theCUBE, Fast Track Your Data, brought to you by IBM. My name is Dave Vellante, and I'm here with my co-host, Jim Kobielus. Jim, good to see you, really good to see you in Munich.

I'm glad I made it. Thanks for being here.

So last year, Jim and I hosted a panel in New York City on theCUBE, and it was quite an experience. We had, I think, nine or 10 data scientists, and we felt like that was a lot of people to organize and talk about data science. Well, today we're going to do a repeat of that, with a little bit of a twist on topics, and we've got five data scientists. We're here live in Munich, and we're going to kick off the Fast Track Your Data event with this data science panel. So I'm going to now introduce some of the panelists, or all of the panelists, and we'll get into the discussion.

I'm going to start with Lillian Pierson. Lillian, thanks very much for being on the panel. You are a data scientist; you focus on training executives and students, and you're really a coach, but with a lot of data science expertise, based in Thailand. So welcome.

Thank you, thank you so much for having me.

Oh, you're very welcome. And I want to start with: when you focus on training people in data science, where do you start?

Well, it depends on the course that I'm teaching, but I try and start at the beginning. So for my big data course, I actually start back at the fundamental concepts and definitions that they would even need to understand in order to understand the basics of what big data is and what data engineering is. So terms like data governance go into the vocabulary that makes up the very introduction of the course, so that later on the students can really grasp the concepts I present to them. And I'm teaching a deep learning course as well, so in that case, I start at a lot more advanced concepts.
So it just really depends on the level of the course.

Great. And we're going to come back to this topic of women in tech, but we looked at some CUBE data the other day: about 17% of the technology industry comprises women, and we're a little bit over that on our data science panel; we're about 20% today. So we'll come back to that topic, but I don't know if there's anything you would add.

Yeah, I'm really passionate about women in tech, and women who code in particular. And I'm connected with a lot of female programmers through Instagram, and we're supporting each other. So I'd love to take any questions you have on what we're doing in that space, at least as far as what's happening across the Instagram platform.

Great, we'll circle back to that. All right, let me introduce Chris Penn. Chris is Boston-based, as am I. Chris is a marketing expert, really trying to help people understand how to turn data into value from a marketing perspective. It's a very important topic, not only because we get people to buy stuff, but also for understanding some of the risks associated with things like GDPR, which is coming up. So Chris, tell us a little bit about your background and your practice.

So I actually started in IT and worked at a startup, and that's why I made the transition to marketing, because marketing has much better parties. But what's really interesting about the way data science is infiltrating marketing is that the technology came in first when everything went digital. And now we're at a point where there's so much data, and most marketers kind of got into marketing as sort of an arts and crafts field, and they're realizing now that they need a very strong mathematical and statistical background. So one of the reasons why we're here, and IBM is helping out tremendously, is making a lot of the data more accessible to people who do not have a data science background and probably never will.

Great. Okay, thank you. Let me introduce Ronald van Loon.
Ronald, your practice is really all about helping people extract value out of data, driving competitive advantage, business advantage, or organizational excellence. Tell us a little bit about yourself, your background, and your practice.

Yeah, basically I have three different backgrounds. On one hand, I'm a director of a data analytics consultancy firm called Adversitement, where we help companies to become data-driven, mainly large companies. I'm an advisory board member at Simplilearn, which is an e-learning platform, especially also for big data and analytics. And on the other hand, I'm a blogger and I host a series of webinars.

Okay, great. Now, Dez Blanchfield, I met you on Twitter probably a couple of years ago, but we first really started to collaborate last year, and we've spent a fair amount of time together. You are a data scientist, but you're also a jack-of-all-trades. You've got a technology background, you sit on a number of boards, you're very active with public policy. So tell us a little bit more about what you're doing these days, a little bit more about your background.

Sure, I think my primary challenge these days is communication: trying to join the dots between my deeply technical background and pedigree and plain-English everyday language and business speak. So it's bridging that technical world with what's happening in the boardroom, going toe-to-toe with the geeks and then translating to plain English for execs and boards, and just handholding them and stewarding them through the journey of the challenges that they're facing. Whether it's the enormous rapidity of change, a pace of change that's almost exhausting and causing them to sprint, and not just sprint in one race, but in multiple lanes at the same time, as well as some of the really big things that are coming up that we've seen, like GDPR. So it's that communication challenge, just handholding people through that journey, and that mix of technical and commercial experience.
Great, thank you. And then finally, Joe Caserta, founder and president of Caserta Concepts. Joe, you're a practitioner, you're on the front lines, helping organizations, similar to Ronald, extract value from data and translate that into competitive advantage. But tell us a little bit about what you're doing these days at Caserta Concepts.

Thanks, Dave, thanks for having me. Yeah, so Caserta's been around, I've been doing this for 30 years now, and the natural progression has been from application development to data warehousing, to big data analytics, to data science, very organically; that's just where businesses have needed the help the most over the years. And right now the big focus is governance, at least in my world: trying to govern when you have a bunch of disparate data coming from a bunch of systems that you have no control over, like social media and third-party data systems. Bringing it in: how do you organize it, how do you ingest it, how do you govern it, how do you keep it safe, and how do you help define ownership of the data within an organization, within an enterprise? That's also a very hot topic, which ties back into GDPR.

Great. Okay, so we're gonna be unpacking a lot of topics associated with the expertise that these individuals have. I'm gonna bring Jim Kobielus into the conversation. Jim is the newest Wikibon analyst and the newest member of the SiliconANGLE Media team. So Jim, get us started off.

Yeah, so we're at an IBM event where machine learning and data science are at the heart of it. There are really three core themes here: machine learning and data science on the one hand, unified governance on the other, and hybrid data management. I want to circle back, or focus, on machine learning. Machine learning is the coin of the realm right now in all things data. Machine learning is the heart of AI. Everybody is hiring data scientists to do machine learning.
I want to get a sense from our panel, who are experts in this area: what are the chief innovations and trends right now in machine learning? And not deep learning, but the core of machine learning. What's super hot in terms of new techniques, new technologies, new ways of organizing teams to build and to train machine learning models? I'd like to open it up. Let's just start with Lillian. What are your thoughts about trends in machine learning? What's really hot?

It's funny that you excluded deep learning from the response for this, because I think the hottest space in machine learning is deep learning, and deep learning is machine learning. I see a lot of collaborative platforms coming out where data scientists are able to work together with other sorts of data professionals to reduce redundancies in workflows and create more efficient data science systems.

Is there much uptake of these crowdsourcing environments for training machine learning models, like CrowdFlower or Amazon Mechanical Turk or Mighty AI? Is that a huge trend in terms of the workflow of data science or machine learning?

Okay, so maybe I've been out of the crowdsourcing space for a while, but I was working with Standby Task Force back in 2013 and we were doing a lot of crowdsourcing, and I haven't seen that the industry has been increasing; but I could be wrong. I mean, if you're building automation models, a lot of the work that is being crowdsourced could actually be automated if someone took the time to just build the scripts and build the models. So I don't imagine that that's gonna be a trend that's increasing.

Well, automation of the machine learning pipeline is fairly hot; I'm seeing more and more research, and Google's doing a fair amount of automated machine learning.
To the panel: what do you think about automation in terms of the core modeling tasks involved in machine learning? Is that coming along? Are data scientists in danger of automating themselves out of a job?

I don't think there's a risk of data scientists being put out of a job; let's just put that out there. And I do think we need to get a bit clearer about this meme of a mythical unicorn. But to your core point around machine learning: we saw the cloud become baked into products just as a given, and I think machine learning has already crossed this threshold; we just haven't necessarily noticed or caught up. If we look at, you know, we're at an IBM event, so let's just do a call-out for them: in the Data Science Experience platform, for example, machine learning's built into a whole range of things around algorithms and data classification. And there's an assisted, guided model for how you get to certain steps, where you don't actually have to understand how machine learning works, you don't have to understand how the algorithms work; it shows you the different options you've got, and you can choose them. So you might choose regression, and it'll give you different options on how to do that. So I think we've already crossed this threshold of baking in machine learning and baking in the data science tools, and we've seen that with cloud and other technologies: you can't get a non-cloud Office 365 account now, right? I think that's already happened in machine learning. What we're seeing, though, is organizations, even ones as large as Google, still in catch-up mode, in my view, on some of the shift that's taken place. So we've seen them write little games and apps where people do doodles, and then it runs through the ML library and says, well, that's a cow or a unicorn or a duck, and you get awards and gold coins and whatnot.
But as far back as 12 years ago, I was working on a project where we had full-size airplanes acting as drones, and we mapped with 2D high-res imagery and LiDAR for 3D point clouds. We were finding poles and wires for utility companies using ML before it even became a trend, and baking it right into the tools that end users saw on a webpage and clicked and pointed on. To Lillian's point, it's not crowdsourcing but crowd-sharing that's really powering a lot of the rapid leaps forward. If you look at DSX from IBM, or you look at Node-RED, there's a huge number of free workflows where someone has probably already done the thing that you are trying to do; go out and find it in the libraries for Jupyter and R notebooks.

Chris, can you define, before you go on? This is great: crowdsourcing versus crowd-sharing. What's the distinction?

Well, crowdsourcing, in the context of the question you're asking, is getting people to do stuff for me, like asking people to classify things. Whereas crowd-sharing is: someone has done the thing already, it already exists. You're not purpose-built saying, Jim, help me build this thing; it's, Jim, you already built this thing. Cool, can I fork it and make my own from it?

Okay, I see what you mean. Keep going.

And then, going back to earlier, in terms of the advancements, really deep learning, it probably is a good idea to define these things. Machine learning is how machines do things without being explicitly programmed to do them. Deep learning is like, if you can imagine a stack of pancakes, right? Each pancake is a type of machine learning algorithm, and your data is the syrup. You pour the data on it, it goes from layer to layer to layer, and what you end up with at the end is breakfast. That's the easiest analogy for what deep learning is.
Now imagine a stack of pancakes 500 or 1,000 high; that's where deep learning is going now.

Sure, multi-layered machine learning models, essentially, that have the ability to do far higher levels of abstraction, like image analysis. Lillian?

I had a comment to add about automation and data science. There are a lot of tools, or applications, that are able to use data science algorithms and output results, but the reason that data scientists aren't at risk of losing their jobs is that just because you can get the result, you also have to be able to interpret it, which means you have to understand it, and that involves deep math and statistical understanding, plus domain expertise. So, okay, great, you took out the coding element, but that doesn't mean you can codify a person's ability to understand and apply that insight.

Right. Joe, you have something to add?

Yeah, I could just add that really, the reason we're talking about it today is that machine learning is not new, like Dez was saying. But what's different is the accessibility of it now. It's just so easily accessible: all of the tools that are coming out for data have machine learning built into them, right? So the machine learning algorithms, which used to be a black art years ago, are now very easily accessible; you can get them as part of everyone's toolbox. And the other reason that we're talking about it more is that data science is starting to become a core curriculum in higher education, which is something that's new, right? That didn't exist 10 years ago. But over the past five years, I'd say, it's become more and more easily accessible through education. And now people understand it, and we have it accessible in our tool sets, so now we can apply it. And I think those two things coming together are really making it become part of the standard of doing analytics.
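Chris's pancake analogy maps directly onto how a deep network actually computes: data is "poured" through a stack of layers, each one a simple transformation, and the stack can be made arbitrarily tall. A minimal sketch in Python with NumPy; the layer sizes, random weights, and ReLU activation here are illustrative choices, not anything discussed on the panel:

```python
import numpy as np

def relu(x):
    """Elementwise ReLU activation: negatives become zero."""
    return np.maximum(0.0, x)

def forward(x, layers):
    """Pour the data (syrup) through each layer (pancake) in turn."""
    for w, b in layers:
        x = relu(x @ w + b)   # linear transform, then nonlinearity
    return x

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]  # input -> two hidden "pancakes" -> output
layers = [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

batch = rng.standard_normal((3, 4))   # three example rows of 4 features
out = forward(batch, layers)
print(out.shape)                      # (3, 2): one 2-dim output per row
```

Making the stack "500 or 1,000 high" is just a longer `sizes` list; the training of those weights, rather than the forward pass itself, is where the real work of deep learning lies.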
And I guess the last part is, once we can train the machines to start doing the analytics, right, and get smarter as they ingest more data, then we can actually take that and embed it in our applications. That's the part where you still need data scientists, to create that. But once we can have standalone appliances that are intelligent, that's when we're going to start seeing machine learning and artificial intelligence really take off even more.

So I'd like to switch gears a little bit and bring Ronald into the discussion.

Okay, yes, here you go.

Ronald, the bromide in this big data world we live in is that data is the new oil, you've got to be a data-driven company, and many other cliches. But when you talk to organizations and you start to peel the onion, you find that most companies really don't have a good way to connect data with business impact and business value. What are you seeing with your clients, and just generally in the community, with how companies are doing that? How should they do that? I mean, is that something that is a viable approach? You don't see accountants, for example, quantifying the value of data on a balance sheet; there are no standards for doing that. And so it's sort of this fuzzy concept. How are, and how should, organizations take advantage of data and turn it into value?

So I think, in general, if you look at how companies look at data: they have departments, and within the departments they have specific tools for that department. And what you see is that there's no central data collection, no central management of governance, no central management of quality, no central management of security. Each department manages its data on its own. So if you then ask how they should do it, it's basically to go back to the drawing board and say, okay, how should we do it? We should collect the data centrally, and we should take care of central governance.
We should take care of central data quality, we should take care of centrally managing this data, and we should look from a company perspective, not from a department perspective, at what the value of data is. So look at the perspective of your whole company. And this means that it has to be brought to the awareness of the C-level, where they most often still fail to understand what it really means and what the impact can be for the company.

So it's a hard problem, because data by its very nature is now so decentralized. But Chris, you have...

One thing I want to add to that: when you think about valuing data, look at what it would cost you for a data breach. What is the expense of having your data compromised if you don't have governance, if you don't have policy in place? Look at the major breaches of the last couple of years, and just how many billions of dollars those companies lost in market value and trust and all that stuff. That's one way you can value your data very easily: what will it cost us if we mess this up?

So a lot of CEOs will hear that and say, okay, I get it, I have to spend to protect myself, but I'd like to make a little money off of this data thing. How do I do that?

Well, I think data is definitely an asset within an organization, and it's becoming more and more of an asset as the years go by. But data is still a raw material, and that's the way I think about it. In order to actually get the value, just like when you're creating any product, you start with raw materials, then you refine them, and then it becomes a product. Data is a raw material; you need to refine it, and then the insight is the product. And that's really where the value is; you can absolutely monetize your insight. Data is abundant, insights are scarce.
Well, actually, you could say that intermediate between the data and the insights are the models themselves: the statistical, predictive, and machine learning models that are a crystallization of insights gained by people called data scientists. What are your thoughts on that? Are statistical, predictive, and machine learning models an asset that companies and organizations should manage governance of on a centralized basis, or not?

Well, the models are essentially the refinery system. So as you're refining your data, you need to have process around how exactly you do that, just like refining anything else, right? It needs to be controlled and it needs to be governed. And I think data is no different. It's very undisciplined right now in the market, or in the industry, and I think maturing that discipline around data science is going to be a very high focus this year and next.

You were asking, how do you make money from data? Because there's all this risk associated with security breaches. But at the risk of sounding simplistic, you can generate revenue from system optimization, or from developing products and services: using data to develop products and services that better meet the demands and requirements of your market, so that you can sell more. So either you are using data to earn more money, or you're using data to optimize your system so you have less cost. And that's a simple answer for how you're gonna make money from the data. But yes, there is always the counter to that, which is the security risk.

Well, my question really relates to this: we talk to C-level executives who think about running the business, growing the business, and transforming the business, and a lot of times they can't fund these transformations. And I would agree, there are many, many opportunities to monetize data, cut costs, increase revenue.
But organizations seem to struggle to either make a business case or actually implement that transformation.

Dave, I'd love to have a crack at that. I think this conversation epitomizes the type of thing that's happening in boardrooms and C-suites already. We've really quickly dived into the detail of data, the detail of machine learning, the detail of data science, without actually stopping and taking a breath and saying: well, we've got lots of it, but what have we got? Where is it? What's the value of it? Is there any value in it at all? And how much time and effort and money should we invest in it? For example, we talk about data being a resource. I look at data as a utility. When I turn the tap on to get a drink of water, it's there as a utility. I count on it being there, but I don't always sample the quality of the water, and I probably should. It could have Giardia in it, right? What's interesting is, I trust the water at home in Sydney because we have a fairly good experience with good-quality water; if I were to go to some other nation, I probably wouldn't trust that water. And I think, when you consider what's happening in organizations, it's almost the same as what we're seeing here today. We're having a lot of fun diving into the detail, but what we've forgotten to do is ask the question: why is data even important? What's the remit of the business? Why are we in business? What are we doing as an organization? And where does data fit into that, as opposed to becoming so fixated on data because it's a media-hype topic? I think once you can wind that back and say, well, we have lots of data, but is it good data? Is it quality data? Where is it coming from? Is it ours? Are we allowed to have it? What treatment are we allowed to give that data? As you said, are we controlling it, and where are we controlling it? Who owns it?
There are so many questions to be asked, but the first question I like to ask people, in plain English, is: well, is there any value in the data in the first place? What decisions are you making that data can help drive? What things in your organization, KPIs and milestones you're trying to meet, might data support? Then, instead of becoming fixated with data as a thing in itself, it becomes part of your DNA. Does that make sense?

Think about what money means, right? The economists' rhyme: money is a medium, a measure, an exchange, and a store. So it's a medium of exchange, a measure of value, a way to exchange something, and a way to store value. Good, clean, well-governed data fits all four of those. So if you're trying to figure out how to make money out of the stuff, figure out how money works, and then figure out how you map data to it.

So when we start with a company, we always start with a business case, which is quite clear, and we define the use case. Start with a team: on one hand, marketing people, sales people, operational people, and also the whole data science team. Starting with this case is like making a movie: if you want to create a movie, you know where you're going, you know what you want to achieve, the customer experience you want to create. And this is basically the same with a business case, where you define: this is the case, and this is how we're going to derive value. Start with it and deliver something within a month. And after the month, you check: okay, where are we, how can we move forward, and what's the value that we've brought?

Yeah, now, I as well start with the business. I've done thousands of business cases in my life with organizations, and unless that organization was kind of a data broker, the business case rarely has a discrete component around data. Is that changing, in your experience?

Yeah, so we guide companies to become data-driven.
So initially, indeed, they don't like to use the data, they don't like to use the analysis; that's why we help. And is it changing? Yes, they understand that they need to change, but changing people is not always easy. You see, it's hard: if you're not involved and you're not guiding it, they fall back into doing the daily tasks. So it's changing, but it's a hard change.

Well, that's where this common parlance comes in. Lillian, this is what you do for a living, helping people understand these things, and Dez, you've been sort of evangelizing that common parlance. But do you have anything to add?

Yeah, I just wanted to add that for organizational implementations, another key component of success is to start small: start in one small line of business, and then, when you've mastered that area and made it successful, try to deploy it in more areas of the business. As far as initializing a big data implementation goes, that's generally how to do it successfully.

There's the whole issue of putting a value on data as a discrete asset. Then there's the issue of how you put a value on a data lake, because a data lake is essentially an asset you build on spec. It's an exploratory archive, essentially, of all kinds of data that might yield some insights, but you have to have a team of data scientists doing exploration and modeling; it's all on spec. How do you put a value on a data lake? And at what point does the data lake itself become a burden? Because you've got to store that data and manage it. At what point do you drain that lake? At what point do the costs of maintaining that lake outweigh the opportunity costs of not holding onto it?

So each Hadoop node is approximately $20,000 per year in cost for storage. So I think that there needs to be a test and a diagnostic before even ingesting the data and storing it: is this actually gonna be useful? What value do we plan to create from this?
Because really, you can't store all the data, and while it's a lot cheaper to store data in Hadoop than it was in traditional systems, it's definitely not free. So people need to be applying this test before even ingesting the data: why do we need this? What's the business value?

I think the question we also need to ask around this is: why are we building data lakes in the first place? What's the function it's gonna perform for you? There's been a huge drive to this idea that we need a data lake, we need to put it all somewhere, but invariably they become data swamps, right? And we only half-jokingly say that, because I've seen 90-day projects turn from a great idea into a really bad nightmare. As Lillian said, it is cheaper in some ways to put it into an HDFS platform in a technical sense, but when we look at all the fully burdened components, it's actually more expensive to find the Hadoop specialists and Spark specialists to maintain that cluster. And invariably I'm finding that big data, quote unquote, is actually not so much lots of data; it's complex data. And as Lillian said, you don't always need to store it at all. So I think if we go back to the question of what the function of a data lake is in the first place, why we're building one, and then start to build some fully burdened cost components around that, we'll quickly find that we don't actually need a data lake per se; we just need an interim data store. So we might take last year's data, tokenize it, anonymize it, do some analytics on it, and just keep the metadata. There's been a rush, for a whole range of reasons, particularly vendor-driven, to build data lakes, because we think they're a necessity, when in reality they may just be an interim requirement and we don't need to keep them for the long term.

I'm going to attempt to take the last few questions and put them all together.
And I think they all belong together, because one of the reasons why there's such hesitation about progress within the data world is that there's just so much accumulated tech debt already. There's a new idea, we go out and we build it, and six months or three years later (it really depends on how big the idea is), millions of dollars have been spent, and by the time things are built, the idea is pretty much obsolete; no one really cares anymore. What's exciting now is that the speed to value is just so much faster than it's ever been before. And I think what makes that possible is this concept: I don't think of a data lake as a thing, I think of a data lake as an ecosystem. And that ecosystem has evolved more, probably, in the last three years than it has in the past 30 years. And it's exciting times, because now, once we have this ecosystem in place, if we have a new idea, we can actually do it in minutes, not years. That's really the exciting part. And I think data lake versus data swamp comes back to just traditional data architecture. If you architect your data lake right, you're going to have something substantial that you're going to be able to harness and grow. If you don't do it right, if you just buy a Hadoop cluster or a cloud platform and throw your data out there and say, we have a lake now, yeah, you're going to create a mess. It takes time to really understand the new paradigm of data architecture and modern data engineering, and to actually do it in a very disciplined way. If you think about it, what we're doing is building laboratories. And if you have a shabby, poorly built laboratory, the best scientist in the world isn't going to be able to prove their theories. But if you have a well-built laboratory and a clean room, then a scientist can get what they need done very, very efficiently. And that's the goal, I think, of data management today.
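The panel's rough numbers, roughly $20,000 per Hadoop node per year, plus the advice to run a cost test before ingesting anything, can be turned into a quick pre-ingest sanity check. A minimal sketch in Python; the per-node usable capacity is an assumed figure for illustration, and the node cost is the panel's estimate, not a vendor price:

```python
import math

NODE_COST_PER_YEAR = 20_000   # USD per Hadoop node per year (panel's rough figure)
RAW_TB_PER_NODE = 16          # assumed usable capacity per node, after replication

def yearly_cost(raw_tb):
    """Nodes needed to hold raw_tb of data, and the resulting yearly cost."""
    nodes = math.ceil(raw_tb / RAW_TB_PER_NODE)
    return nodes, nodes * NODE_COST_PER_YEAR

nodes, cost = yearly_cost(500)   # half a petabyte of raw data
print(nodes, cost)               # 32 nodes, $640,000 per year
```

Even with generous assumptions, holding half a petabyte "on spec" carries a six-figure annual bill, which is exactly the kind of fully burdened number Dez and Lillian argue should be on the table before the data is ever ingested.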
I'd like to just quickly add that I totally agree with the challenge between on-premise and cloud and whatnot. I think one of the strong themes of today is going to be the hybrid data management challenge. Some organizations have rushed to adopt cloud, thinking it's a really good place to dump the data and make it someone else's problem to manage, and they've ended up with a very expensive death by 1,000 cuts in some senses. Others have been very reluctant and, as a result, have not gotten access to rapidly moving and disruptive technology. So I think there's a really big challenge in getting a basic conversation going around what the value of adopting cloud technologies is, versus what the risks are and when the right time to move is. For example, should we cloud-burst for workloads? Should we hold whole data sets in there? Moving half a petabyte of data into a cloud platform and back is a non-trivial exercise, but moving a terabyte isn't actually that big a deal anymore. So should we keep stuff behind the firewalls, where, as I've been saying this week, 80% of the data supposedly is, and just push out to cloud tools: machine learning, data science tools, whatever they might be, cognitive analytics, et cetera, keeping the bulk of the data on-premise? Or should we just move the whole lot into the cloud? And there is no one-size-fits-all, no silver bullet. Every organization has its own quirks and nuances it needs to think through, and it has to make that decision for itself.

Organizations have zonal architectures. So you'll have a data lake that consists of a NoSQL platform that might be used for, say, mobile applications; a Hadoop platform that might be used for unstructured data refinement; a streaming platform; and so forth and so on. And then you'll have machine learning models that are built and optimized for those different platforms. So think of it, then, in terms of your data lake being a set of zones that...
It gets even more complex, just playing on that theme, when you think about what Cisco started calling fog computing. I don't really like that term, but edge analytics, or computing at the edge. We saw, with the internet coming along, that we couldn't deliver everything from a central data center, so we created this concept of content delivery networks, right? I think the same thing, well, I know the same thing, has happened in data analysis and data processing, where we've been pulling social media out of the cloud, per se, and bringing it back to a central source and doing analytics on it. But think of something like, say, for example, when the 787 Dreamliner from Boeing came out: this airplane created half a terabyte of data per flight. Now, let's just do some quick back-of-the-envelope math. There are 87,400 flights a day just in the domestic airspace of the USA alone. Now, 87,400 multiplied by half a terabyte, that's roughly 43.7 petabytes a day. You physically can't copy that from, you know, quote-unquote in the cloud, if you'll pardon the pun, back to a data center. So, now we've got the challenge that a lot of our enterprise data is behind a firewall, supposedly 80% of it, but what's out at the edge of the network? Where's the value in that data? So, as you said, there are zonal challenges now. What do I do with my enterprise data versus the open data, the mobile data, the machine data? Yeah, we've seen some recent data from IDC that says about 43% of the data is gonna stay at the edge. We think that that's way understated. I mean, just given the examples, we think it's closer to 90% that's gonna stay at the edge. Well, just on the airplane topic, right? Airbus wasn't gonna be outdone. Boeing put 4,000 sensors or something in the 787 Dreamliner six years ago; Airbus just announced an A380-1000 with 10,000 sensors in it. So, if you do the same math, and now the FAA in the US has said that all aircraft and all carriers have to be ready by early next year.
Things like March or April next year, they have to be at the same level of avionics, or the same capability of data collection and so forth. It's kind of like a mini GDPR for airlines. So, with the A380-1000 with 10,000 sensors, that becomes 2.5 terabytes per flight. If you do the math, it's about 220 petabytes of data in just one day's traffic domestically in the US. Now, it's just so mind-boggling that we're gonna have to completely turn our thinking on its head about what we do behind the firewall, what we do in the cloud, versus what we might have to do in the airplane itself. I mean, think about edge analytics in the airplane, processing data, as you said, Jim, stream analytics in flight. Yeah, that's a big topic at Wikibon. So, within the team, me and David Floyer and my other colleagues are talking about the whole notion of an edge architecture. Not only will most of the data be persisted at the edge, most of the deep learning models, like TensorFlow, will be executed at the edge. To some degree, the training of those models will happen in the cloud, but much of that will be pushed in a federated fashion to the edge, at least that's what I'm predicting, and we're already seeing some industry moves in that direction in terms of architectures. Google has a federated training project or initiative. Look at TensorFlow Lite. Which is really fascinating, for it's geared to IoT. I'm sorry, go ahead. Yeah, look at TensorFlow Lite. I mean, the announcement of every Android device having ML capabilities is essentially Google's acknowledgment that they can't do it all. So they need, sort of like SETI@home, everyone's smartphone and set-top TV box to help with the processing. This sort of leads to the IoT discussion, but I want to underscore the operating model. As you were saying, you can't just lift and shift to the cloud. CEOs aren't going to get the billion-dollar hit just by doing that.
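The back-of-the-envelope flight-data figures quoted above can be sanity-checked with a few lines of Python. This is a sketch; the flight count and per-flight data volumes are the panelists' own round estimates, not verified statistics:

```python
# Back-of-the-envelope check of the flight-data figures from the panel.
# Assumed inputs (the panelists' round numbers):
#   ~87,400 US domestic flights per day
#   Boeing 787 Dreamliner: ~0.5 TB of sensor data per flight
#   Airbus A380-1000 (10,000 sensors): ~2.5 TB per flight

FLIGHTS_PER_DAY = 87_400

def daily_petabytes(tb_per_flight: float, flights: int = FLIGHTS_PER_DAY) -> float:
    """Total sensor data generated per day, in petabytes (1 PB = 1,000 TB)."""
    return flights * tb_per_flight / 1_000

print(round(daily_petabytes(0.5), 1))  # 787 at fleet scale: 43.7 PB/day
print(round(daily_petabytes(2.5), 1))  # A380-1000:          218.5 PB/day
```

This gives about 43.7 petabytes a day at half a terabyte per flight and about 218.5 petabytes at 2.5 terabytes per flight, in line with the roughly 43 and 220 petabyte figures cited on the panel.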
So you've got to change the operating model. And that leads to this discussion of IoT and an entirely new operating model. Well, there are companies like Sisense who have worked with Intel, and they've taken this concept of taking the business logic and not just putting it on the chip, but actually putting it in memory on the chip. So as data is going through the chip, it's not just being processed, it's actually being baked into memory, in the level one, two, and three cache. Now, this is a game changer, because, as Chris was saying, even if we were to get the data back to a central location, there's the compute load. I mean, I saw a really interesting thing from, I think it was Google the other day, one of the guys was doing a talk, and he spoke about what it meant to add cognitive and voice processing into just the Android platform. And they used some number, like they had to double the amount of compute they had, just to add voice for free to the Android platform. Now, even for Google, that's a non-trivial exercise. So as Chris was saying, I think we have to, again, flip it on its head and say, how much can we put at the edge of the network? Because think about these phones, think about, I mean, even your fridge and your microwave, right? We put man on the moon with something that these days we can recreate for $89 at home with a Raspberry Pi computer, right? And even that's a thousand times more powerful. When we start looking at what's going into the chips, we've seen people build new, not even GPUs, but deep-learning- and stream-analytics-capable chips, Google, for example, that are gonna make their way into consumer products. So now the compute capacity in phones is gonna, I think, transmogrify in some ways, because the magic is in there, to the point where, as Chris was saying, we're gonna have the smarts in our phone, and a lot of that workload's gonna move closer to us, and only the metadata that we need to move is going to go centrally.
Well, here's the thing. The edge isn't the technology; the edge is actually the people. When you look at, for example, the MIT language Scratch, this kids' programming language, it's drag and drop, you know, kids can assemble really fun animations and make little movies. We're training them to build for IoT, because if you look at a system like Node-RED, it's an IBM interface that is drag and drop: you assemble your workflows for IoT and you can push that to a device. And Scratch has a converter for Arduino. So the edge is where there are thousands and millions of kids who are learning how to code, who are learning how to think architecturally and algorithmically. What they're going to create is beyond what any of us can possibly imagine. I'd like to add one other thing as well. I think there's a topic we've gotta start tabling, and that is what I refer to as the gravity of data. So think about how planets are formed, right? Particles of dust accrete and form into planets, planets develop gravity, and the reason we're not flying into space right now is that there's gravitational force; even though it's one of the weakest forces, it keeps us on our feet. Oftentimes in organizations, I ask them to start thinking about where the center of their universe is with regard to the gravity of their data. Because if you can find the center of your universe and the gravity of your data, you can often, as Chris was saying, find where the business logic needs to be. And it could be that you've gotta think about a storage problem, or a compute problem, or a stream analytics problem. But if you can find where the center of your universe and the center of gravity for your data is, often you can get a really good insight into where you need to start focusing, on where the workloads are gonna be, where the smarts are gonna be, whether small, medium, or large. But this brings up the topic of data governance.
One of the big themes here at Fast Track Your Data is GDPR and what it means. That's one of the reasons, I think, IBM selected, you know, Europe generally, Munich specifically. So let's talk about GDPR. We had a really interesting discussion last night, so let's kinda recreate some of that. I'd like somebody on the panel to start with: what is GDPR, and why does it matter? Ronald? Yes, so maybe I can start, maybe a little bit more in general, with unified governance. When I talk to companies and I need to explain to them what governance is, I basically compare it with a crime scene. At a crime scene, if something has happened, they start with securing all the evidence. They seal the environment and make sure that all the evidence is collected. And on the other hand, you see that they need to protect this evidence. There are all kinds of policies, all kinds of procedures, all kinds of rules that need to be followed, to make sure the whole body of evidence is secured well. And once you start investigating, you have the crime scene investigators, you have the research lab, you have all different kinds of people, and they need to have consent before they can use all this evidence. And the whole reason they're doing this is, on the one hand, to catch the villain, the crook, and on the other hand, once he's caught, to convict him. And we do this to have trust in the material, so trust basically in the analytics, and on the other hand so the public has trust in everything that happens with the data. So if you look at a company, where data is basically the evidence, this is the value of your data. It's similar to the evidence at the crime scene, but most companies don't treat it like this. So if we then look at GDPR, GDPR basically shifts the power and the ownership of the data from the company to the person that creates it, which is often, let's say, the consumer.
And there's a lot of, let's say, paradox in this, because all the companies say, we need to have this customer data because we need to improve the customer experience. So let's make it concrete. Say it's the first of June 2018, so GDPR is active, and I use iTunes. I go to iTunes and say, okay, Apple, please give me access to my data. I want to see what kind of personal information you have stored for me. On top of that, I want the right to rectify all this data. I want to be able to change it and give them a different level of how they can use my data. So I ask this of iTunes, and then I say to them, okay, I basically don't like you anymore, I want to go to Spotify, so please transfer all my personal data to Spotify. That's possible once it's June 2018. Then I go back to iTunes and say, okay, I don't like it anymore, I revoke my consent. I withdraw my consent, and I want you to remove all my personal data from everything that you use. And I go to Spotify and I give them, let's say, consent for using my data. So this is a shift where you, as a person, can be the owner of the data, and this has a lot of consequences, of course, for organizations and how they manage it. So it's quite simple for the consumer, they get the power, and it matures the whole legal system, but it's a big consequence, of course, for organizations. This is going to be a nightmare for marketers, but fill in some of the gaps there. Let's go back. So GDPR, the General Data Protection Regulation, was passed by the EU in May of 2016. It is, as Ronald was saying, four basic things: the right to privacy, the right to be forgotten, privacy built into systems by default, and the right to data portability. The law is already in effect; GDPR took effect in May of 2016. The enforcement penalties take effect next year, on the 25th of May, 2018.
Now, there are two things on the penalty side that are important for everyone to know. Number one, GDPR is extraterritorial, which means that for an EU citizen anywhere on the planet, GDPR goes with them. So say you're a pizza shop in Nebraska, and an EU citizen walks in, orders a pizza, hands over their credit card, and so on. If you, for some reason, store that data, GDPR now applies to you, Mr. Pizza Shop, whether or not you do business in the EU, because an EU citizen's data is with you. Two, the penalties are much stiffer than they ever have been. In the old days, companies could simply write off penalties, saying that's the cost of doing business. With GDPR, the penalties are up to 4% of your annual revenue or 20 million euros, whichever is greater, and there may be criminal charges against key company executives. So there are a lot of questions about how this is going to be implemented, but one of the first impacts you'll see, from a marketing perspective, is on all the advertising we do targeting people by their age, by their personal identity, by their demographics. Between now and May 25th, 2018, a good chunk of that may have to go away, because there's no way for you to say, well, this person's an EU citizen, this person's not. People give false information all the time online. So how do you differentiate? Every company, regardless of whether it's in the EU or not, will have to adapt to it or deal with the penalties. Now, Lillian, as a consumer, this is designed to protect you, but you have a very negative perception of this regulation. Yeah, I've looked over the GDPR, and to me it actually looks like a socialist agenda. No, it looks like a full assault on free enterprise and capitalism, and on its face, from a legal perspective, it's completely and wholly unenforceable, because they're assigning jurisdictional rights to the citizen. But what are they gonna do?
They're gonna go to Nebraska and call in the guy from the pizza shop, into what court? The EU court? It's unenforceable from a legal perspective, and if you write a law, it's gotta be enforceable in every element. It can't be, oh, we're only gonna enforce it for Facebook and for Google, but it's not enforceable for everyone else. It needs to be written so that it's a complete and actionable law, and it's not written in that way. And from a technological perspective, it's not implementable. I think you said something like 652 EU regulators or political people voted for this and 10 voted against it, but what do they know about actually implementing it? Is it possible? There are all sorts of regulations out there that aren't possible to implement. I come from an environmental engineering background, and it's absolutely ridiculous, because these agencies will pass laws that aren't actually possible to implement in practice. The cost would be too great, and it's not even needed. So I don't know, I just saw this and I thought, what the EU is essentially trying to do is regulate what the rest of the world does on the internet. If they wanna build their own internet, like China has, and police it the way that they want to, fine. But Ronald here made an analogy between data and free enterprise and a crime scene. Now to me, that's absolutely ridiculous. What does data, and someone signing up for an email list, have to do with a crime scene? If the EU wants to make it that way, then they can police their own internet, but they can't go across the world. They can't go to Singapore and tell Singapore, or go to the pizza shop in Nebraska, and tell them how to run their business. You know, EU overreach, in this Brexit era, what you're saying has a lot of validity. How far can the tentacles of the EU reach into other sovereign nations? What court are you gonna call them into? Yeah, yeah.
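For concreteness, the penalty rule Chris described earlier, the greater of 4% of annual revenue or 20 million euros, can be sketched in a few lines of Python. This is illustrative only; the helper is hypothetical, regulators set actual fines case by case, and GDPR also defines a lower tier of 2% or 10 million euros:

```python
# Sketch of the top-tier GDPR maximum-fine rule discussed on the panel:
# the greater of 4% of annual revenue or EUR 20 million.
# (Hypothetical helper for illustration; not legal guidance.)

def max_gdpr_fine(annual_revenue_eur: float) -> float:
    """Upper bound on a top-tier GDPR fine for a given annual revenue."""
    return max(0.04 * annual_revenue_eur, 20_000_000.0)

# A company with EUR 100M revenue: 4% is EUR 4M, so the EUR 20M floor applies.
print(max_gdpr_fine(100e6))  # 20000000.0
# A company with EUR 10B revenue: 4% is EUR 400M, which exceeds the floor.
print(max_gdpr_fine(10e9))   # 400000000.0
```

The "whichever is greater" clause is the point: for small businesses the 20-million-euro floor dominates, while for large companies the 4% of revenue term does, which is why the panelists describe the penalties as impossible to write off as a cost of doing business.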
I'd like to weigh in on this. There are lots of unknowns, right? So I'd like us to focus on the things we do know. We've dealt with similar situations before. In Australia, we introduced the goods and services tax, a completely foreign concept: everything you bought had 10% on it. No one knew how to deal with this. It was a completely new practice in accounting. A whole bunch of new software had to be written; MYOB had to have new capability. But we coped. No one has actually gone to jail yet, decades later, for not complying with GST. What it was was a framework for how to shift from non-sales-tax-related revenue collection to sales-tax-related revenue collection. I agree that there are some egregious things built into this; I don't disagree with that at all. But I think, if I put my slightly broader view-of-the-world hat on, we have well and truly gone past the point, in my mind, where data was respected and data was treated in a sensible way. I get emails from companies I've never done business with. And when I followed up, it was because I did business with a credit card company that gave my details to a service provider that thought, when I bought a holiday to come to Europe, that I might want travel insurance. Now, some might say there's value in that, and others say there's not; there's a debate. But let's just focus on what we're talking about. We're talking about a framework for governance of the treatment of data. If we remove all the emotive components, what we're talking about is a series of guidelines, backed by laws, that say we would like you to do this in an ideal world. But I don't think anyone's gonna go to jail on day one. They may go to jail on day 180 if they continue to do nothing about it. So they're asking you to sort of sit up and pay attention, do something about it. There's a whole bunch of leeway around how you approach it. But it's also, I mean, the big thing for me is there's no get-out-of-jail card, right?
There is no get-out-of-jail card for not complying. But there's plenty of support. I mean, we're gonna have ambulance chasers everywhere. We're gonna have class actions. We're gonna have individual suits. I mean, the greatest thing to do right now is get into GDPR law, because, you know, you think data scientists are unicorns? Well, I think we've seen ad blocking. I use ad blocking as an example, right? A lot of organizations with advertising broke the internet. We were just throwing too much content on pages, to the point where they were just unusable. And so we had this response with ad blocking. I think in many ways GDPR is a regional response to a situation. I don't think it's the exact right answer, but it's the next evolutionary step, and I think we'll see things evolve over time. It's funny you mention that, because in the United States, one of the things that has happened is that, with the change in political administrations, the regulations on what companies can do with your data have actually been relaxed, to the point where, for example, your internet service provider can resell your browsing history with or without your consent, or your consent's probably buried in there on page 47. And so, yeah, GDPR is kind of a response saying, you know what, you guys over there across the Atlantic are doing some fairly irresponsible things with what you allow companies to do. Now, on the impact, to Lillian's point, no one's probably going to go after the pizza shop in Nebraska, because they don't do business in the EU; they don't have an EU presence. And it's unlikely that an EU regulator is gonna get on a plane from Brussels and fly to Topeka, or Omaha, sorry, and say, come on, Joe, let's get the pizza shop in order here. But companies, particularly cloud companies, that have offices and operations within the EU have to sit up and pay attention.
So if you have any kind of EU operations or any kind of fiscal presence in the EU, you need to get on board. But to Lillian's point, then it becomes a boondoggle for lawyers in the EU who wanna go after deep-pocketed companies like Facebook and Google. What's the value in that? It seems like regulators are just trying to create work for themselves. What about the things that advertisers can do, not so much with the data that they have, but with the data that they don't have? In other words, they have people called data scientists who build models that can do inference on sparse data and do amazing things in terms of personalization. What do you do about all those gray areas where you've got machine learning models and so forth? Does GDPR apply there? Well, so yeah, it applies to personally identifiable information. But if you have a talented enough data scientist, you don't need the PII or even the inferred characteristics. If a certain type of behavior happens on your website, for example, and this path of 17 pages almost always leads to a conversion, it doesn't matter who you are or where you're coming from; if you're a good enough data scientist, you can build a model that will track that. Target inferred that a young woman was pregnant, and they inferred correctly, even though that was never divulged. I mean, there are all those gray areas; how can you stop that slippery slope? Well, I'm going to weigh in really quickly. Here's a really interesting experiment for people to do. When people get very emotional about it, I say to them, go to Google.com, view source, put it in 7-point Courier font in Word, and count how many pages it is. I bet you can't guess how many pages. It's 52 pages of 7-point Courier font HTML to render one logo, a search field, and a click button. Now, why do we need 52 pages of HTML source code and JavaScript just to take a search query? Think about what's being done in that.
It's effectively a mini operating system to figure out who you are and what you're doing and where you've been. Now, is that a good or a bad thing? I don't know, I'm not going to make a judgment call. But what I'm saying is that we need to stop and take a deep breath and say, does anybody need a 52-page homepage to take a search query? Because that's just the tip of the iceberg. I mean, to that point, I like the results that Google gives me. That's why I use Google and not Bing, because I get better search results. Yeah, I don't mind if you mine my personal data and give me Facebook ads. I saw in your article that GDPR is going to take out targeted advertising, but the only ads in the entire world that I like are Facebook ads, because I actually see products I'm interested in, and I'm happy to learn about them. I think, oh, I want to research that, I want to see this new line of products and who their competitors are. I like the targeted advertising, I like the targeted search results, because they give me more of the information that I'm actually interested in. I think that's exactly what it's about. You can still decide yourself if you want to have this targeted advertising. If not, then you don't give consent. If you like it, you give consent. So if a company gives you value, you give consent back. It's not that it's restricting everything; it's about consent. And I think the same type of response happened with mad cow disease here in Europe, where the whole food chain needed to be tracked. Everybody said, no, it's not required, but now it's implemented, and everybody in Europe does it. So it's probably the same thing that's going to happen over here as well. So what does GDPR mean for data scientists? I think GDPR is needed. I think one of the things that may be slowing data science down is fear.
People are afraid to share their data because they don't know what's going to be done with it. If there are some guidelines around it, they should be enforced. And I think it's been said, but as long as a company can prove that it's doing the diligence to protect your data, I think no one is going to go to jail. I think, to reference the crime scene again, if there's a heinous crime being committed, then it's going to become obvious, and then you do go directly to jail. But I think having guidelines, and even laws, around privacy and protection of data is not necessarily a bad thing. You can do a lot of really meaningful data science without understanding that it's Joe Caserta, right? All of the demographics about me, all the characteristics about me as a human being, I think, are still on the table. All that they're saying is that you can't go after Joe himself directly. And I think that's okay. You know, there are still a lot of things; we could still cure diseases without knowing that I'm Joe Caserta, right? As long as you know everything else about me. And I think that's really at the core of what we're trying to do. We're just trying to protect the individual and the individual's data about themselves. But as far as how it affects data science, you know, a lot of our clients are afraid to implement things because they don't exactly understand what the guideline is, and they don't want to go to jail, so they wind up doing nothing. So now that we have something in writing, at least something that we can work towards, I think that's a good thing. In many ways organizations are suffering from the deer-in-the-headlights problem, right? They don't understand it, and so they just end up frozen in the headlights. But I just want to go back one step, if I could. We can get really excited about what it is and is not, but for me, the most critical thing is to remember that data breaches are happening.
There are over 1,400 data breaches, on average, per day, and most of them are not trivial. And then we saw half a billion from Yahoo, and then 1.1 billion, and then 1.5 billion. I mean, think about what that actually means. There were 47,500 MongoDBs breached in an 18-hour window after an automated upgrade. And there were airlines, there were banks, there were police stations, there were hospitals. So when I think about frameworks like GDPR, I'm less worried about whether I'm gonna see ads and be sold stuff. The stuff I'm more worried about, I'll give you one example of. My 12-year-old son has an account on a platform called Edmodo. Now, I'm not gonna pick on that brand for any reason, but it's a current issue. Something like, I think it was 19 million children around the world had their username, password, email address, home address, and all their social interaction on this Facebook-for-kids platform called Edmodo breached in one night. Now, I got my hands on a copy, and everything about my son is there, okay? I have a major issue with that, because I can't do anything to undo it, nothing. The fact that I was able to get a copy within hours on a dark website for free, the fact that his first name, last name, email, mobile phone number, and all of these personal messages from friends are out there: nobody has the right to let that happen to my son, or your children, or our children. For me, GDPR is a framework for us to try to behave better about really big issues. Whether it's a socialist issue, whether someone's got an issue with advertising, I'm actually not interested in that at all. What I'm interested in is that companies need to behave much better about the treatment of data when it's this type of data that's being breached. And I get really emotional when it's my son or someone else's child, because I don't care if my bank account gets hacked; the banks hedge that. They underwrite and insure themselves, and the money arrives back in my bank account.
But when it's my wife, who donated blood, and a blood donor website got breached and her details got lost, even things like the sexual-preference questions they ask are out there. My 12-year-old son is out there. Nobody has the right to allow that to happen. For me, GDPR is a framework for us to focus on that. Yeah, I think that security concerns are 100% a serious issue. Security needs to be addressed, and I think a lot of the stuff that's happening is because we need better security personnel, better people working in the security area who are actually looking and securing, because I don't think you can regulate it away. But I wanted to take the microphone back when you were talking about taking someone to jail. Okay, I have a background in law. And if you look at this, you guys are calling it a framework, but it's not a framework. What they're trying to do is take 4% of your business revenue per infraction. They wanna say, if a person signs up on your email list and you didn't give whatever disclaimer the EU said you need to give, then per infraction, we're gonna take 4% of your business revenue. That's a law that they're trying to put into place, and you guys are talking about taking people to jail. What jail? The EU is not a country. What jurisdiction do they have? Like, you're gonna take Pizza Man Joe and put him in the EU jail? Is there an EU jail? You're gonna take him to a UN jail? I mean, on its face, it doesn't hold up to legal tests. I don't understand how they could enforce this. I'd like to just answer the first question. Security is a serious issue, and I would be extremely upset if I were you. I personally know people who work for companies that have had data breaches, and I respect them all. They're really smart people. They've got 25-plus years in security. And they are shocked that they've allowed a breach to take place.
What they've invariably all agreed on is that a whole range of drivers caused them to fall into a bad practice. So, for example, with the donate-blood website, the young person, a sysadmin with all the right skills and all the right experience, just made a basic mistake. They took a DB dump of a MySQL database before they upgraded the WordPress website for the business, and they happened to leave it in a folder that was indexable by Google. And so somebody wrote a regular expression to search Google for SQL backups. Now, I personally respect this person; I think they're an amazing practitioner. They just made a mistake. So what does that bring us back to? It brings us back to the point that we need a safety net, or a framework, or whatever you want to call it, where organizations have checks and balances no matter what they do, whether it's an upgrade, a backup, or a modification. And they all think they do, but invariably, as we've seen from the hundreds of thousands of breaches, they don't. Now, on the point of law, we could debate that all day. I mean, the EU does have a remit. If I was caught speeding in Germany as an Australian, I would be thrown into a German jail. If I got caught as an organization in France breaching GDPR, I would be held accountable to the law in that region by the organization pursuing me. So I think it's a bit of a misnomer to say you can't end up in an EU jail. I don't disagree with you totally, but I think it's regional. If I get a speeding fine for driving too fast in the EU, it's in the country, in the region, where I'm caught. And I think GDPR is going to be enforced in that same way. Unfortunately, the 60 minutes flew right by, and it does when you have great guests like yourselves. So thank you very much for joining this panel today. And we have an action-packed day here, so we're going to cut over to theCUBE, which is going to go to its interview format starting in about a half hour.
And then we cut over to the main tent. Who's on the main tent? Well, you're doing a main-stage presentation today, Data Science Is a Team Sport. Hilary Mason has a breakout session. We also have a breakout session on GDPR and what it means for you: are you ready for GDPR? Check out ibmgo.com. It's all free content, it's all open. You do have to sign in to see the Hilary Mason and the GDPR sessions. And we'll be back in about a half hour with theCUBE. We're running replays all day on siliconangle.tv and also on ibmgo. So thanks for watching, everybody. Keep it right there; we'll be back in about a half hour with theCUBE interviews. We're live from Munich, Germany, at Fast Track Your Data. This is Dave Vellante with Jim Kobielus. We'll see you shortly.