Live from the Galvanize campus in San Francisco, it's theCUBE covering the Apache Spark Maker Community Event, brought to you by IBM. Now, here are your hosts, John Walls and George Gilbert. And welcome back to San Francisco here on theCUBE, along with George Gilbert. I'm John Walls, as we continue our coverage on theCUBE at the Apache Spark Maker Community Event sponsored by IBM. We're here on the Galvanize campus down in the basement. Lovely digs, as a matter of fact, and a very cool facility. With us is Ritika Gunnar, who is the Vice President of Offering Management, Data and Analytics at IBM Analytics. Ritika, thanks for being with us. Nice to be here. And Dean Wampler, who is the architect for fast data in the Office of the CTO at Lightbend. And Dean, thank you for being with us here. So Ritika, you've got a big day, a big announcement coming up, we understand, and I don't know how much you can share with us, but we understand you're gonna make some news today. We are absolutely gonna make some news. What a wonderful place to be, at Galvanize, because if you take a look at it, one year ago, pretty much to the day, we were here and we announced our commitment to the Apache Spark community. And given the progress that we've made in the past year, and we'll talk a lot about that at tonight's event, we're gonna continue that momentum and put that into overdrive. We're gonna make a series of announcements later this evening, and I will give you a preview of some of that. We're gonna continue to contribute to the analytics operating system, as we coined it, in terms of Apache Spark, and really help the community embrace Apache Spark all over. And we are going to take it up a notch. In addition to what we've said about the analytics operating system, what do you think every operating system needs? Applications. Applications. 
We are gonna facilitate the need to be able to create applications on that operating system and introduce, for the analytics environment, an integrated development environment to be able to build applications, and do that in a way that's very open, based on open source, and facilitates all types of programming languages and the community all around it. So that's in essence what we're gonna talk about tonight. So what's the shift there? What's the paradigm shift? I mean, you're looking at it. You're talking about a different kind of application or a different kind of process that's going to be related to the application. You know, we believe that our clients are infusing data all throughout their organizations. They're building data-driven cultures. And in order to be able to create a very data-driven culture, that means more people in the organization need to have access to data. And as more people in the organization have access, they need to be able to very easily create insights from that data and operationalize them. So the example I like to give is: if a data science organization starts with, let's say, 1,000 hypotheses and they want to be able to rapidly test those 1,000 hypotheses, they may find that only 10 of them have any relevance whatsoever. Of those 10, they may wanna take one of them and be able to operationalize that into production in a way that can scale out, that is fault tolerant, that has all the set of capabilities it needs. And so the belief is that today's applications don't just need to be reactive, but they need to be insightfully reactive. And that's what we believe the partnership with Lightbend brings to bear. It's the notion of not only creating a set of reactive applications, which is what Lightbend is about, but about creating insightful reactive applications. And that's what we can do together. So Dean, you're on the app side, right? That's what you do. 
What is this shift, the Data Science Experience, what does that mean to you and the work that you're gonna be doing? So Lightbend has always been about middleware tools to build microservices, Play web applications and so forth. And then a few years ago, we got into the fast data space through Spark. Basically, they brought me on to do this, and so I've been working in, like, the Hadoop environment. And what I think you're seeing, and it goes along with what you said, is that there has been this divide for a long time between the enterprise, and the sort of classic IT things they build, and the people doing data analytics. But inevitably, it's all gonna have to be integrated, because really anything we do, whether it's serving up ads or doing e-commerce, it's all data in the end. And the more we can bring intelligence from that data back into the applications and actually drive what the user is experiencing, I think that's really crucial. So what we've tried to do is enable people to have the tools to build these applications that not only meet the requirements that you're trying to build, but also just stay up, they respond to failure, they scale up gracefully and down gracefully and all of that, which is really what reactive is all about. We found it's just a great synergy working with these guys to bring our sort of shared vision to reality. So this sort of dovetails with what Derek Schoettle was telling us about, where we're orchestrating a platform of data management or analytic products, a catalog of analytic feeds potentially, or data feeds. And now you're telling us about the application development tools and maybe the applications themselves as endpoints. There are a lot of different roles still. What do those roles look like? So, we spent a lot of time interviewing hundreds of data professionals, and through that process, what we found is that there are four main people or personas that are really involved in driving a data-centric culture. 
We're starting with the data scientist, and that's what we're announcing today. But in addition to that, we believe that providing a culture of data means that it's a team sport. And I know Derek alluded to that. I thought I heard him say that. It is a team sport, and that is not just for the data scientist but also for the data engineer, for the business analyst and, of course, for the application developer. And so those are the four primary personas or people that we see. And you know, in some organizations, people may be playing multiple roles. And so this notion of being able to have not just collaboration between the data scientists themselves but across the set of professionals is something that is a fabric of what Derek was talking about in our next generation data and analytics technologies. So as a follow-up, going back to sort of Adam and Eve in terms of computing, we've always had this trade-off between specialization and integration. How will you surface and explain the benefits of the integration between these four personas? So the way I like to describe it is: it is an experience built for me, but built for us. And that is a very important premise, that each one of those personas that I talked about needs an experience that is very specific to how they consume and process data. You know, the way a business analyst would do it is very different than an application developer. But yet it needs to work and be built for us, so that together it is more than the sum of the individual parts. Now, notebooks as computational documents have sort of taken off to the point where it seems like they're responsible almost single-handedly for driving GitHub, you know, project numbers. Over 200,000 notebooks in GitHub alone. Yes, okay. Would notebooks be an environment that's malleable enough to provide the different experiences for the different roles? Well, it is definitely for the data scientists. 
And if you look at a lot of what we're gonna announce today with the Data Science Experience, a lot of that is founded on notebook technology. So what we're announcing first is Jupyter technology, but founded on a very open premise that you can bring the tools that you have in there. So you'll notice in what we announce tonight that we have RStudio integration. We will have Zeppelin notebook integration. And so the premise of having that environment for a data scientist makes sense. That may not make sense, for example, for a business analyst. Or it may not make sense for a developer. And so making sure that they have their own IDEs, in a way in which they can experience, but still have a connected framework to be able to be built for us, is really important. And Dean, I'm curious, because Ritika was talking about these multi-layers, or you have these four different personas, right? Four very different job titles, if you will. But ultimately, when you talk about the us, I thought about the us as the end user. You know, I thought it was like, you know, that's the us. So how do you get those teams who maybe aren't thinking, or haven't thought about, at the end of the day, you know, the person who's ultimately, or the business that's ultimately, using that application? How do you get them to be focused on that end goal? I think you actually have to socialize that with them over and over and over again. I'm basically an engineer, and I remember the day when I didn't really care what the business did. I just wanted to play with my cool toys. But you know, you get to a point where you realize, you know, I want my work to matter and I want to actually make a difference in people's lives. So you do socialize that sort of thing. I think notebooks actually are an interesting story here, because I see them actually infecting a larger group of people. They're actually really nice for developers when they're exploring, when they're learning. 
They're great, they're getting flexible enough that you could actually build, like, dashboards with them for, like, your, you know, C-suite people. And it could even be the basis for, like, end user environments in some cases. Even in Jupyter notebooks now, there are capabilities where you can actually publish dashboards so that the line of business can actually leverage the output of what a data scientist created. That wasn't there before, but because of this need for collaboration, we're now starting to see the data scientist can build something. They can publish it to where the line of business can access that in a way that is consumable for them. We'll show an example tonight of RStudio and Shiny applications, same kind of situation, where the data scientist can develop an application but be able to share the output with consumers within the organization, who can access that in a way that they're most comfortable to be able to do. And that's kind of the beauty here too, you talk about ease of use. I mean, and not to demean certainly the C-suite level of comprehension of what's going on, but you can present information that's relevant to the decision makers in a very actionable way. And that hasn't been done before, Dean. Or it's been slow to be done. There has been a divide between giving people access to the data they need, but also letting them maybe move a little beyond it, like, what if I wanna tweak this presentation in some way, but not go so far that I delete the data or see things I'm not allowed to see? So you're getting this spectrum of control that lets you expose what you need, to give people the ability to do what they need to do, without having to ask for permission or ask for help. 
I think one of the revelations in my career was about 10 years ago. I was working with some product managers in an entirely different industry and discovered that these guys had learned to write SQL queries so they could get answers they needed right away out of the data warehouse, without having to bug the data analysts all the time. And that really was a revelation for me, that if you expose things that people can learn and give them the tools they need to do their jobs, then they will discover what they need to discover. Dean, I have a term for that. You wanna know what it is? Sure. You wanna give users in the organization the freedom they want with the trust that you need. I mean, that's effectively what it is. You want to be able to deliver to everyone in the organization freedom to be able to access that data, but in a way where it can be trusted. It's okay to run free, right? It's okay. You're okay here as long as you know where the goal line is. Exactly. So I have to ask the boring question, which is: this creates this sort of seamless capability to address the needs of a whole bunch of different roles with data and analytics. But if you wanna operationalize it with those legacy systems, which translates to the ones that work, how does that process work? Well, that's where Spark actually comes into play and has been pretty pivotal, right? Because it's not just about new data. I think when a lot of our clients are trying to be able to create these reactive applications, it's about taking what is there and being able to bring along what is new. And so being able to use Spark as a common data access mechanism is really one of the most powerful things that we've seen. And so that's where we see actually a lot of Spark usage. I don't know about you, Dean. That's absolutely correct. 
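The "common data access mechanism" idea, joining legacy warehouse records with newly arriving event data behind one interface, can be sketched in plain Python. This is an illustrative sketch of the pattern only, not the Spark API; the table names and fields are hypothetical.

```python
# Sketch of the pattern Ritika describes: one access layer that enriches
# new event data with legacy warehouse records by joining on a shared key,
# the way a Spark join across a JDBC source and a stream would.
# All record shapes here are hypothetical.

legacy_customers = [  # rows as they might come from a legacy warehouse
    {"customer_id": 1, "name": "Acme Corp", "segment": "enterprise"},
    {"customer_id": 2, "name": "Initech", "segment": "smb"},
]

new_click_events = [  # rows as they might arrive from a new streaming source
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 2, "page": "/docs"},
    {"customer_id": 1, "page": "/signup"},
]

def join_on_customer(legacy, events):
    """Merge old and new data on a shared key, like an inner join."""
    by_id = {row["customer_id"]: row for row in legacy}
    return [
        {**by_id[e["customer_id"]], "page": e["page"]}
        for e in events
        if e["customer_id"] in by_id
    ]

enriched = join_on_customer(legacy_customers, new_click_events)
print(len(enriched))  # 3 enriched events
```

In actual Spark this shape would be a DataFrame read from the legacy store joined with the incoming events on the key column; the point of the sketch is only that both old and new data flow through one mechanism.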
It's a fantastic tool for bringing in legacy data, merging it with new data, like Twitter mining and checking how the click stream worked, and all these crazy sorts of cross analytics that are done. Actually, and this is to your point, this is sort of the other half of the Lightbend business: legacy modernization, effectively, helping people break down their monoliths and become more reactive in the general sense of being able to implement things that are more flexible and can grow. Just for those who are listening and wanna sort of take the output of these new applications, or these insightful applications, and operationalize them: systems of record don't always have these well-defined integration points to say, flag that as a money laundering transaction. How do you tie that back into those old applications? Yeah, there are several ways. So it's pretty common to use Spark for this purpose and similar ones, where you just stream the data through, do some sophisticated machine learning on it or apply models that you've already trained on it. And then you can feed that data back through various mechanisms, like old school SQL queries or new school streaming technologies, back to your environment. Even doing this to some degree in real time, if you wanna stop something that's mid-flight that you think is suspicious. But a lot of times you'll see people do this sort of on the side while the transactions are running, and then apply some corrective action later. But I do think that the machine learning capabilities, and the pattern detection, and a lot of the tooling that's available for the data science professional today absolutely can identify some of those patterns. And we're seeing a lot of models that are very specific to that. 
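The scoring loop Dean describes, where transactions stream through, a previously trained model scores each one, and suspicious items are flagged mid-flight, can be sketched minimally in plain Python. The "model" here is a hypothetical threshold rule standing in for whatever was trained offline; none of the field names come from the interview.

```python
# Minimal sketch of streaming model scoring: records flow through one at a
# time, a pre-trained model scores each, and suspicious ones are flagged
# before any corrective action. The rule-based "model" is a stand-in.

def trained_model_score(txn):
    """Stand-in for a model trained offline; higher = more suspicious."""
    score = 0.0
    if txn["amount"] > 9_000:  # large transfers just under reporting limits
        score += 0.6
    if txn["country"] not in txn["usual_countries"]:  # unusual geography
        score += 0.3
    return score

def score_stream(transactions, threshold=0.5):
    """Yield (transaction, flagged) pairs as each record arrives."""
    for txn in transactions:
        yield txn, trained_model_score(txn) >= threshold

stream = [
    {"id": "t1", "amount": 9500, "country": "CH", "usual_countries": {"US"}},
    {"id": "t2", "amount": 40,   "country": "US", "usual_countries": {"US"}},
]

flagged = [txn["id"] for txn, is_flagged in score_stream(stream) if is_flagged]
print(flagged)  # ['t1']
```

In a Spark deployment the same shape would appear as a map over a stream with the loaded model, with flagged records fed back to the system of record via SQL writes or a message queue, as Dean notes.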
That is one of the things that, when we talk to a lot of data scientists, they wanted the ability to be able to say, for example: if you find patterns that look very similar to this, you may wanna start out with a model that looks something like this. And so that's one of the things that we believe in: through the ability to share and collaborate on, for example, industry models, or maybe blueprints that a particular data scientist has and wants to share with other data scientists, they can absolutely do that. So it may not necessarily need to be something about delving into those legacy systems, but about being able to create models where you can find those kinds of industry blueprints that you're talking about and share those with others. Okay. And then I assume that's the change management role, and maybe the implementation role, tying it into the legacy systems, and IBM professional services' industry practice might have a role to play there. Well, there is absolutely a method we believe that most clients need to embark on, and we call it our DataFirst method. You'll probably hear a lot more about that, but given where clients are today, with traditional operational systems and data warehouses, and where they need to be to be able to create these reactive, insightful applications, there is a transition that needs to happen, an evolution from where they are to where they need to be. And so we're introducing a DataFirst methodology that allows clients to assess where they are in their data maturity and where they need to be, and we actually have a process to help them get started. We've actually partnered with Galvanize right here to be able to help us in some of these initial engagements with clients. You know, and that can be good news, bad news, right? It could be something like, oh gosh, you know, internal audit can be very helpful and insightful, to say the least. 
But you know, you need to disrupt. I think every industry and every profession needs to disrupt itself. Oh sure. Or they will be disrupted, right? Yeah, exactly. Well, it's good to be the bearer of good news, and you are today and tonight. Absolutely. I know there are a lot of big announcements, and we thank you for at least giving us a sneak peek here, and we look forward to hearing more from you tonight. Thank you, I look forward to it. Great, to be continued. Thanks for being with us. Thank you. Here on theCUBE, and we continue from San Francisco right after this.