Live from Union Square in the heart of San Francisco, it's theCUBE, covering Spark Summit 2016, brought to you by Databricks and IBM. Now here are your hosts, John Walls and George Gilbert.

And welcome back here on theCUBE. Our coverage continues with Spark Summit 2016 from San Francisco. We're just about ready to wrap things up here with our last interviews of the day. It's a pleasure to welcome, along with George Gilbert, the lead analyst for Big Data Analytics at Wikibon, Sudhir Jhagar, who's the India head and CTO of InfoObjects. Sudhir, thanks for being with us here. We appreciate your time. And Juan Asinyo, who is the principal engineer at Rockwell Automation. Juan, a pleasure to have you as well, sir.

Nice to be here.

Sudhir, tell us about InfoObjects first, if you will, for our viewers at home who might not be familiar with what you're doing.

So InfoObjects is a company that provides Spark consulting services around the Spark ecosystem. We help companies like Rockwell and others where Spark is being used heavily, or we help set up a Spark ecosystem for those companies.

You said you help Rockwell, so why don't I ask you then: what's a company like Rockwell doing in a place like this? How are you engaging with Spark these days, and what are you seeing here? I'm just curious about your take on the show.

Sure. Rockwell is in the industrial Internet of Things. We automate factories and we leverage the data from the factory. The big data ecosystem enables us to do that at scale, with Spark being the integral compute engine we're leveraging, and we reach out to our consulting partner InfoObjects to help us deliver solutions.

So we say the industrial IoT. Give me a little more on that, your definition of it, because we know about the Internet of Things, but how does this apply to industry, to your manufacturing clients, for example?

Sure. There is big liability with factories. You can stop production if you make a mistake, or you can hurt people. So we're dealing with plant-floor data, factory data, and some of that holds the secrets of these companies, the recipes they have to make the cheese or the cookies and all of that. So if we're bringing this data to the cloud, leveraging the data lake and Spark for compute, security is paramount: making sure we're not exposing customer data while at the same time bringing value to them.

And so from InfoObjects' perspective, when you hear about Rockwell and their challenges, in their specific case, what are you working with Rockwell on? How are you advising them in terms of all the Spark initiatives they have at their disposal?

So mostly our role is around architecture and implementation. For example, in this particular use case, the pipeline has largely been standardized on Kafka, the compute and processing is done on Spark, and Hadoop is for storage only. So most of our work is around the architecture and implementation of this use case.
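To make that architecture concrete, here is a minimal sketch of the kind of Kafka-to-Spark-to-Hadoop pipeline Sudhir describes, written against the Spark 1.x streaming APIs in use at the time of this interview. The broker address, topic name, and HDFS path are hypothetical placeholders, not details of Rockwell's actual deployment.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SensorPipeline {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FactorySensorPipeline")
    // Micro-batch every 10 seconds; the interval is a tuning choice
    val ssc = new StreamingContext(conf, Seconds(10))

    // Kafka carries the sensor readings into the pipeline
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // hypothetical broker
    val topics = Set("factory-sensors")                             // hypothetical topic

    // Direct stream from Kafka; each record arrives as a (key, value) pair
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Spark does the compute; Hadoop (HDFS) is storage only, as described above
    stream.map(_._2).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///datalake/sensors/batch-${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Any real deployment would add checkpointing, schema handling, and the security controls Juan notes are paramount when plant-floor data leaves the factory.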
So let's drill down a level. We were talking earlier about how your core business for a long time has been, at least in manufacturing, the intelligent controls on machine tools, or I guess other equipment as well. Only now they're all connected. So what can you do now that you have connectivity that you couldn't do before?

Yes, in my particular business unit at Rockwell, CSM, Customer Support and Maintenance, we have this remote monitoring business, IoT. The purpose is to leverage the sensor data, primarily the time series sensor data, as well as any other relevant data out there. Once we ingest that data into the data lake, we can get insight from it in motion or at rest. And with the help of our consulting partner, once we have the infrastructure in place, really from that point forward it's all about tribal knowledge: what do we know about that particular process? And to answer your question, the data lake and the ecosystem powered by Spark make it possible to do this in a cost-effective way that would have been impossible two or three years ago.

Drill into that tribal knowledge. Is it that Rockwell knows not just about the controller but about the machine it's controlling, and therefore you know what questions to ask?

Yeah, that's a good question. In addition to providing the hardware and software to automate these factories, we also have experts who understand different verticals, for example automotive, pulp and paper, steel, and so on. Initially we were bringing in the data and relying on humans to look at it, to inspect the data visually, looking at screens and so on. And you can't scale that process to thousands and thousands of factories, because these experts are very rare, you can't find these experts, so you need a layer of technology to process the data automatically. So the first layer we add is to capture the knowledge of these individuals, because if we have people working eight hours a day, when they go home, you have a hole: that knowledge goes home and rests. We want to capture that, so we have a Spark job with that knowledge, okay? We want to capture that knowledge so we don't depend on the people working. And then machine learning: for the same stream of data going through the system, we want to leverage models that our data scientists, or InfoObjects helping them, can create for us to address the specific verticals where we have expertise.

So you take that tribal knowledge of automotive or transport, in motion or at rest, and you turn that into a model. You've got a data scientist working with an industrial engineer, and together they put together a model so that the industrial engineer can go to bed at night and you can still figure out what's going on with the data.

Yeah, the business model is such that it's an intelligent factory: we extract as much data as we can, as it makes sense. Once you have the data, we have experts who have knowledge about the particular factories, and you have thousands of factories. So we want to capture their knowledge so that when they go home, we're not missing their knowledge. And that's called crowdsourcing analytics, okay?

Is it the knowledge of the factory or the knowledge of the industry?

The knowledge of that particular factory, because you may have two factories that are both paper machines but are completely different. There are some commonalities, but our experts work with these customers, sometimes they deploy the solutions, so they have intimate knowledge they can apply. But we don't want to limit that to only when they're working; we want to keep that knowledge 24/7. A Spark job will not take vacation, okay? So it can work 24/7.
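As an illustration of what capturing that tribal knowledge in a Spark job might look like, here is a minimal sketch in which an expert's rule of thumb becomes a threshold check applied to every micro-batch. The record and rule shapes, metric names, and thresholds are all hypothetical, invented to echo the cookie and cheese examples in the conversation.

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical shape of one sensor reading flowing through the pipeline
case class Reading(factoryId: String, machineId: String, metric: String, value: Double)

// A captured piece of tribal knowledge: an expert's threshold for one metric
case class Rule(metric: String, maxValue: Double, advice: String)

object ExpertRules {
  // Illustrative rules an expert might dictate; real ones are vertical-specific
  val rules = Seq(
    Rule("oven.temperature", 240.0, "Oven running hot; batch quality at risk"),
    Rule("vat.ph", 6.7, "Cheese vat pH drifting high; check culture dosing")
  )

  // Apply every captured rule to every reading, around the clock
  def alerts(readings: DStream[Reading]): DStream[String] =
    readings.flatMap { r =>
      rules.collect {
        case rule if rule.metric == r.metric && r.value > rule.maxValue =>
          s"[${r.factoryId}/${r.machineId}] ${rule.advice} (observed ${r.value})"
      }
    }
}
```

Unlike the expert, this job runs 24/7; the trade-off is it only knows the rules someone wrote down, which is why the machine learning layer discussed next sits on top of it.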
And on top of that, beyond the crowdsourcing analytics, we want to do the machine learning. So we have a group of data scientists, augmented by our consulting partner. These data scientists have the tribal knowledge of the optimizations specific to that cookie machine, okay? What is the optimization for that cheese line? That's pure knowledge, and with it you can predict some KPIs. Again, once you have a high-throughput stream of data going through the data lake ecosystem, Kafka, Hadoop, Spark, the challenge is capturing the analytics: the human crowdsourced analytics and then the machine learning, all happening at the same time. And what enabled this is the data lake and the Spark capability.

So apart from tribal knowledge, I think we are saving a lot of time here, right? Because the system is becoming more proactive.

Right. So we heard today about the release 2.0. For your clients, what do you think is the feature or the improvement that they can best put to use, in terms of how Apache Spark is being improved?

Yeah, so with the 2.0 release, obviously we are excited about it, particularly the structured streaming part, which is exactly where a use case like Rockwell's will be helped. For example, today with streaming, if you want to do anything with the stream of data, you need to first save it, then run your DataFrame APIs. Now you can directly run your DataFrame API on top of streams, right? (A minimal sketch of that follows this transcript.) So your development tasks become easier day by day. The use case overall might not be impacted, but the development will become much easier.

So what they're able to do then, I mean, what is the...

The use case will not differ much, but what I'm saying is the development part will become easier to implement those things.

I got you. Yeah. Sure, sure. And so from the client side then, or from the actual user side, you heard what Sudhir here was saying about structured streaming giving you, you know, whole new capabilities. How are you going to put those into practice? What are you looking for here over the next 12 to 18 months in terms of being able to take those system improvements and put them into practice?

Yeah, so the challenge for us is that we are targeting serving thousands and thousands of factories and bringing the data into a pipeline, so any improvement in the infrastructure, such as continuous streaming as opposed to micro-batching, or any other improvement that Spark specifically brings to the table, matters. Again, we rely on our consultant, based on the architecture and our requirements, to advise where we can take advantage of those specific improvements. But our goal is a very scalable data pipeline, and I think we're pleased to hear about the improvements that Spark brings to the table.

Yeah, I think you're in pretty good hands, it sounds like, and the capabilities are making you even more viable in your particular market. We certainly wish you the best of luck in continuing that success. And thank you both, Juan and Sudhir, thank you for being with us here.

Thank you.

On theCUBE. George and I will be back with some final thoughts here on theCUBE in San Francisco at Spark Summit 2016.
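Following up on Sudhir's point about Spark 2.0, here is a minimal sketch of running the DataFrame API directly on a stream with structured streaming, with no save-then-query step. It assumes the Kafka source for structured streaming (the spark-sql-kafka-0-10 package, which landed shortly after the initial 2.0.0 release); the broker, topic, and CSV payload layout are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StructuredSensorKPIs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StructuredSensorKPIs").getOrCreate()
    import spark.implicits._

    // The DataFrame API applied directly to the stream, no intermediate save
    val readings = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
      .option("subscribe", "factory-sensors")             // hypothetical topic
      .load()
      .selectExpr("CAST(value AS STRING) AS line")
      // Hypothetical payload: machineId,metric,value
      .select(
        split($"line", ",").getItem(0).as("machineId"),
        split($"line", ",").getItem(1).as("metric"),
        split($"line", ",").getItem(2).cast("double").as("value"))

    // A running KPI: average reading per machine and metric
    val kpis = readings
      .groupBy($"machineId", $"metric")
      .agg(avg($"value").as("avgValue"))

    kpis.writeStream
      .outputMode("complete") // aggregation queries need complete output mode here
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

Compared with the Spark 1.x pipeline sketched earlier, the stream is queried like a table, which is the development simplification Sudhir highlights.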