Hey, welcome to this CUBE Conversation. I'm Lisa Martin. I've got two guests here with me. Please welcome Philip Niemitz, the interim head of department at the Laboratory for Machine Tools and Production Engineering, or WZL. Philip, welcome to the program. Thank you. And we have Russ Caldwell here as well, Senior Product Manager at Dell Technologies. Russ, great to see you. Thanks for the invite. Absolutely. We're going to be talking about how the enhanced video capabilities of Dell EMC's Streaming Data Platform are enabling manufacturing anomaly detection and quality control through the use of sensors, cameras, and X-ray cameras. We're going to go ahead, Philip, and start with you. We're abbreviating the lab, as you do, as WZL. Talk to us about the lab. What types of problems are you solving? Yeah, thank you. At the Laboratory for Machine Tools, we look at essentially all the problems that arise in production engineering. That ranges from the actual manufacturing of workpieces used in the aerospace or automotive industries, really digging into the specifics of how those metal parts are manufactured and formed and what the mechanics of that are. That's the very traditional area we come from. We also look at how to manage all those production systems and how to come up with decision-making processes that move those engineering environments forward. In our department, starting about 10 years ago, the Industry 4.0 scenario has been pushed more and more into research as well. More and more data is gathered, we have to deal with a lot of data coming from various sources, and we have to work out how to include it in our research and how to derive new findings, maybe even physical equations, from all the data we gather around manufacturing technologies. That's what we are looking at from the research perspective.
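Since the conversation centers on manufacturing anomaly detection, here is a minimal, hypothetical illustration (not WZL's actual method) of one common approach: flagging sensor readings whose z-score against a reference sample from normal operation exceeds a threshold, in the spirit of a control chart.

```python
import statistics

def find_anomalies(reference, readings, threshold=3.0):
    """Flag readings whose z-score against a 'known good' reference
    sample exceeds the threshold (a classic control-chart style check)."""
    mean = statistics.mean(reference)
    stdev = statistics.stdev(reference)
    return [x for x in readings if abs(x - mean) / stdev > threshold]

# Reference data from a healthy process, then a live stream with one outlier.
good_runs = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
live = [10.05, 9.95, 14.0, 10.1]
flagged = find_anomalies(good_runs, live)  # the 14.0 reading is flagged
```

In practice the reference statistics would come from historical process data and the check would run on each incoming window of sensor values.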
And talk to me about when you were founded. You're based in Germany, but when was the lab founded? The lab was founded about 100 years ago, so it has a very long history. It is the largest institute for production engineering in Germany, maybe even in Europe. Got it. OK, well, 100 years, amazing innovation that I'm sure the lab has seen. Russ, let's go over to you. Talk to us about the Dell EMC Streaming Data Platform, or SDP, as it's referred to. Yeah, thanks, Lisa. It's interesting that Philip brings up Industry 4.0, because this is a prime area where the Streaming Data Platform comes into play. Industry 4.0 for manufacturing really encompasses a few things: real-time data analysis, automation, machine learning. SDP pulls all that together. It's a software solution from Dell EMC, and one of the ways we make it all happen is that we've unified the concept of time in data. Historical data and real-time data are typically analyzed very, very differently, and for the Industry 4.0 manufacturing use cases we're trying to support, that's really important: looking at historical data and real-time data together, so you can learn from the past work you've done on the factory floor and apply that in real-time analytics. The platform is used to ingest, store, and analyze this real-time and historical data. It leverages high availability and dynamic scaling with Kubernetes, which makes it possible to run lots of different projects on the platform. And it offers a lot of methods to automate the high-speed, high-precision activities Philip is talking about here. There are a lot of examples where it comes into play. It's really exciting to work with Philip and the team there in Germany.
But what's great about it is that it's a general-purpose platform. It supports things like construction, where companies are using drones with video ingestion and tracking resources on the ground, predictive maintenance and safety for amusement parks, and many other use cases. But with Industry 4.0 and manufacturing, RWTH and Philip's team have really pushed the boundaries of what's possible in automating and analyzing data for the manufacturing process. What a great background. So we understand the lab, and we understand Dell EMC SDP. Philip, let's go back to you. How is the lab using this technology? Yeah, good question. Maybe I'll go back a little into the details of the use case we're presenting. We started maybe five or six years ago, when all this Industry 4.0 work was put into research and we wanted to get more data out of the process. We started to apply a lot of sensors to the machine, starting with the more traditional ones, like energy consumption and some control information that we get from the machine tool itself. Those sensor systems were not that complex; we could deal with the amount of data fairly easily using just USB sticks and some local devices to store it. But as it got more sophisticated, we got more sensor data. We applied new sensor systems to the tool and right where the actual process is taking place, where all the delicious information is hidden. So we're getting really close to the process, applying video data streams and more sensor data. And they're not sampled like in typical IoT scenarios, where you usually have a few data points per second; we're talking here about sensors that produce maybe a million data points per second. So there are very high frequencies that we have to deal with.
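As a rough illustration of the kind of preprocessing such high-frequency signals often need (a hypothetical sketch, not code from WZL or SDP), a million-sample-per-second sensor stream can be reduced to manageable summary features, for example one RMS value per window, before it is shipped off the machine.

```python
import math

def window_rms(samples, window_size):
    """Reduce a high-frequency signal to one RMS value per window.

    samples: iterable of raw sensor readings (e.g. force or vibration).
    window_size: number of raw samples per summary value, e.g. 10_000
    to turn a 1 MHz stream into a 100 Hz feature stream.
    """
    window = []
    for s in samples:
        window.append(s)
        if len(window) == window_size:
            yield math.sqrt(sum(x * x for x in window) / window_size)
            window = []

# Example: a constant-amplitude signal has an RMS equal to that amplitude.
signal = [2.0] * 30_000                      # 30k raw samples
features = list(window_rms(signal, 10_000))  # 3 summary values
```

The design choice here is the usual one at the edge: keep the raw stream local and forward only the much smaller feature stream over the network.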
And of course we then had to come up with a system for how to deal with this data. We set up a classic big data stack for ourselves in our research facility to handle this amount of streaming data and then apply historical analysis, like Russ just talked about, on a classic Hadoop stack, where we used Kafka and Storm for ingestion and stream processing and Spark for the traditional historical analysis. And this is exactly where the Streaming Data Platform came into play, because we had a meeting with one of the key accounts at the university, and while we were chatting about this problem, he said, oh, we have something going on in the USA with this streaming data platform. It was still under a code name or something. Then Russ and I got in contact and talked about the Streaming Data Platform and how we could actually use it, and we took part in the alpha program, really working with the system and with the developers. It was an amazing experience. Were you having scale problems with the original, traditional big data platform you talked about, with Hadoop, Apache Kafka, Spark? Was it scale issues, performance issues? Is that why you looked to Dell EMC? Yeah, there were several issues. What are the scaling options? We were not always using all of the sensors, just some of them, and we were also thinking about how to apply this to the different manufacturing technologies and different machines that we have in our laboratory, so that we can quickly add sensors and shut down sensors without having to take care of setting up new workers and so on, so that the load balancing is handled. But that's not the only thing. We also had a lot of issues administering this Hadoop stack. It's quite error-prone if you do it yourself.
We are still a university; even though we are a very big laboratory, we have limited resources. So we spent a lot of time dealing with the DevOps of the system. This is where the Streaming Data Platform helped us reduce the time we invested in those administration processes, so we could put more time into the analytics, which is what we are actually interested in. And specifically the point that Russ talked about, the unified concept of time: we can now apply one type of analysis to both historical and streaming data and no longer have two separate domains to deal with. We dealt with Kafka and Storm on one side and Spark on the other, and now we can just put it into one model, which reduces the time to maintain, handle, and implement the code. A time reduction that's critical for the overall laboratory and the productivity of the folks that are using it. Russ, I'd like to go back to you. First of all, how long has the Dell EMC SDP been around, and what are some of the key features that WZL is leveraging that you're also seeing benefit other industries? So the product officially launched in early 2020, in the first quarter of 2020, but what Philip was just talking about, his organization, was in the alpha and beta programs earlier than that, in 2019. That's where we had a cross-section of very different kinds of companies in all sorts of industries all over the world, in Japan and Germany and the US, and that's where we started to see this pattern of common challenges and how we could solve them. One of those things we mentioned, the unified concept of time, is really powerful, because with one line of code you can jump to any point on the timeline of your data, whether it's the real-time data coming off of the sensors right now or something from minutes, hours, or years ago.
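To make the "unified concept of time" concrete, here is a minimal, hypothetical sketch (a toy, not the actual SDP or Pravega API) of a stream abstraction in which historical replay and near-live reads go through the same call, so one piece of analysis code covers both cases.

```python
import bisect

class TimelineStream:
    """Toy append-only stream indexed by timestamp.

    A single read_from() call serves both 'historical' reads (start far
    in the past) and 'recent' reads (start near the tail) -- the caller
    never switches between a batch API and a streaming API.
    """
    def __init__(self):
        self._timestamps = []
        self._events = []

    def append(self, timestamp, event):
        self._timestamps.append(timestamp)
        self._events.append(event)

    def read_from(self, timestamp):
        """Yield every (timestamp, event) at or after `timestamp`, oldest first."""
        start = bisect.bisect_left(self._timestamps, timestamp)
        for i in range(start, len(self._events)):
            yield self._timestamps[i], self._events[i]

stream = TimelineStream()
for t, reading in [(10, 0.1), (20, 0.4), (30, 0.9)]:
    stream.append(t, reading)

history = list(stream.read_from(0))   # replay everything from the past
recent = list(stream.read_from(25))   # "jump" to a later point in time
```

The same `read_from` loop could feed a dashboard on recent data or a model-training job on years of history, which is the gist of what a unified timeline buys developers.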
And so it's really, really powerful for the developers. But we saw the common challenges Philip was just talking about everywhere. One of the great things about SDP is that it's a single piece of software that installs, manages, secures, upgrades, and supports all the components you just heard Philip talking about. All the pieces for ingestion, storage, and analytics are in there, and that makes it easier to focus on the problem. There were other common challenges that customers were seeing as well, things like this concept of derived streams. You can bring in raw streams of data and leave them in their raw form, because many times, for regulatory or audit reasons, you don't want to touch that data, but you can create parallel streams of that data, called derived streams, that are versions you've altered for some consumption or reporting purpose, without affecting the others. That's powerful when you have multiple teams analyzing different data. And then finally, the thing Philip mentioned that we saw everywhere: a unified way to interact with all sensors in the same way, because there are IoT sensors, telemetry, log files, video, X-ray, infrared, all sorts of things. Being able to simplify that, so that developers and data scientists can really build models to solve a business problem, was where we started to focus on how we wanted to bring the value of SDP to market. So you launched this, you said, in early 2020, right before the pandemic and all of the chaos that ensued. Don't recommend that, by the way. Don't recommend launching into a pandemic, but yes. I'm sure there were a lot of lessons learned, and silver linings, I'm sure. But obviously big challenges there. I'm curious, though: one of the things we've learned from the pandemic is that for so many industries, access to real-time data is no longer just a nice-to-have.
It is a critical differentiator for those that needed to pivot multiple times, to survive in the early days and then to thrive and keep pivoting. I'm curious, Russ, what other industries came to you saying, all right guys, we've got challenges here, help us figure this out? Give me a snapshot of some of the other industries that were leading edge last year. Sure, there were some surprising ones. I've mentioned them a little, but it's interesting you give me a chance to talk about them, because what was also striking was not only that the same problems I just mentioned happened in multiple industries, it was the prevalence of certain kinds of data. For example, in the construction example I gave, where the company was using drones to ingest streaming video as well as telemetry from all the equipment on the ground, drones are in all sorts of industries, so that's a pattern. But at an even lower level than drone data is video data, or any kind of media data. Philip talked about using that kind of data in manufacturing as well. We are seeing video data in every industry, combined with other sensor data, and that's what really surprised us in the beta program. Working with Philip, we actually altered our roadmap after we launched, because we realized we needed to escalate even more features for video analysis and take processing even closer to the edge, where the data is being generated. The other industries include construction, logistics, medicine, network traffic. Any continuous, unbounded stream of data falls into the category of being able to be analyzed, stored, and played back like a DVR with SDP. Played back like a DVR, I like that. Philip, back over to you. Talk to us about what's next. Obviously a tremendous amount of innovation in the first 100 years of WZL. Talk to me about some of the lab's plans for the future.
From a streaming data perspective, you've got a great foundational infrastructure there with Dell EMC. What's next? We are working together with a large industry consortium, and from them we get a clear signal. They really want to see all this big data work coming into Industry 4.0, and Russ already talked about it: they are satisfied with having all the data in their data centers, but they want to push the analytics to the edge. More and more of the analytics is moving to the edge, because the more data they gather, the more data has to be transferred over the network. So we have to come up with ways to deploy the models on the edge and maybe do some of the analytics on the edge. Something like federated learning, for example, where you may not even need to transfer the data to the data center: you can run learning approaches on the edge and combine different data sources without actually sharing the data. That's a specific concern for corporations that want to cooperate using each other's data sources but have privacy issues. This is something we are looking into, but we are also working with low-code or no-code environments, different frameworks that we use here in our laboratory. This is also something we see in industry: more and more people have to interact with these data management systems, so they need a lower entry point than writing a Python script. Maybe they just need a drag-and-drop environment where they can modify an ingestion step or a transformation of the data, so that it's not always the data engineers or computer science experts who have to deal with this kind of work; other people can do it as well.
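Federated learning, as Philip sketches it, keeps raw data on each edge node and shares only model parameters. Below is a deliberately simplified, hypothetical illustration of the federated-averaging idea (not WZL's or SDP's implementation): each site fits a tiny local model on its own data, and only the learned coefficient and a sample count leave the site.

```python
def local_fit(readings):
    """Each edge node fits a trivial one-parameter model (the mean)
    on its private data and shares only (parameter, sample_count)."""
    return sum(readings) / len(readings), len(readings)

def federated_average(site_updates):
    """The coordinator combines parameters weighted by sample count,
    never seeing any raw readings."""
    total = sum(n for _, n in site_updates)
    return sum(param * n for param, n in site_updates) / total

# Three plants keep their raw sensor data private...
plant_a = [1.0, 2.0, 3.0]
plant_b = [4.0, 4.0]
plant_c = [10.0]
updates = [local_fit(d) for d in (plant_a, plant_b, plant_c)]

# ...yet the aggregated parameter equals the one fit on pooled data.
global_param = federated_average(updates)
```

Real federated learning averages gradient updates of much larger models over many rounds, but the privacy property is the same: only parameters cross the network, never the underlying measurements.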
This is something we are looking into in the near future, but there are a lot of other things, and not enough time to talk about all of them. So it sounds like an idea to democratize that data, to allow more data citizens to leverage it, analyze it, and extract value from it, because we all know data is oil, it's gold, but only if you can actually get those analyses quickly and make decisions that really affect and drive the business. Russ, last question for you. Talk to us about what you see coming next in the industry. Obviously you launched this technology at a very interesting time, a lot of things have changed in the last year, you've learned a lot, and you've modified the technology based on the WZL implementation, but what are some of the things you see coming next? It's really interesting, because a colleague of mine at Dell constantly reminds me that people develop solutions with the technology they have at the time, right? It's an obvious statement, but it's powerful to realize that what our customers have been doing so far has been based on the batch tools and storage tools that were available at the time, which weren't necessarily the best match for the problems they were trying to solve, and the world is moving completely to a real-time view of its data. If you can get that answer sooner, there's higher value: higher revenue, lower costs, safety, all sorts of reasons, right? To do that, everyone's realizing, like Philip said, that you can't count on moving all the data somewhere else to make that decision. The latency, or sometimes the rules around controlling what data can go where, will keep you from that. So being able to move code closer to the data is where we see things really happening. This is why the Streaming Data Platform has focused heavily on edge implementations. We have SDP Core for the core data center.
We also have SDP Edge, which runs on single-node and three-node configurations for headless environments, for all the use cases where you need to move the code and make decisions right where the data is generated, at the sensors. The other thing we see happening in the industry that's really important is that everything is moving to fully software-defined solutions, right? This idea of having software-defined stream ingestion, analytics, and storage, so that you can deploy the solution you want in the form factor you have available at your location, is important. Fully software-defined solutions are really where things are headed, which gives you a cloud-like experience that you can deploy anywhere: at the edge, the core, or the cloud. And that's really, really powerful. Philip picked up on another one we see a lot: this idea of low code and no code, whether it's things like Node-RED in the IT/OT world, where you stitch together a sequence of functions to answer questions in real time, or other more sophisticated tools. That ability to, like you said, democratize what people can do with data in real time is going to be extremely valuable as things move forward. And then the biggest thing we see, that we're really focused on, is that we need to make it as easy as possible to ingest any kind of data; the more data types you can bring in, the more problems you can solve. So bringing on as many on-ramps and as much connectivity into other solutions as possible is really, really important. For all that, the SDP team is really focused on prioritizing customers like Philip's team in the RWTH WZL lab, and on finding those common patterns everywhere, so that we can make it the norm to analyze streaming data, not just historical batch data.
Right, that's outstanding. As you said, the world is moving to real-time analytics; real-time data ingestion is absolutely critical, and just think of the problems we don't even know about yet that we could solve. Guys, thank you for joining me today, talking about what WZL is doing with the Dell EMC Streaming Data Platform, all the innovation you've done so far, and what's coming in the future. We'll have to catch up in the next six months or so and see what great progress you've made. Thank you for your time. Thanks, Lisa. Thanks to you. For my guests, I'm Lisa Martin. You're watching a CUBE Conversation.