Experts advocate that a proactive approach to data resilience requires identifying threats at the earliest possible opportunity, before they can infiltrate a system and wreak havoc. IBM scientists and engineers have recently developed new techniques for scanning data as it arrives inside a flash array, measuring the entropy of the data, which, as some of you may remember from your college thermodynamics class, refers to the degree of randomness or disorder in a system. In this case, IBM has applied that concept to data to help identify specific kinds of encrypted data that could be evidence of malware. IBM Fellow Andy Walls joins us here at the IBM Storage Summit to explain how this kind of technology could help organizations protect themselves from current and future malware infections. Mr. Walls, welcome to theCUBE.

Well, thank you very much. It's good to be here.

Hey, you're certainly very welcome. I wonder if you could tell us more about this capability and explain first why it's needed.

Yeah, at IBM I work on FlashSystem, our block storage arrays. And I remember, about two years ago, people telling me, "Andy, you can't do much inside a block storage device. You don't know the context. You don't know anything about the data." And I'm a little unusual that way: when people tell me that, I don't lose confidence. It doesn't depress me; it actually excites me. And I started thinking, well, there are advantages to what a block storage device can do. But more directly to the point of your question, we've got a huge societal problem out there. We've got organized crime groups and nation-states that see the money they can make from ransomware, going after all kinds of companies and governments and wreaking havoc to make money. And we need to do whatever we can at every part of the stack. So even the block storage, where the data is stored, needs to do what it can to detect these intrusions as soon as possible.
So the need comes from what's happening in society. Every day there's a new ransomware attack. In fact, what we're being told is that, as an organization, you should no longer wonder if you will be attacked. It's when you will be attacked.

Interesting. So, okay, you have a highly capable adversary, and you've got this sort of black box you described in block storage. I wonder if you could elaborate: what challenges did IBM's scientists and engineers face in developing this real-time data corruption detection technology?

Really, I view it as the start of a journey. What we've done is exciting, and you and I will get into what's next and where we're going. So when data enters a FlashSystem, and this is going to sound amazing, our main job is to store the data, all right? However, as that data comes in, we can run some tests on it and calculate something called Shannon entropy. It was developed way back in 1948, and your description was perfect. Shannon entropy measures the disorder in the data: how random is it? And that's a perfect way of determining whether or not the data is being encrypted. So we've dedicated some of the processing power in the FlashSystem controllers to testing the data as it comes in: sampling it and calculating its entropy for each volume. We treat each volume separately, because each one represents an application, or part of an application. So we keep each volume separate and determine its entropy, but we go a step beyond that. We also look at how its compressibility is changing. So we're looking at the compressibility and the entropy, combining the two, and testing to see whether the data is changing. Say a customer has a volume with a compressibility of, let's say, 50%.
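To make the per-volume check concrete, here is a minimal Python sketch, not IBM's implementation. It computes Shannon entropy in bits per byte, H = -Σ p_i log2 p_i over the byte values in a sampled block, uses zlib's compressed size as a stand-in for compressibility, and keeps a moving-average baseline per volume. The class name `VolumeBaseline`, the smoothing factor, and both thresholds are hypothetical choices for illustration.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy in bits per byte: H = -sum(p_i * log2(p_i)) over byte values.

    Returns 0.0 for constant data and at most 8.0 for uniformly distributed bytes.
    """
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def compression_ratio(block: bytes) -> float:
    """Compressed size / original size using zlib; near or above 1.0 means incompressible."""
    if not block:
        return 1.0
    return len(zlib.compress(block)) / len(block)


class VolumeBaseline:
    """Hypothetical per-volume tracker of "normal" entropy and compressibility.

    The baseline is an exponential moving average; all thresholds are assumptions.
    """

    def __init__(self, alpha=0.05, entropy_jump=1.5, ratio_jump=0.3):
        self.alpha = alpha                # smoothing factor for the moving average
        self.entropy_jump = entropy_jump  # bits/byte above baseline treated as unusual
        self.ratio_jump = ratio_jump      # rise in compression ratio treated as unusual
        self.entropy = None
        self.ratio = None

    def observe(self, block: bytes) -> bool:
        """Check one sampled block; return True if it departs from this volume's norm."""
        e, r = shannon_entropy(block), compression_ratio(block)
        if self.entropy is None:          # first sample seeds the baseline
            self.entropy, self.ratio = e, r
            return False
        # Combine the two signals: flag only when entropy and compression ratio
        # both rise well above this volume's own baseline.
        anomalous = (e - self.entropy > self.entropy_jump
                     and r - self.ratio > self.ratio_jump)
        if not anomalous:                 # don't let suspicious data redefine "normal"
            self.entropy += self.alpha * (e - self.entropy)
            self.ratio += self.alpha * (r - self.ratio)
        return anomalous


if __name__ == "__main__":
    volume = VolumeBaseline()
    records = b"employee_id,name,dept\n" * 200   # ordered, highly compressible data
    for _ in range(10):
        volume.observe(records)
    print(volume.observe(os.urandom(4096)))      # random bytes, like ciphertext; prints True
```

Because the baseline is per volume, a volume that normally holds already-compressed media starts with high entropy and is not flagged merely for being random. A production design would likely also require the change to persist over many samples before raising an alert, since individual high-entropy blocks are common in ordinary workloads.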
And now, for some time, the entropy of the data coming in has been high enough to make us raise an alert that something's going on. Now, it's not necessarily a ransomware attack. It may be that somebody has turned on encryption in the application and the storage administrator doesn't know. But the storage administrator needs to know, because the volume's compressibility is changing and he might need to allocate more storage. So we're detecting anomalies as well as looking for ransomware attacks.

So thank you for that deeper dive. Correct me if I'm wrong, but as I recall from my thermodynamics class many, many years ago, entropy increases over time; the randomness increases. A, is that correct? And B, does that affect how you deal with this at all?

Yeah, so that's where entropy of data and entropy in the physics sense are a bit different. You're thinking of the second law of thermodynamics, which says that everything left on its own will go to its most disordered state, right?

Yep.

Well, data is a little different, because it's not being left on its own, if you will. You have databases and applications that are highly ordered, and typically the data coming in is compressible and ordered. Think of a database of all the employees in IBM. That's highly ordered, and every employee has certain information recorded about them. So the order of that data will stay pretty constant. If it starts to change, if somebody has gotten in and started to encrypt part of that data, then we're going to see more disorder, if you will. We'll see more randomness, and we will detect that and flag it. That's when there's a problem. So typically the entropy will stay pretty constant, the compressibility will stay pretty constant, and we look for changes from that norm.

Thank you. So think about cyber threats and how they evolve in this constantly escalating cat-and-mouse game.
And now, with generative AI becoming so pervasive, that's going to increase as well. So as these threats evolve, how does this real-time detection system keep up? In other words, is it flexible and adaptive enough to detect future, more sophisticated forms of malware, which could come from gen AI or quantum? Maybe that's going too far, but what are your thoughts on that?

That's a very good question, and one I spend quite a bit of time thinking about. That's why I started by saying this is really the start of a journey. Where we've started is an important beginning, because in addition to raising these alerts, we're collecting data. And as is often said these days, data is the crown jewel of any organization. Collecting the data lets us analyze it, look at trends, and see what else we can do. And we have an advantage at IBM: something called the FlashCore Module. The FlashCore Module is essentially our own computational storage device. We have an FPGA and processor cores available down there, and we're looking at putting additional functions into the FlashCore Module, where I don't just sample entropy: I can look at the entropy of every I/O, every operation. I can look at how the compression changes for every operation. But we can go beyond that. We can look at how the accesses are changing. Imagine somebody has gotten into your system: they're able to look at your data and change your data. That's not the norm. They're outside your normal access patterns, so they're doing reads that don't normally happen. So we're looking at how we can examine not just the entropy, not just the compressibility, but also how the data is accessed. Who is accessing it?
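One simple way to model "reads that don't normally happen" is sketched below in Python. This is an assumption-laden illustration, not how FlashSystem or the FlashCore Module actually tracks access: the `AccessProfile` class, the 1 MiB region granularity, and the fixed learning window are all hypothetical. It learns which hosts read which regions of a volume, then flags reads from unfamiliar hosts or to unfamiliar regions.

```python
from collections import defaultdict


class AccessProfile:
    """Hypothetical per-volume access profile.

    During a learning window it records which hosts read which regions of the
    volume; afterwards it flags reads that fall outside that learned pattern.
    """

    def __init__(self, region_size: int = 1 << 20):
        self.region_size = region_size   # bytes per tracked region (assumed: 1 MiB)
        self.normal = defaultdict(set)   # host -> region indexes seen while learning
        self.learning = True

    def record(self, host: str, offset: int) -> bool:
        """Record one read at a byte offset; return True if it is outside the profile."""
        region = offset // self.region_size
        if self.learning:
            self.normal[host].add(region)
            return False
        # .get() avoids creating an empty entry for a host never seen before
        return region not in self.normal.get(host, set())

    def finish_learning(self) -> None:
        """Freeze the profile; later reads are compared against it."""
        self.learning = False

    def unusual_fraction(self, reads) -> float:
        """Share of (host, offset) reads in a batch that fall outside the profile."""
        reads = list(reads)
        if not reads:
            return 0.0
        return sum(self.record(host, off) for host, off in reads) / len(reads)


if __name__ == "__main__":
    profile = AccessProfile()
    for off in range(0, 8 << 20, 4096):          # the application reads its first 8 MiB
        profile.record("app-server", off)
    profile.finish_learning()
    print(profile.record("app-server", 4096))    # prints False: a normal read
    print(profile.record("unknown-host", 4096))  # prints True: a host never seen before
```

A single out-of-profile read is weak evidence on its own; a batch-level measure such as `unusual_fraction` is the kind of feature that could be considered alongside entropy and compressibility rather than alerted on by itself.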
And most importantly, and this is what's exciting, we're going to put an inference engine inside FlashSystem, feed all of this data into it in real time, and start to see whether it matches what known ransomware looks like. So we're going to train machine learning models on what ransomware looks like. Then, in every system, we will look at all of these different signals and see whether they line up. Now, as you said, our adversaries aren't standing still. They're going to keep changing, but the beauty of this is that we can change these models too, and we can simulate where we think the threats are going. So that's the direction I'm headed: looking at how we can build AI into the systems and feed all of this information in.

That's exciting. We're hearing a common theme here at the IBM Storage Summit: storage just keeps getting more intelligent and more functional, and this is yet another example of inferencing. We've said for years now that we think the real action in AI is going to be inferencing at the edge, or in applications like this. Which brings me to my next question: if a customer says, hey, Andy, this sounds amazing, but increasingly I'm migrating to the cloud and cloud storage, can this technology help me with the data that resides in the cloud? Is it applicable there, or is it confined to the IBM FlashSystem array? Can you help me?

Yeah, yeah. We're going to work with the cloud providers, and IBM itself is a cloud provider; we're working with them to see what can be done. But one exciting thing about FlashSystem is that it's really not just a standalone system. It's really a hybrid cloud device. What that means is that you can have FlashSystem images in the cloud. Now, they aren't necessarily running on our FlashCore Modules, but they are what we call Spectrum Virtualize images, and they can run in the cloud. So you can have some of your data on-prem and some in the cloud.
You can move that data back and forth; we can take cold data and move it to the cloud. And imagine having these capabilities both on your on-prem system and in the cloud. As we move the data, we can make sure we've checked its entropy and the other things I've talked about. So it's going to work very well whether your data is in the cloud or on-premises. That's an important thing we put together, because, as you said, our clients are looking for ways to reduce the cost of storage, and being able to move to the cloud in certain cases is very important.

You know, Andy, there are a lot of things customers don't like about SaaS, but one thing they really do love is that they get more function. We wake up in the morning and there's more function there. So I wonder if you could comment: are there adoption barriers to this, or do customers just get it as part of the platform? Are there any specific adoption considerations customers need to work through?

Yeah, in developing this we had to look at that very carefully, because what you don't want to do is create a barrier by impacting performance. So we had to develop this in such a way that calculating the entropy does not hurt your performance. To do that, we use sampling. We do it very intelligently: we're still looking at every volume, but we're making sure we don't hurt your performance. Obviously, like any function, it requires a particular code load of the software. But we've made it so that there's really no barrier: whether you have our SAN Volume Controller, SVC, or our FlashSystem arrays, even down to the lowest entry models, this will run. Now, you do need our Storage Insights software. Storage Insights is where we send all of this data.
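The trade-off Walls describes, checking every volume while sampling to protect performance, can be sketched as a per-volume write sampler. This is a hypothetical Python illustration, not IBM's sampling scheme: `WriteSampler`, the default stride, and the load-based adjustment policy are all assumptions.

```python
from collections import defaultdict


class WriteSampler:
    """Hypothetical sampler that inspects every `stride`-th write on each volume.

    A per-volume counter keeps every active volume under observation at the same
    rate, while only about 1/stride of writes pay for an entropy calculation.
    """

    def __init__(self, stride: int = 16):
        if stride < 1:
            raise ValueError("stride must be >= 1")
        self.stride = stride
        self.counts = defaultdict(int)   # volume id -> writes seen so far

    def should_sample(self, volume_id: str) -> bool:
        """Return True if this write should be analyzed."""
        n = self.counts[volume_id]
        self.counts[volume_id] = n + 1
        return n % self.stride == 0

    def adjust_for_load(self, cpu_utilization: float,
                        min_stride: int = 4, max_stride: int = 256) -> None:
        """Sample less often as controller CPU gets busier (an assumed policy)."""
        load = min(max(cpu_utilization, 0.0), 1.0)
        # Quadratic curve: stays near min_stride at light load, backs off sharply near 100%.
        self.stride = round(min_stride + (max_stride - min_stride) * load ** 2)


if __name__ == "__main__":
    sampler = WriteSampler(stride=4)
    print([sampler.should_sample("vol-a") for _ in range(8)])
    # prints [True, False, False, False, True, False, False, False]
```

Counting per volume, rather than sampling globally, means each volume's writes are sampled at the same rate regardless of how busy its neighbors are; the quadratic load curve is just one arbitrary way to back off only when the controller is genuinely busy.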
And Storage Insights keeps increasing in function. It is going to have the capability to look for performance anomalies, and now we're adding this capability for workload anomalies and intrusion detection. And it's going to go further than that: Storage Insights has other new function coming out along the lines of AIOps. So you do need Storage Insights as well as FlashSystem.

Yeah. Well, it actually sounds like a no-brainer, and you continue to add value there. How unique is this capability in the marketplace? Is it unique to IBM? Have you seen anything else like it out there?

I've seen file systems compute entropy. The difficulty of doing it at the server, or in the file system itself, is that it can impact your performance quite a bit. So as far as I know, we're unique among block storage devices, block storage meaning systems like FlashSystem. And we're really proud of being the first to have this. Again, it's not just the entropy: it's combining all of these different aspects and using analytics to help us know whether there's an anomaly. And where we're going, I think it will be pretty challenging, because it takes quite a bit of processing to do these kinds of things. That's where having a computational storage device like the FlashCore Module is really advantageous, because I have a couple of CPU cores on the FlashCore Module that haven't been doing anything. I hate lazy processors. I can't stand processors that aren't doing anything. So in the future, they're going to be busy collecting this data, analyzing it, and sending it up to Storage Insights. So I think it's unique, and we're going to keep expanding on that differentiation.

Okay, so you've seen it in file systems, but not applied in block storage with the combination of factors you mentioned.
So my question is, will this lead to other innovations, whether in storage or in other parts of the portfolio where it might make sense to apply this in the future?

Yeah. Along those lines, what I think is important is that this not be viewed as a standalone thing. We need to combine the benefit of our analytics with the security in the rest of the stack. So the beauty here is, imagine: I'm looking at the data, but other parts of the stack are doing what they do best, looking for intruders, for those who have gotten into the network, for those who are starting to get into the file system. So let's combine all of this. That's where Storage Defender, which I'm sure is being talked about in other interviews, comes in: it can bring all of this together, looking at the output of my analytics, the output of file system analytics, and the output of security software, and combine them to give an even better view of what's happening across an organization's entire data center.

It's got to be pretty cool for someone like yourself, a technical mind building things in the R&D labs, to see that work hit products, not only as a feature of a product but with potential applications in other use cases, helping people against security attacks or in other ways. That's got to be gratifying.

I tell you, it's probably the most gratifying thing I've done in my entire career. I remember talking to an ISV a couple of years ago whose customers are hospitals. I asked them if they were having ransomware attacks, and they said yes, they were. And unfortunately, they had even had patients die as a result of these ransomware attacks. Well, I tell you, that put it on a whole new level for me. This is a societal thing that we've got to do. This isn't just to help IBM. We've got to do what we can to prevent that kind of disruption.
And so it is very gratifying to be part of this whole mission to protect our customers.

I'll bet. Andy Walls, thanks so much for the good work you're doing, and thanks for coming on theCUBE.

Thank you very much. I enjoyed it.

All right, keep it right there for more deep dives into new innovations from the IBM Storage Summit, live from theCUBE's Palo Alto studios and on demand at theCUBE.net. We'll be right back.