From around the globe, it's theCUBE, presenting the convergence of file and object, brought to you by Pure Storage. Okay, now we're going to get the customer perspective on object, and we'll talk about the convergence of file and object, but really focusing on the object piece. This is a content program that's being made possible by Pure Storage and co-created with theCUBE. Christopher "C.B." Bond is here. He's the lead architect for the Enterprise Data Warehouse and Principal Data Engineer at MicroFocus. C.B., welcome, good to see you. Thanks, Dave, good to be here. So, tell us more about your role at MicroFocus. It's a pan-MicroFocus role. Of course, we know the company is a multinational software firm. It acquired the software assets of HPE, of course, including Vertica. Tell us where you fit. Yeah, so MicroFocus is, as you say, a worldwide company that sells a lot of software products all over the place, to governments and so forth. And it often grows by acquiring other companies. So there's the problem of integrating new companies and their data. What's happened over the years is that they've ended up with a number of discrete data systems. So you've had this data spread all over the place, and they've never been able to get a complete introspection on the entire business because of that. So my role was to come in and design a central data repository, an Enterprise Data Warehouse, that all reporting could be generated against. And that's what we're doing. We selected Vertica as the EDW system and Pure Storage FlashBlade as the communal storage repository. Okay, so you obviously had experience with Vertica in your previous role, so it's not like you were starting from scratch, but paint a picture of what life was like before you embarked on this consolidated approach to your data warehouse. What was it? Just disparate data all over the place, a lot of M&A going on. Where did the data live?
Right, so again, the data was all over the place, including under people's desks on dedicated private SQL Server instances. A lot of data at MicroFocus runs on SQL Server, which has pros and cons: it's a great transactional database, but it's not really good for analytics, in my opinion. They had one Vertica instance doing some select reporting; it wasn't a very powerful system. And it was what they call Vertica Enterprise Mode, with dedicated nodes that had compute and storage in the same locus on each server, okay? Vertica Eon Mode is a whole new world, because it separates compute from storage. That was first implemented in AWS, so you could spin up different numbers of compute nodes all sharing the same communal storage. But there has been demand for that kind of capability in an on-prem situation, okay? Pure Storage was the first vendor to come along with an S3 emulation that was actually workable, and Vertica worked with Pure Storage to make that all happen. That's what we're using. Yeah, I know back when we used to do face-to-face, we would be at Pure Accelerate. Vertica was always there. I'd stop by the booth, see what they were doing. So, tight integration there. And you mentioned Eon Mode and the ability to scale storage and compute independently. I know Vertica was the first to do that, and I'm not sure anybody else does it both for cloud and on-prem. So how are you using Eon Mode? Are you both in AWS and on-prem? Are you exclusively cloud? Maybe you could describe that a little bit. Right, so there are a number of internal rules at MicroFocus under which AWS is not approved for their business processes, at least not all of them. They really wanted to be on-prem, and all the transactional systems are on-prem. So we wanted to have the analytics OLAP stuff close to the OLTP stuff, right?
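As an illustrative sketch of how an on-prem Eon Mode deployment gets pointed at a FlashBlade's S3 endpoint instead of AWS: the endpoint address, bucket name, and credentials below are hypothetical, not from the interview, and parameter details vary by Vertica version.

```ini
# auth_params.conf (sketch) -- passed to admintools when creating the Eon database
awsauth = edw_access_key:edw_secret_key   # S3-style credentials issued by the FlashBlade
awsendpoint = 10.0.1.50:80                # FlashBlade data VIP on the private subnet, not AWS
awsenablehttps = 0                        # plain HTTP is common inside a locked-down subnet
```

The database is then created with a communal storage location such as `s3://vertica-communal/edw`. Because the protocol is S3 either way, the same objects can later be copied to a real AWS bucket, which is the portability that matters for DR.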
So that's why they're co-located very close to each other. What's nice about this situation is that these are S3 objects; it's an S3 object store on the Pure FlashBlade. We could copy those objects over to AWS if we needed to, spin up a version of Vertica there, and keep going. It's like a tertiary DR strategy, because we're actually setting up a second FlashBlade-Vertica system, geo-located elsewhere, for backup. And we can get into it if you want to talk about how the latest version of the Pure software for the FlashBlade allows synchronization of those FlashBlades across network boundaries, which is really nice. If a giant sinkhole opens up under our colo facility and we lose that thing, then we just have to switch the DNS and we're back in business off the DR site. And if that one were to go, we could copy those objects over to AWS and be up and running there. So we're feeling pretty confident about being able to weather whatever comes along. Yeah, I'm actually very interested in that conversation. But before we go there, you mentioned you've got to have the OLAP close to the OLTP. Was that for latency reasons, data movement reasons, security, all of the above? Yeah, it's really all of the above, because we're operating on the same subnet. To gain access to that data, you'd have to be within that VPN environment. We didn't want it going out over the public internet. And for latency reasons too: we have a lot of data, and we're continually running ETL processes into Vertica from our production transactional databases. Right, so they've got to be proximate. So I'm interested in this: you're using the Pure FlashBlade as an object store. Most people think, oh, object: simple but slow. Not the case for you. Is that right? Not the case at all. Why is that? It's ripping. Well, you have to understand how Vertica stores data. It stores data in what they call storage containers.
And those are immutable on disk, whether it's on AWS or in Vertica Enterprise Mode. If you do an update or delete, Vertica actually has to go retrieve that storage container from disk, destroy it, and rebuild it, okay? You want to avoid updates and deletes with Vertica, because the way it gets its speed is by sorting, ordering, and encoding the data on disk so it can read it really fast. But if you do an operation where you're deleting or updating a record in the middle of that, you've got to rebuild the entire container. So that actually matches up really well with S3 object storage, because it's kind of the same way: objects get destroyed and rebuilt too, okay? So it matches up very well with Vertica, and we were able to design the system so that it's append-only. Now, we had some reports running in SQL Server that were taking seven days. So we moved those to Vertica and rewrote the queries, which had been written in T-SQL with a bunch of loops and so forth, and, this is amazing, it went from seven days to two seconds to generate the report. That has tremendous value to the company, because it used to take this long seven-day cycle to get a new introspection into what they call their knowledge base, and now all of a sudden it's almost on demand: two seconds to generate it. That's great. And that's because of the way the data is stored. And the S3, you asked about, oh, is it slow? Well, not in this context, because what happens with Vertica Eon Mode is that when you set up your compute nodes, they have local storage also, which is called the depot. It's kind of a cache, okay? So the data will be drawn from the FlashBlade and cached locally. When they designed that, the thought was, oh, that'll cut down on the latency, okay?
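Vertica exposes that cache-versus-bypass choice as configuration. The calls below are a hedged sketch based on Vertica's documented depot controls; the table and subcluster names are invented, and exact signatures vary by version.

```sql
-- Pin a small, hot dimension table so it stays resident in each node's depot
SELECT SET_DEPOT_PIN_POLICY_TABLE('edw.dim_product', 'default_subcluster');

-- For wide ad hoc scans, skip the depot and stream straight from communal storage
ALTER SESSION SET DepotOperationsForQuery = 'NONE';
```

Whether pinning or bypassing wins depends, as C.B. says, on whether the working set fits in the depot and how many hops sit between the compute nodes and the FlashBlade.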
But it turns out that if you have your compute nodes close, meaning minimal hops to the FlashBlade, you can actually tell Vertica, don't even bother caching that stuff, just read it directly on the fly from the FlashBlade, and the performance is still really good. It depends on your situation, but I know, for example, a major telecom company that uses the same topology we're talking about here. They did the same thing; they just dropped the cache, because the FlashBlade was able to deliver the data fast enough. So you're talking about speed-of-light issues, and the overhead of switching infrastructure gets eliminated? And as a result you can go directly to the storage array? That's correct, yeah. It's fast enough that it's almost as if it's local to the compute node. But every situation is different depending on your needs. If you've got a few tables that are heavily used, then yeah, put them in the cache, because that'll probably be a little bit faster. But if you have a lot of ad hoc queries going on, you may exceed the storage of the local cache, and then you're better off having it read directly from the FlashBlade. Got it. Okay. It's an append-only approach, so you're not overwriting a record. But then what? Does it automatically re-index? Is that the intelligence of the system? How does that work? Well, this is where we did a little bit of magic. It's not really magic, but I'll tell you what it is. Vertica does not have indexes; they don't exist. Instead, as I said earlier, it gets its speed by sorting, encoding, and ordering the data on disk, right? So when you've got an append-only situation, the natural question is: if I have a unique record with, let's say, ID 123, what happens if I append a new version of it? Well, the way Vertica operates is that there's a thing called a projection, which is like a materialized columnar data store.
And you can have what they call a top-K projection, which says: only put in this projection the records that meet a certain condition. So there's a field we like to call the discriminator field, which is usually the latest-update timestamp. Let's say we have record 123 with yesterday's date, and that's the latest version. Now a new version comes in. At load time, Vertica looks at that record, then looks in the projection and asks: does this exist already? If it doesn't, it adds it. If it does, the new version now goes into that projection, okay? So what you end up with is a projection that is the latest snapshot of the data, which is like, oh, that's the reality of what the table is today, okay? But inherent in that, you now have a table that holds the entire change history of those records, which is awesome, because you often want to go back and revisit what happened. But that materialized view is the most current, and the system knows that. Right, so we then create views that draw from that projection, so our users don't have to worry about any of that. They just say, select from this view, and they're getting the latest, greatest snapshot of what the data is right now. But if they want to go back and ask, how did this data look two days ago? That's an easy query for them to do also. So they get the best of both worlds. So could you just plug any flash array into your system and achieve the same results? Or is there anything really unique about Pure? Yeah, well, they're the only ones, I think, that have really dialed in the S3 object format, because I don't think AWS actually publishes every last detail of that S3 spec, okay? So there was a certain amount of reverse engineering they had to do, I think, but they got it right.
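The append-only table, discriminator field, top-K projection, and latest-snapshot view C.B. walks through can be sketched in Vertica SQL roughly like this; the table and column names are invented for illustration.

```sql
-- Append-only base table: every version of every record is kept, never updated in place
CREATE TABLE edw.customer_log (
    customer_id  INT,
    name         VARCHAR(100),
    status       VARCHAR(20),
    updated_at   TIMESTAMP        -- the "discriminator" field: latest-update timestamp
);

-- Top-K projection: physically maintain only the newest version of each customer_id
CREATE PROJECTION edw.customer_latest AS
SELECT customer_id, name, status, updated_at
FROM edw.customer_log
LIMIT 1 OVER (PARTITION BY customer_id ORDER BY updated_at DESC);

-- View for end users: the optimizer can satisfy this from the top-K projection,
-- so selecting from edw.customer always returns the current snapshot
CREATE VIEW edw.customer AS
SELECT customer_id, name, status, updated_at
FROM edw.customer_log
LIMIT 1 OVER (PARTITION BY customer_id ORDER BY updated_at DESC);

-- History is still there: how did customer 123 look two days ago?
SELECT *
FROM edw.customer_log
WHERE customer_id = 123
  AND updated_at <= CURRENT_TIMESTAMP - INTERVAL '2 days'
ORDER BY updated_at DESC
LIMIT 1;
```

This is the "best of both worlds" in the interview: one append-only log for change history, and a projection-backed view for the current state.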
A year and a half ago or so, they were at maybe 99%, but they've since worked with the Vertica people to make sure that object format is true to what it should be. So Vertica doesn't care whether it's on AWS or on Pure FlashBlade, because Pure did a really good job of dialing in that format. It just speaks S3; it doesn't care where it's going. It just works. So essentially vendor R&D abstracted that complexity, so you didn't have to rewrite the application. Is that right? Right. When Vertica ships its software, you don't get a specific version for Pure or for AWS; it's all one package. And when you configure it, it knows: okay, I'm just pointed at this port on the Pure Storage FlashBlade, and it just works. So, what does your data team look like? How is it evolving? A lot of customers I talk to complain that they struggle to get value out of their data and don't have the expertise. What does your team look like? How is it changing? Did the pandemic change things at all? I wonder if you could bring us up to date on that. In some ways, MicroFocus has an advantage in that it's so widely dispersed across the world. It's headquartered in the UK, but I deal with people everywhere. I'm in the Bay Area; we have people in Mexico, Romania, India, all over the place. So when this started, it was actually a bigger project. It got scaled back; it was almost at the point where it was going to be cut, okay? But then we said, well, let's try to do almost a skunkworks type of thing with reduced staff. You can count the number of key people on one hand. But we got it all together, and it's been a dramatic transformation for the company. Now it's won approval and admiration from the highest echelons of the company: hey, this is really providing value.
And the company is starting to get views into its business that it didn't have before. That's awesome. I mean, I've watched MicroFocus for years. To me, part of their DNA is private equity. They're sharp investors; they're great at M&A. They know how to drive value, and they're doing modern M&A. We've seen what they did with SUSE, and obviously driving value out of Vertica. They've got some really sharp financial people there. So they must have loved the skunkworks: fast ROI, small denominator, big numerator. Well, I think that in this case smaller is better when you're doing development; it's a too-many-cooks type of thing. And it helps if you've got people who know what they're doing. I've got a lot of experience with Vertica; I've been on the Vertica advisory board for a long time. We were like the second or third company to do a Pure FlashBlade Vertica installation, and some of the best companies out there that had already done it were members of the advisory board, so I was able to learn from the best, and we were able to get this thing up and running quickly. We've also got a handful of other key people who know how to write SQL and so forth to get this up and running quickly. Yeah, so look, Pure is a fit. I mean, I sound like a fanboy, but Pure is all about simplicity, and so is object. That means you don't have to worry about wrangling storage, worrying about LUNs, and all that other nonsense in file and block. I've been burned by hardware in the past, where, oh, okay, they're building to a price, so they cheap out on stuff like fans, and these components fail and the whole thing goes down. But this hardware is super good quality, and I'm happy with the quality we're getting. So C.B., last question: what's next for you? Where do you want to take this initiative?
Well, we're in the process now, so, I designed this system to combine the best of the Kimball approach to data warehousing and the Inmon approach, okay? What we do is bring over all the data we've got and put it into a pristine staging layer, okay? Like I said, because it's append-only, it's essentially a log of all the transactions happening in this company, as they appear, okay? And then, from the Kimball side of things, we're designing the data marts now, and that's what the end users actually interact with. We're examining the transactional systems to ask: how are these business objects created? What's the logic there? And we're recreating those logical models in Vertica. We've done a handful of them so far, and it's working out really well. So going forward, we've got a lot of work to do to create just about every object the company needs. C.B., you're an awesome guest. It's always a pleasure talking to you. Congratulations, and good luck going forward. Stay safe. Thank you, you too, Dave. All right, thank you. And thank you for watching the convergence of file and object. This is Dave Vellante for theCUBE.