Hello everyone, I'm John Furrier with theCUBE. We're here in Palo Alto, California, for a remote interview and session for theCUBE Presents AWS Startup Showcase: The Next Big Thing in AI, Security and Life Sciences. We're here with a great segment on cloud, the next big thing in cloud, with Chaos Search: Thomas Hazel, Chief Technology and Science Officer of Chaos Search, joined by Jeremy Foran, head of data analytics, the bad boy of data analytics as they say, at BAI Communications. Jeremy, Thomas, great to have you on. Great to be here. Pleasure to be here. So we're going to be talking about applying large-scale log analytics to building the future of the transit industry. Also, telco's a big part of that. Smart cities, you name the use case, self-driving trucks, cars, you name it, and everything's now edge. The edge is super valuable. It's a new kind of last mile, if you will. It's moving fast, it's mobile. This is a huge deal. Let's get into it, Thomas. What's the big story around this session? Well, we provide a unique ability to take all that edge data and drive it into a data lake offering, where we provide data analytics for both logs and BI, and we're coming out with ML this year into next. So our unique play is transforming customers' cloud object storage into an analytical platform. And really, I think with BAI it's log analytics specifically, where there are a lot of data streams from all those devices going into a lake, and we transform that lake into analytics for driving, I guess, operational analysis. You know, Jeremy, I remember back in the day, I'm old enough to remember when the edge was just a remote switch or a campus hub or something. And even on the telco side, there was no Wi-Fi back in 2000. If you were driving in a car and got any signal, you were lucky. Now you've got no perimeter; you have unlimited connectivity everywhere. This has opened up more of an omnichannel data problem. How do you see that world?
Because you've still got more devices pushing out at this edge, and it's getting super local, right? Even on the body, even on people in the car. So there's certainly a lot of change on the infrastructure side. What kind of data challenge does that pose? Yeah, well, I would say that users always want more: more bandwidth, more performance. And that requires us to create more systems, with more complexity, to deliver the user experience that we're very proud of. And with that complexity comes exponentially more data. On one of the Wi-Fi networks we offer, T-Connect in the Toronto subway system, we see 100,000 to 200,000 unique users a day. You can imagine the amount of infrastructure needed to support that, so that everyone has a seamless experience and can get their news and email and even stream media while they're waiting for the subway. So you guys provide state-of-the-art infrastructure for cellular, Wi-Fi, broadcast radio and IP networks, basically what I'd call the smart city go-to. That's basically anything involving that edge piece. This is a huge thing. So smart cities are on the table, and you're seeing 5G being called more of an enterprise app, whether it's feeding large, dense areas of people. This is now a new, modern version of what I would call the smart city blueprint. What's changed in your mind on this whole modernization of the smart city infrastructure concept? What's new? What's cutting edge? Yeah, I would say that there's an explosion of data, and a lot of our insights aren't coming from one system anymore. They're coming from collecting data from all of the different pieces of infrastructure, whether that's your fiber infrastructure or your wireless infrastructure. And then, to solve problems, you need to correlate data across those systems. So we're seeing more and more technologies that allow you to do that correlation.
And that's really where we're finding tons of value, right? Thomas, take us through what you guys do as a product: the value proposition, the secret sauce. And why here with Jeremy? Why is this conversation important for the folks watching? What's the connection between Chaos Search and BAI Communications? Well, it's data, right? And lots of it. Our unique platform allows people like Jeremy to stream all this data. In today's world, terabytes go to petabytes really easily, and billions go to trillions really easily. So providing analysis of that data for their operations is challenging, particularly with technology and architectures that have been around for a long time. What we do here at Chaos Search is give BAI the ability to stream all these devices and all these services into one centralized data lake on their cloud object storage. We connect to that cloud object storage and transform it into an analytical database to do, in this case, log analytics, seamlessly and easily: a new workload, a new stream, just streams into that lake, and we as a service take over. We discover it, we index it, and we publish well-known open APIs and visualizations so that they can focus on their business and not on all the operational data pipeline, database and data engineering work that, at these kinds of scales, is frankly a nightmare. You know, one of the things we've always observed on theCUBE is that when really cool, groundbreaking products come out, like what you guys are doing, it's always a challenge to manage the cost and complexity of bringing in the new. So Jeremy, take us through the tech stack here, because sometimes it can be unwieldy just from a tech stack perspective, never mind the business logic or the business processes that have to be either unwound or changed. Can you take us through the IT stack that's critical to support your area?
Absolutely. So with all the various equipment we use to provide our public Wi-Fi, our DAS, and our carrier-agnostic LTE and 5G networks, we need to adhere to PCI compliance and ISO 27000. That requires us to keep a tremendous amount of our data, and the challenge we were facing was how to do that cost-effectively without making any compromises. A lot of times you'll find you don't know the value of your data today until tomorrow. An example would be COVID: when we were storing data two years ago, we weren't planning for a pandemic, but because we retained that data and can look back, we see a tremendous amount of value in it for forecasting how our systems will recover when things get back to normal. And so when I met Thomas and we were talking about how we were going to solve some of these data retention problems, he started explaining their compression to me and some of its performance metrics. And I said, oh, middle-out compression. It's been a bit of a running joke between me and him, and I'm sure others, but the amount of data we're able to store at that kind of cost is incredibly impressive. Right. What problem did he solve for you? Because these startups obviously have a lot of cloud enabling more value now; we're seeing this. But when you look at this, what was the core problem you had? Yeah, so primarily this is for our syslog server. Syslog servers today aren't what they were 10 or 15 years ago, when you just had a machine and, if something broke, you went and looked. Right now they're very complex. That data feeds various systems and third-party software. So we're actively looking for changes in patterns, and we have our security teams auditing these for penetration testing and such.
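To make the syslog-to-S3 retention step concrete, here is a minimal sketch, not BAI's or Chaos Search's actual pipeline, of bundling a batch of syslog lines into one gzip-compressed object with a date-partitioned key. The function name `build_syslog_object`, the `source=/dt=/hour=` key layout, the source name and the sample log lines are all hypothetical; the layout is only one common way to keep a years-long archive listable by prefix.

```python
import gzip
from datetime import datetime, timezone


def build_syslog_object(lines: list[str], source: str, now: datetime) -> tuple[str, bytes]:
    """Bundle a batch of syslog lines into one gzip-compressed object.

    The key layout (source=/dt=/hour=) is a hypothetical partitioning scheme,
    chosen so a date range can later be listed by key prefix.
    """
    if now.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp so keys are unambiguous")
    now = now.astimezone(timezone.utc)
    # One object per batch: newline-delimited text, gzip-compressed.
    body = gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))
    key = (
        f"syslog/source={source}/dt={now:%Y-%m-%d}/hour={now:%H}/"
        f"{now:%Y%m%dT%H%M%SZ}.log.gz"
    )
    return key, body


if __name__ == "__main__":
    # Made-up sample lines from a hypothetical access point "ap-17".
    batch = [
        "<134>Jun  1 09:30:00 ap-17 example: client associated",
        "<134>Jun  1 09:30:02 ap-17 example: client disassociated",
    ]
    key, body = build_syslog_object(
        batch, "wifi-ap-17", datetime(2021, 6, 1, 9, 30, tzinfo=timezone.utc)
    )
    print(key, len(body))
    # Uploading would then be one boto3 call (bucket name is hypothetical):
    # boto3.client("s3").put_object(Bucket="example-log-bucket", Key=key, Body=body)
```

Batching many lines per object, rather than writing one object per line, keeps request counts down; in practice an off-the-shelf data shipper would usually handle this step.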
And then there's getting that data to S3, so that we have it for two or three years of storage. The problem we were facing was that, with all of these different systems we needed to feed and retain data for, we couldn't do that on site. We wanted to use S3, but when we ran some projections, we didn't really have the budget for all of those places. Meeting Thomas and working with Chaos Search, using their compression brought those costs down drastically. And as we've been working with them, the really exciting thing is that they keep bringing more and more features to that service offering. So first it was just storing that data away, and now we're starting to build solutions off of the data sitting in storage. That's where it gets really exciting, because it's nothing to start getting anomaly detection off those logs, when originally it was just, we need to store them in case somebody needs them two or three years from now. So Thomas, let me get this right. What I'm hearing is, putting aside the complexity and the governance side of the regulations for a minute, data retention in general is a key value proposition: having data available when you need it, and doing that in a very cost-effective, simple way. Sounds like that's what you guys are offering. Is that right? Yeah, one key aspect of our solution is retention, right? That's where a lot of the challenges are, but at the same time we provide real-time notification like a classic log analytics platform: alerting, monitoring. The key thing is bringing both those worlds together and solving that problem. And so this middle in, middle out... well, to be frank, we created a new technology we call Chaos Index. It's a database index that is wonderfully small, as we're indicating, but it also provides all the features that make cloud object storage high performance.
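The budget projection Jeremy describes comes down to simple arithmetic: daily raw volume times the retention window, shrunk by the compression rate. The sketch below assumes a hypothetical 100 GB/day of raw logs kept for three years, and reads the "81% compression" figure Jeremy cites later in the interview as an 81% size reduction; that reading is an interpretation, not a measured result. Multiply the output by your own storage price to get a cost figure.

```python
def retained_size_gb(daily_raw_gb: float, retention_days: int, reduction: float) -> float:
    """Steady-state stored volume for a rolling retention window.

    `reduction` is the fraction of bytes removed by compression, so 0.81
    treats "81% compression" as an 81% size reduction (an assumption).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return daily_raw_gb * retention_days * (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical workload: 100 GB of raw logs per day, kept for three years.
    days = 3 * 365
    raw_total = 100 * days
    stored = retained_size_gb(100, days, 0.81)
    print(f"raw: {raw_total:,} GB, stored: {stored:,.0f} GB, ratio: {raw_total / stored:.1f}x")
```

Under that reading, the stored footprint is 19% of the raw volume, roughly a 5.3x reduction, which is why retention windows measured in years become affordable.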
And so the idea is that you use this lake offering to store all your data in a cost-effective way, and our service allows you to analyze it from both a long-retention perspective and a real-time perspective. Bringing those two worlds together is so key, because typically you have siloed solutions, whether for real-time at scale or retention at scale, and the cost, complexity and time to build out those solutions is something I know Jeremy knows well too. A lot of folks come to us to solve those problems because, when you're dealing with terabytes and up, these things get complicated and, to be frank, fall over quite often. Yeah, let me ask you the question that's probably on the mind of everyone watching, and you've both probably heard it many times, because a lot of people throw the data lake solution around; they whitewash their old legacy solutions with, oh, data lake, store it in a data lake. It's been called the data swamp. So people are fearful. Okay, I love this idea of a data lake. Who doesn't like throwing data into a repository and having it available at will, with notifications, all these secret magic beans that magically create value? But I doubt that. I don't want it to turn into a data swamp. So Thomas and Jeremy, talk about that concern. How do you mitigate it? How do you speak to it? Because if done properly, there's huge value in having a control plane or some sort of data system that's tied in with signals, not just storage and retention. So I see the value. How do you address people who say, hey, I don't want a data swamp? Yeah, I'll jump into that. Let's be frank. Hadoop was a great tool for a very narrow scenario. I think the data swamp came about because people were using the tooling in an incorrect way. I've always believed that data lakes are the future.
You just have to have the right service and the right philosophy to leverage it. So what we do here at Chaos Search is allow you to organize your data, discover it and automatically index it, so that the swamp doesn't get swampy. When you stream data into your lake, how do you organize it so that it stays well structured? How do you transform that data into value? With our service, we actually start where the storage begins, not at an endpoint, not at an archive. So, to be clear, we have tooling and services that keep your lake from becoming swampy. But the key value is the benefits of the lake: the cost-effectiveness, the reliability, the security, the scale. The problem was that no one really made cloud storage a first-class citizen, and we've done that. We've addressed the swamp problem while providing all the value of analysis, and on cost and scale, no one can touch cloud storage. You just can't. What we've done is crack the code of how you make it analytical. Jeremy, I want to get your thoughts on this too. On your side, as a practitioner and a customer of these solutions, the concern is, am I missing anything? I've been a big proponent of data retention for many, many years. Dave Vellante and everyone at theCUBE know that I bang on the table all the time: store your data, be a data hoarder, because it's going to come back and be valuable. Costs are going down, so I'm a big fan of data retention. But the fear might be, what am I missing? Because machine learning starts to come in down the road, you've got AI, and the more data you have that's accessible in real time, the more effective machine learning is. Do you worry about missing anything, or do you just store everything? We store everything. It's interesting where the value and insights in your data come from; something that might seem trivial today can offer tremendous, tremendous value down the road.
For example, because we have Wi-Fi in the subway infrastructure, we can take that Wi-Fi data and start to understand the flow of people in and out of the subway network. We can then provide insights to the rail operators that help get riders from A to B quicker. When we built the Wi-Fi, it wasn't with the intention of getting Torontonians across the city faster, but that was one of the values we were able to get from the data. In terms of Thomas's solution, I think one of the reasons we engaged him in the first place is that I didn't believe his compression. It sounded a little too good to be true. So when it was time to try them out, all we had to do was ship data to an S3 bucket. There are tons of solutions for that, and data shippers work right out of the box. It took a few minutes. Then we started exploring the data in Kibana, or their dashboard, which is an easy-to-use interface. So within two days we were getting the value we were looking for out of that data, which is phenomenal. We've been very happy. Thomas, sounds like you've got a great testimonial here. And it's not an easy problem he's living with. As I was mentioning earlier, and we're going to get into it now, there are regulations and certain compliance issues. First of all, everyone has this problem now. It's not just within that space; it's also the technical complexity of packets moving around. I'm on the Wi-Fi at this stop, then I'm jumping over here. There's a ton of data. It's all over the place. It's totally unstructured. So it's a tough, tough test for you guys at Chaos Search. Yeah, it's almost like the Mount Everest of customer testimonials. You got it. It's a big use case here.
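Jeremy doesn't describe how BAI derives passenger flow from Wi-Fi data. As a hedged illustration of the general idea only, the sketch below counts station-to-station movements from hypothetical `(device_id, station, timestamp)` sightings: each time a device is next seen at a different station, it counts as one movement. The record shape, the station names and the function name `station_flows` are assumptions; a real deployment would also need anonymized device IDs and handling of gaps and noise.

```python
from collections import Counter, defaultdict
from datetime import datetime


def station_flows(sightings):
    """Count station-to-station movements from (device_id, station, timestamp) sightings.

    Consecutive sightings of the same device at different stations count as one
    movement from the earlier station to the later one; repeat sightings at the
    same station are ignored.
    """
    by_device = defaultdict(list)
    for device_id, station, ts in sightings:
        by_device[device_id].append((ts, station))
    flows = Counter()
    for visits in by_device.values():
        visits.sort()  # chronological order for this device
        for (_, origin), (_, dest) in zip(visits, visits[1:]):
            if origin != dest:
                flows[(origin, dest)] += 1
    return flows


if __name__ == "__main__":
    # Made-up sightings for two anonymized devices on a single morning.
    sample = [
        ("d1", "Union", datetime(2021, 6, 1, 8, 0)),
        ("d1", "King", datetime(2021, 6, 1, 8, 5)),
        ("d2", "Union", datetime(2021, 6, 1, 8, 1)),
        ("d2", "King", datetime(2021, 6, 1, 8, 7)),
    ]
    print(station_flows(sample).most_common())
```

Aggregating to origin-destination counts like this, rather than keeping per-device trails, is also what an operator-facing report would typically expose.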
How does this translate to other clients? And talk about governance and security controls, because I know it's highly regulated. On his side of the world, in telco, the providers that have these edge devices face actual penalties. So it's not just commercial risk management; here there are real penalties. Absolutely. Centralizing your data has a real benefit for staying out of trouble, right? You store it in one place. That's a good thing. But what we've done, and this was a key aspect of our offering, is that we at Chaos Search don't own the customer's data. We don't own BAI's data. They own the data. They give us access rights in a very standard way for cloud object storage, through role policies from Amazon. We only have access rights to their data. So not owning a customer's data is a big selling point, not only for them but for us, from a compliance and regulatory perspective. Unlike a lot of solutions where you move the data into them and they become responsible for it, BAI actually owns everything. They provide access so that we can provide analysis, and they can turn it off at any point in time. We're also SOC 2 Type I and Type II compliant, and you've got to do it. When we were young, we ran at this because of all the compliance scenarios we'd be in. But the long and short of it is that we're a transient service. Cloud storage is the source of truth, where all the data resides. And think about it: it's architecturally smart. It's cost-effective. It's secure. It's reliable and durable. But from a security perspective, having customers own their own data is a big differentiation in the market, a big differentiation. Jeremy, talk about the security controls on your end surrounding log management environments that span countries with different regulations.
Now you've got all kinds of policy dimensions, technical dimensions and topology dimensions. Yeah, absolutely. How we approach it is we look at where we have offerings across the globe, figure out the highest-watermark level of adherence we need to hit, and then standardize on that. Shipping to S3 allows us to enforce that governance really easily. And to Thomas's point, we manage the data, which is very important to us. We don't have to worry about a third party, or about what happens if we want to change providers years down the road, although I don't think anyone's coming out with 81% compression anytime soon. So for us, it's about meeting those high standards and having the technologies that enable us to do it, and Chaos Search is a very big part of that right now. Well, let me ask you a question for the folks watching who are really interested in this topic. What would you say to them when evaluating Chaos Search? Obviously your use case is complex, but so are others. As enterprises start to have an edge, the security posture shifts; everything shifts. There's no more perimeter, and the data problem becomes acute for them. So enterprises are going to start seeing what you've been living with in your world. What's your advice to people watching? My advice would be to give them a try. It has been really quite impressive. The customer service has been hands-on, and they've been under-promising and over-delivering. When you have the kind of requirements we do to manage solutions in these very complex environments, cloud, local, various data centers and such, that kind of customer service is very important, right? It enables us to continue to deliver those high-quality solutions. So Thomas, give us the overview of the secret sauce. You've got a great testimonial here. You've got people watching. What's different now in the world you're going after? What wave are you on?
Talk to the people who are watching this and saying, okay, why Chaos Search? Why are you relevant? Obviously there are some cool things you're doing. I love that. What's cool, what's relevant, and what's in it for them if they work with you? Yeah, so I actually got that whole Silicon Valley reference from my patent attorney when we were talking. But no, here's what we focused on: if we could crack the code of making data wonderfully small (store small, move small, process small) and then make it multi-model, with virtual transformations, we could transform cloud object storage into a high-performance analytical database, and all these heavy, heavy problems, all that complexity, all the scaffolding you build to operate at these kinds of scales, would be solved. Now, what we had to focus on, and this has been my, I guess you'd say, life's passion, is a new data representation. That's our secret sauce, and it enables a new architecture, a new service where the customer focuses on their tooling, their APIs and the visualizations they know and love. What we focus on is taking that data lake and transforming it into an analytical database, both for log analytics, think of it as an Elasticsearch replacement, and as a BI replacement for your SQL warehousing database, and, coming out later this year into 2022, ML support on one representation. You don't have to silo your information, and you don't have to re-index your data: Elasticsearch, SQL and even ML TensorFlow access on the exact same representation. So think about data retention, doing some post-analysis on months or years of logs, and then maybe setting up triggers when you see an anomaly happening within your service. So think about it: the hunt, BI reporting and predictive analysis on one platform. Again, it sounds a little like a unicorn; I agree with Jeremy, maybe it didn't sound true, but it's been a life's work.
So it didn't happen overnight; it's been at least eight years in the making, but I guess it's a life's journey in the end. Well, you know, the timing is great. All the database geeks out there who have been following the data industry know that databases are good for structured data, but when they become a bottleneck or a blocker to innovation, you start to see this idea of a data lake: let the data form, let it be. I hate the word control plane, but more of a connective tissue between systems is becoming an interesting thing. So now you can store everything, no worries there, no blind spots, and then let the magic of machine learning come around in the future. So Jeremy, I've got to ask you, since you're the bad boy of data analytics at BAI Communications, head of data analytics: what do you look for in the future as you start to set this up? Because, connecting the dots here in the interview, I can almost imagine it: you've got the data lake, you're storing everything, which is good. Now you have to create more insights, get ahead of the curve, and provide some prescriptive and automated ways to do things better. What's your vision? First, I would just like to say that when astrophysicists talk about dark energy and dark matter, I'm convinced that's where Thomas is hiding the ones and zeros to get that compression. I don't know that to be fact, but I know it to be true. And in terms of machine learning and these future technologies that are becoming available, starting from scratch and trying to build out models that have value takes a fair amount of work, and that landscape keeps changing, right? Being able to push our data into an S3 bucket, retain that data, and then get anomaly detection on top of it.
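Neither speaker explains how the anomaly detection on retained logs works, so the following is only a hedged illustration of the general idea, using a swapped-in textbook technique rather than Chaos Search's method: flag a minute whose error count deviates from its trailing window by more than a chosen number of population standard deviations (a trailing z-score). The function name `flag_anomalies`, the window of 10, the threshold of 3.0 and the sample counts are all assumptions.

```python
from statistics import mean, pstdev


def flag_anomalies(counts, window=10, threshold=3.0):
    """Return indices whose value deviates from the trailing `window` values by
    more than `threshold` population standard deviations (a trailing z-score)."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if counts[i] != mu:
                flagged.append(i)
        elif abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Made-up per-minute error counts from a log stream, with one spike.
    errors_per_minute = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 50, 10]
    print(flag_anomalies(errors_per_minute))
```

A trailing window keeps the baseline adaptive to slow drift; production systems typically add seasonality handling, since subway traffic has strong daily and weekly cycles that this sketch ignores.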
That's, I mean, that's something special, and it unlocks a lot of ability for our teams to very easily deliver anomaly detection and machine learning to our customers without having to take on a lot of work to understand the latest and greatest in machine learning. So it's really empowering to our team, right? And a tool that we're going to... Yeah, and I love the name Chaos Search, Thomas. I've got to say, it brings up the inside baseball around Chaos Monkey, which everyone knows was a DevOps tool to simulate day-two operations and disruptions. But what you're really getting at is a whole new architecture that goes beyond the DevOps movement. It's next-gen architecture. Talk about that to the people watching who have a lot of legacy and want to move over to a more enabling platform that's going to give them some headroom for their data. What do you say to them? How do they get started? What should their mindset be? What are some first principles you can share? Well, I always start with first principles, and I like to say we're the next next-gen. The key thing with the Chaos Search offering is that you can start today without even having Chaos Search. Stream your data to S3. We're going to make data lakes hip and cool again. And actually, Google it now: data lakes are hip and cool. So start streaming now. Start managing your data in a well-formed, centralized way, with security, governance and cost-effectiveness. Then call Chaos Search, and we'll make accessing it easy and simple, to ultimately solve your problems, whether you're debugging a security issue or debugging performance issues at scale, right? And when workloads can be added instantaneously to your data lake, it's game-changing. It's mind-changing. So for the DevOps folks who are up all night wondering, how am I going to scale from one terabyte today to 50 terabytes? Don't.
Stream it to S3. We'll take over. We'll worry about that scaling pain. You worry about your job: security, performance, operations, integrity. That really highlights the cloud-scale value proposition as apps start using data as an input, not just as part of a repo. Great stuff. Thomas, thanks for sharing your life's work and your technology magic. Jeremy, thanks for coming on and sharing your use cases with us and how you're making it all work. Appreciate it. Thank you. Okay, this is theCUBE's coverage of the AWS Startup Showcase: The Next Big Thing, here with Chaos Search. I'm John Furrier, your host. Thanks for watching.