Welcome back to SuperCloud 4, where we're exploring the intersection of cloud computing and generative AI. In particular, we're interested in the disruption potential of AI, the importance of data quality, and ways in which organizations can most effectively and efficiently bring AI to the data. With me is Thomas Hazel, the founder and CTO of ChaosSearch, a firm that brings intelligent search capabilities to cloud object storage, which eliminates the costly effort of moving data. Welcome, Thomas, good to see you in studio again.

Yeah, great to be here.

All right, so it's been almost a year since the ChatGPT heard around the world, as I sometimes say. When you first heard about that, what was your initial reaction? You've seen the rapid innovation since then. Did it change the way you thought about the opportunity, and where do you land now?

Well, I think we all logged on to ChatGPT and played with it, probably around December. But what I saw in January was that the world changed. It wasn't so much the innovation, which has been around for a while, but the global consciousness of what this could mean for business, really for everything. The focus, the energy, changed. And so as a founder, a technical founder of an analytics company, I said, we've got to be part of this. So early on we adapted the technology at ChaosSearch, and we now offer an AI assistant that lets you have a conversation with your data. It's pretty cool, it demos really well, but it really solves the problem of "I don't understand complex APIs, I don't understand high-level tooling." You can just log in and have a conversation, just like you do with ChatGPT.

So let's back up a little bit. How does ChaosSearch play in this ever-changing field? Where do you fit?

Well, we go after log analytics at scale. We just announced Chaos LakeDB last week, where we allow customers to stream their data into their data lake, S3 or GCS, and we auto-discover and auto-index the data so that you can use well-known APIs like the Elastic API, or SQL via Trino. Now, we don't have any Elasticsearch under the hood; that's where the cost and complexity challenges come in. We re-imagined the whole data architecture and database, and we let you do observability, security, and application insights, but again, at scale, with ease of use and a dramatic reduction in cost.

So Thomas, where specifically does AI fit, and how do you bring in LLMs? You're not building your own LLMs. Paint that picture for us.

Yeah, we took our data solution and we integrated well-known LLM APIs like ChatGPT and Vertex AI, and now we're working with Amazon's LLMs. What we've done is use the LLM as a reasoning engine. We don't put data into the LLM; we ask the LLM for intelligent questions to ask of our database. So for instance, I might say I'm a security analyst, and I present the LLM with, say, a basic schema and ask it to craft a search query and/or a SQL query to run against our database. That avoids the hallucination problem, and it's obviously dramatically less expensive than building or fine-tuning your own LLM, and that combination allows our customers to interact in a new way, a conversational way.

Okay, so it sounds like you're testing, or actually implementing, all kinds of different LLMs. OpenAI; you mentioned Vertex AI, which is Google, right? And Bedrock, I think, is just now coming to market, right?
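To make that pattern concrete, here's a minimal sketch of the "LLM as reasoning engine" idea Thomas describes: the LLM sees only the schema and the user's question, returns a query, and the database computes the answer deterministically. The function names and the SQLite stand-in below are hypothetical illustrations, not ChaosSearch's actual API.

```python
# Sketch: the LLM crafts the question; the database produces the answer.
# ask_llm is a stand-in for a hosted LLM call (OpenAI, Vertex AI, Bedrock...).
import sqlite3


def ask_llm(prompt: str) -> str:
    """Stand-in for a provider's chat/completion endpoint."""
    # A real implementation would send `prompt` to the LLM API;
    # here we return a canned query for demonstration.
    return "SELECT COUNT(*) FROM logins WHERE status = 'failed';"


def answer_question(question: str, schema: str, conn: sqlite3.Connection):
    # Prompt with the schema and the user's ask; request a query,
    # never the answer itself. No raw data is shared with the LLM.
    prompt = (
        "You are a security analyst's assistant.\n"
        f"Given this table schema:\n{schema}\n"
        f"Write a single SQL query that answers: {question}\n"
        "Return only the SQL."
    )
    sql = ask_llm(prompt)
    # The database, not the LLM, computes the answer deterministically.
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user TEXT, status TEXT)")
conn.executemany("INSERT INTO logins VALUES (?, ?)",
                 [("alice", "failed"), ("bob", "ok"), ("alice", "failed")])
print(answer_question("How many logins failed?",
                      "logins(user TEXT, status TEXT)", conn))  # [(2,)]
```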
Yeah, so there's a variety of LLMs, and to us, really, whoever provides the best LLM... our customers can choose which LLMs they want to use, and then we integrate with that backend. To us, it's a co-pilot for analyzing your data. A lot of these large language models understand what it means to ask, you know, what type of failures you might have had over the last year. How do I query that in, say, Elastic? How do I analyze that data in SQL? We use that partnership, that chain of thought and agency, if you will, to bring those two worlds together, without having to share the raw data with the LLM.

Okay, so it's bring-your-own-LLM. The cloud vendor presumably is responsible, correct me if I'm wrong, for making sure that the LLM vendor doesn't have access to the data. Is that correct?

Right, and purposely, as our phase-one approach, we do not share any data with the LLM, whether it's public or private. You know, a lot of times people think you have to train up the LLM to make it give some type of intelligent answer. We just use it as a reasoning engine. So it's our database plus this intelligent reasoning engine, bringing these two worlds together, with no sharing of data. Now, we do share the schema to help facilitate a question, but it really could be any LLM, private or public.

As a technology company, how difficult or simple is it for you to integrate those LLMs into your environment? Is there like a menu that you can choose from? Can I use, you know, Meta's or not? Do you have to do any work to integrate?

Well, as you've probably heard, prompt engineering is quite popular these days. What we've done is connect to a variety of backend LLMs, and we do unique prompt engineering. What's interesting is that the prompt engineering really carries across all the LLMs we've been using. Through that prompt engineering, when a customer asks a question, we say, hey, with this information, with the customer's ask, how would you craft a question against our database? So it really hasn't been terribly hard to integrate with different LLMs. Now, some LLMs are better than others. ChatGPT is pretty good. Vertex AI is pretty good; Bard is pretty good. But, you know, as they get better, our reasoning engine gets better. So to us, we have a plug-and-play architecture that brings in the LLM with this chain of thought, with this prompt engineering we do, and again, our high-scale backend database.

When you say ChatGPT, are you talking specifically about ChatGPT or OpenAI's, you know, GPT-4 tools, or both?

I'm talking about OpenAI's LLM in the backend there. They have an API backend service that you can integrate with. And I've played with LLaMA as well. These are all toolings that, as we bring them to market... really, any LLM the customer wants, we'll allow them to support.

You mentioned prompt engineering. Just a quick aside: I was interviewing a sophomore in high school, the son of one of our clients. We were at this event, and he attends Boston Latin, you know, a very prominent school in this area. I was asking if he uses ChatGPT, and he said, well, not for school, because the teachers don't allow that. But my contention is they should be teaching prompt engineering in school. Do you agree?

Well, if you ask interesting questions, using a tool like ChatGPT really supercharges your ideas. And so you don't want it to do all the work, right?
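As a sketch of that plug-and-play, bring-your-own-LLM idea: one shared prompt template sitting in front of interchangeable backends. The backend classes here are hypothetical stubs; real integrations would call the OpenAI, Vertex AI, or Bedrock SDKs behind the same interface.

```python
# Sketch: one prompt-engineering template, swappable LLM backends.
from typing import Protocol

PROMPT_TEMPLATE = (
    "Given this schema (no raw data is ever shared):\n{schema}\n"
    "Craft a {dialect} query that answers: {question}\n"
    "Return only the query."
)

class LLMBackend(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIBackend:
    def complete(self, prompt: str) -> str:
        # Would call OpenAI's chat completions API here; canned for demo.
        return "SELECT status, COUNT(*) FROM logs GROUP BY status"

class VertexAIBackend:
    def complete(self, prompt: str) -> str:
        # Would call Vertex AI's text generation API here; canned for demo.
        return "SELECT status, COUNT(*) FROM logs GROUP BY status"

def craft_query(backend: LLMBackend, schema: str, question: str,
                dialect: str = "SQL") -> str:
    # The same prompt engineering carries across backends; only the
    # transport changes when a customer swaps in a different LLM.
    prompt = PROMPT_TEMPLATE.format(schema=schema, dialect=dialect,
                                    question=question)
    return backend.complete(prompt)

print(craft_query(OpenAIBackend(), "logs(ts, status)",
                  "What failure types did we see last year?"))
```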
Because it does have the ability to do some pretty clever writing, but I think we all need to understand how to use it, how to leverage it, whether you're a software developer or a blog writer. Now, with multimodal inputs, you can take an image or audio, transcribe it, and then ask questions about it. It's really amazing, and we know this is the future. So I think our students and colleagues should know how to use it. Lean in.

All right, let me get back to the discussion. So Cisco just bought Splunk, as you know, for $28 billion. There's obviously significant interest in observability, log analytics, and that entire space. How can you take advantage of these shifting market trends? Some of this is consolidation, but it's clearly a signal here.

Yeah, Splunk was the original log analytics platform. To their credit, it's a Cadillac in the sense of its capability, but it was built many, many moons ago. What we want to do is go after those use cases, but with a modern architecture, a lake architecture, and in a way that dramatically reduces the cost. Very often we replace Splunk or augment Splunk, because it's a great product, but at scale it can cost you millions, tens of millions of dollars. And so we have customers that were using, say, Splunk, maybe doing a terabyte a day, but couldn't afford to go to 50. They bring ChaosSearch in and we go to 50, and on Black Fridays up to 300 terabytes per day. That's something you just wouldn't want to do in Splunk, but it's a wonderful product. The key is that we built a modern architecture, a lake architecture, that allows customers to stream all those logs into one central spot and publish them to tooling that they know and can use.

Okay, you're saying you wouldn't want to use Splunk for that, why?

Because it's just too costly. I mean, just Google "Splunk" and "cost," I think.

Well, that's always been the big criticism of Splunk, the Oracle of log analytics, but okay. Then explain what one has to do to get data into whatever observability environment. I love Splunk too, I don't mean to pick on Splunk, but you have to prep the data, move the data, and I'm inferring that you eliminate that heavy lifting.

Yeah, I mean, we know these tools are great, but as I mentioned, they're costly. They're also complex and time-consuming. So imagine you have messy data. Log data is quite messy, quite complex. Schemas change over time. And you have to build a pipeline, an ETL process, to clean the data and put it into your database of choice, your stack of choice. The approach we took at ChaosSearch was to let you stream your data to your lake as-is, and we auto-discover the schema and automatically index the data. If the schema changes, we worry about it. And through what we call our data refinery, you publish lenses, or views, into it, where you can do dynamic schema changes or transformations, and then run search or SQL on top. It's really problematic otherwise. We've seen time and time again that a lot of these relational systems don't go after that space because of that crazy, messy data. We went all-in to automate it, as well as scale it. But again, don't change your tooling, just change your process.
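Here's a toy sketch of that stream-as-is, auto-discover, auto-index flow. Real ChaosSearch indexes objects sitting in S3 or GCS; below, a few JSON log lines stand in for the lake, and plain dictionaries stand in for the index. All names are illustrative.

```python
# Sketch: ingest raw JSON logs with no ETL, discover the (drifting) schema,
# and index every field so search/SQL-style lookups work immediately.
import json
from collections import defaultdict

raw_logs = [
    '{"ts": "2023-11-24T00:01:00Z", "user": "alice", "status": "failed"}',
    '{"ts": "2023-11-24T00:02:00Z", "user": "bob", "status": "ok"}',
    # Schema drift: a new field appears mid-stream; discovery absorbs it.
    '{"ts": "2023-11-24T00:03:00Z", "user": "alice", "status": "failed", "ip": "10.0.0.7"}',
]

schema: set[str] = set()                               # discovered field names
index: dict = defaultdict(lambda: defaultdict(list))   # field -> value -> row ids

for row_id, line in enumerate(raw_logs):
    record = json.loads(line)                  # ingest as-is, no pipeline
    schema.update(record)                      # auto-discover evolving schema
    for field, value in record.items():
        index[field][value].append(row_id)     # auto-index every field

print(sorted(schema))                # ['ip', 'status', 'ts', 'user']
print(index["status"]["failed"])     # [0, 2] -> the failed logins
```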
Thomas, I want to ask you as a database expert: you're seeing the rise of vector databases. There are like dozens of them. I mean, you've got Pinecone and Milvus, there's Chroma. What are your thoughts on these? Is it a feature that's ultimately going to be sort of embedded? Where's the value in your mind?

So vector databases are a great tool for doing similarity search. It's great. However, it's a little complicated to put all your data streams into one at scale. And what's interesting about these LLMs, these multimodal LLMs, is that maybe the future is to transcribe, in high fidelity, all these images, all this logging data, all this audio and video into text, and index that text for search, for semantic search. So I think there's a place for vector databases, and then you integrate them with the LLM, but I've seen a lot of problems with hallucination even when you use one, plus the cost and the complexity. I think with high-fidelity descriptions, maybe that's going to transcend some aspects of vector databases. We're taking that approach, where we believe the language of choice is language, and the interoperability that comes with it. Some say a picture's worth a thousand words. Well, what if you convert all of that to language, index it, and analyze it as such? That way you can ask questions not just about the moment it was summarized, but historically, and get some type of analysis on it, where maybe the summarization is good enough, let alone avoiding the hallucination problem.

So is the future, or maybe it's here now, that you can interact with ChaosSearch through natural language? Is that how you do it today?

Yeah, so it's funny. If you're, say, a Kibana expert or a Tableau or Looker expert, okay, you understand how to analyze data. But a lot of people don't know those tools, let alone the search or SQL APIs; they may be a security expert, but not a tooling expert. So with our service, we have a tab on our console where you just log in, select the data streams you want to analyze, and you can say, "I'm a security analyst. How would I analyze my CloudTrail data over the last five years?" It'll give you a description, and you'll say, "Well, can you detect failed logins over that timeframe?" And it will run the query for you. You didn't have to know how to do it. It'll bring up a Kibana dashboard to analyze it. To me, it really opens up data democratization for everybody. The idea that you can have a conversation with your data and not know these complicated tools is really powerful.

This is changing the world. I mean, it's going to be in virtually every piece of software, even personal productivity software. I sit there and do trial and error just to get my Excel chart right. You know, I want a double Y-axis. How do I do that again? I've got to Google it, and it takes a half hour to figure out exactly how to lay it out. So my question is, how do you see the future of data, the role of data in AI, and the future of data apps?

Well, I think AI and data are going to rule the world. The more data you have, the more ability you have to derive insights. So we've taken a data lake approach where, with our Chaos LakeDB, we allow you to stream it all in, and we provide that integration to the AI to ask those questions. And you can imagine, just Google the question: how do I train an LLM? How do I update an LLM with new data? This is really complicated stuff. We took an approach where you use a powerful, at-scale database integrated with an intelligent reasoning engine, and it really solves problems quickly. And again, it's something I think we all felt when we played with ChatGPT for the first time.
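As an aside on Thomas's "convert everything to language, then index the text" alternative to vector search, here's a toy illustration: a stand-in multimodal model produces a text description of an image, and a plain inverted text index makes it searchable alongside ordinary logs. The describe_image function is hypothetical; a real one would call a captioning or transcription model.

```python
# Sketch: transcribe non-text data into language, index the text, and query
# it deterministically, instead of embedding it into a vector database.
from collections import defaultdict

def describe_image(path: str) -> str:
    """Stand-in for a multimodal LLM producing a high-fidelity description."""
    return "server rack photo: two amber fault LEDs on switch port 7"

documents = {
    "img_001.jpg": describe_image("img_001.jpg"),
    "log_42": "2023-11-24 switch port 7 link flap detected",
}

# Build an inverted index over the text descriptions.
inverted = defaultdict(set)
for doc_id, text in documents.items():
    for token in text.lower().split():
        inverted[token].add(doc_id)

# Querying language instead of vectors: the same lookup returns the same
# answer every time, unlike nearest-neighbor results that shift with
# embeddings and data changes.
hits = inverted["port"] & inverted["7"]
print(sorted(hits))  # ['img_001.jpg', 'log_42']
```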
Explain again how you eliminate or minimize the hallucinations.

So we don't ask the LLM for the answer; we ask the LLM for the question. We're streaming data into our consistent database, where you can do historical search and SQL analytics. Then, if you have a question like "how many logins failed over the last five years?", we prompt-engineer that question to the LLM to craft an intelligent query against our consistent database. And that allows us to remove the hallucination problem, where the answer looks correct but you don't know if it's right. We see time and time again people struggling with that hallucination problem, let alone the complexity and cost of building their own.

So wait, you're kind of gaming the system to get the right question, so that you get the right answer, a non-hallucinated answer.

That's right. And the LLMs are brilliant. They're highly intelligent in what they can do, but this proximity, this vector proximity in the LLM, is the problem. That's why it hallucinates. You know, if you do a SQL query a hundred times, it's the same answer. You do a search query a hundred times, it's the same answer. Do the same with an LLM, or a vector search database with an LLM, and you're getting maybe 50% sometimes. And as the data changes, it gets more problematic. With one event, you could have catastrophic forgetting or overfitting. These are all problems, so we took the approach of bringing these two best-of-breed technologies together in one solution.

What are customers asking you about AI? How are they thinking about it? There's a lot of experimentation going on. There's actually quite a bit of spending going on, which is kind of interesting; it's sort of stealing from some other areas. What are they telling you about their biggest challenges?

Well, one is cost, and two is the complexity. I would say what we're bringing to the table is that you can immediately drive value with ChaosSearch and AI, with these LLMs, meaning you don't have to spin up tons of engineering resources to build your pipeline, to train on your data, and you don't have to find all these GPUs to answer the question. You really only have to worry about at-scale data with this intelligent AI. What I'm finding is that people are just playing, because they can't bring it to market; they can't productize this technology because it's fragile. What we've done, and we came out this summer with our AI assistant, it's available, it's real. It works in perfect synchronization with the LLM, because the LLM isn't being asked to answer the question.

Thomas, thanks so much for coming in. Appreciate your time, and appreciate you participating in SuperCloud 4.

Thank you.

All right, keep it right there. Dave Vellante, John Furrier, and Rob Strechay, we're in studio, bringing you conversations like this all day, live and on demand. You're watching SuperCloud 4. Be right back.