John Furrier: Hello and welcome to this special CUBE Conversation. I'm John Furrier, host of theCUBE, here in our Palo Alto studios. We've got two CUBE alumni technologists here to talk about a great topic, empowering data intelligence. We're going to explore the intersection of knowledge graphs, vector databases, and Gen AI with Neo4j and Persistent. We've got Pandurang Kamat, Chief Technology Officer of Research and Innovation at Persistent Systems, and Sudhir Hasbe, Chief Product Officer at Neo4j. Gentlemen, great to see you. Thanks for coming back on theCUBE.

Sudhir Hasbe: Good to be here.

Pandurang Kamat: Good to be here, John.

John Furrier: So, two technologies on one topic with a lot to it; we could probably talk for an hour. Knowledge graphs have been around for a while. Neural networks are talked about a lot in AI. Knowledge graphs, knowledge management, knowledge systems: Gen AI is kind of the perfect storm for all of this. Talk about this piece, because Neo4j set the standard for graph databases, and that becomes a very key part of this new infrastructure everyone's building on. It's not the static, siloed web we're used to. It's got to be adaptive. It's got to have flexibility. It's got to be responsive. But the data is different here. Take us through the technology perspective on how graph databases are powering this intersection.

Sudhir Hasbe: Absolutely, John, you're right. There has been an evolution in the industry, especially with generative AI and large language models and the kinds of applications people are thinking about. Within enterprises, knowledge graphs have been deployed on top of Neo4j for a pretty long time, but most of that information was structured, and most of the deployments enabled specific use cases like fraud management and fraud detection, anti-money laundering, or supply chain visibility. The opportunity for enterprises now, as large language models and generative AI technology come in, is to break the data silos within organizations. Large organizations have tons of databases, hundreds of thousands of them, with data trapped inside. When you want an intelligent application that can answer questions spanning all of that information, it's super hard to do. Having a knowledge graph that can represent the knowledge from all of those systems in a single framework helps LLMs get the right information to give you the right answers to your questions and problems. So we're seeing this interest in knowledge graphs as the backbone for a lot of the generative AI applications people want to build, and that's where you see this new interest and a new set of use cases that enterprises are looking at.

John Furrier: Pandurang, you're doing a lot of R&D, groundbreaking AI work at Persistent that we've been covering for over a year and a half. The title here is "Empowering Data Intelligence," and we need more data intelligence. Hallucination is a big topic that comes up a lot with LLMs, along with the size of the models; small language models are in vogue. We put out a power law with theCUBE Research showing that models can coexist together, yet be separate. So what does empowering data intelligence mean to you?

Pandurang Kamat: Sure. First of all, it's a real pleasure to be here with Sudhir discussing this topic.
As a technology company, we have been an avid user of graph databases, and Neo4j in particular, over the years, building solutions around tax research, around fraud detection like you talked about, and so forth. So it's a special opportunity that today we are not only working with their technology but with them and for them, actually moving the needle forward on how graph databases and generative AI can drive intelligence.

Fundamentally, John, as you're well aware, retrieval-augmented generation and the internal reasoning of large language models have limits on how far they can infer. An LLM particularly struggles when trying to bring together relationships across very disparate datasets, both structured and unstructured. It is not inherently built for that, so you have to augment it, and we typically augment it with vector databases by vectorizing information. What graph databases and knowledge graphs bring to the table is that, like Sudhir said, they let us pull in data from disparate sources within the enterprise and build out those relationships, so you can hand deeper context and insight to the LLM to synthesize its output on. That's the single biggest advantage we see with knowledge graphs. The other is finding themes and reasoning across datasets and volumes of data, which again is inherently harder for an LLM to do by itself. That's where graph databases and knowledge graphs have been a big boon in driving up the quality, accuracy, and contextual relevance of the answers these AI services give.

John Furrier: Sudhir, talk about the data aspect here, because data is critical. The knowledge graph, the knowledge system, however we look at it, the AI infrastructure is developing, and the data has to be available. It can't be siloed. Talk about the importance of the shift happening now in how people think about their data, and how that integrates as LLMs and foundation models become the standard for how applications interact with data.

Sudhir Hasbe: Let me go a little deeper on the large language model side first, and then we'll come back to enterprise data. Large language models were built to generate new information based on existing information. That's what the technology is about: mostly predicting what the next word should be, based on what it has learned historically. It's a great technology for generating language, but think back to school, where we went to English class and to math and science class. English allowed you to be creative; math and science required you to be accurate. The challenge with large language models is that they can only be as accurate as the information they learned from, and they are a generative technology. It's the English class. It will generate information. Sometimes we call that hallucination, but I'd say a generative technology is doing the job it's supposed to do.

Now, in enterprises there are two kinds of use cases. In the first, I already have the information. Say I have a customer case where something went wrong.
I need to send the customer an email, and I know exactly what the issue was. I take that information, give it to a large language model, and say: please create a really good-looking email to customer John, whose order was misplaced and then found. Of course the large language model is going to be great at that, because it has learned from the web what good English looks like.

Now take the same infrastructure and look at a second use case. I'm an enterprise with a whole body of enterprise knowledge. I have an engine with parts in it, an engine oil light has come on, and I need to figure out what the technician should do to solve that problem. That requires knowing, for that engine, what the light means, which parts are required, and what quantities you need. It needs a whole knowledge set that, in many cases, is only available inside the enterprise. The large language models were never trained on it. So how do you take all that data? In this scenario you also have unstructured parts manuals alongside the engine and its bill of materials. Think about bringing all that information together as knowledge in a single system, not five databases and seven systems with the data distributed across them and the parts manuals sitting in some object store somewhere. You convert all of that into a knowledge graph that can be queried, so you get real facts from it, and then you still use the large language model to explain the answer to the technician. But the facts have to come from the knowledge graph. Use large language models for the English, the language benefits, and use knowledge graphs for the accuracy of information; combine both and you get the best of both worlds. You're more accurate, and you can explain the answer, because you know in the knowledge graph that it came from this engine, these parts, these documents. You can be completely explainable. That's how you build the best generative AI solution, and that's how this whole world comes together: structured and unstructured vector information as part of your knowledge graph, answering the question in natural language with the large language model, but with the facts coming from the knowledge graph.
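To make the retrieval side of that concrete, here is a minimal sketch using the official neo4j Python driver. The graph schema, labels, credentials, and engine model below are all hypothetical, invented for illustration, and the LLM call at the end is left as a placeholder:

```python
from neo4j import GraphDatabase

# Hypothetical schema, for illustration only:
# (:Engine)-[:HAS_LIGHT]->(:WarningLight)-[:RESOLVED_BY]->(:Procedure)-[:REQUIRES {qty}]->(:Part)
FACTS_QUERY = """
MATCH (e:Engine {model: $model})-[:HAS_LIGHT]->(l:WarningLight {name: $light}),
      (l)-[:RESOLVED_BY]->(proc:Procedure)-[req:REQUIRES]->(p:Part)
RETURN proc.name AS procedure, collect({part: p.name, qty: req.qty}) AS parts
"""

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    facts = session.run(FACTS_QUERY, model="QX-300", light="oil pressure").single()
driver.close()

# The facts come from the graph; the LLM only phrases the answer.
prompt = (
    "Using ONLY the facts below, tell the technician what to do.\n"
    f"Procedure: {facts['procedure']}\nParts: {facts['parts']}"
)
# answer = llm(prompt)  # placeholder: any hosted or local model
```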
Pandurang Kamat: Just to add one point, because that was beautifully put. Large language models, being neural-network based, are inherently less explainable than where the industry has been trying to go in building explainability into AI inference. Knowledge graphs provide that traceability, that visibility, which you can surface even to the end user, depending on the experience you're creating, to build further trust in the system. So it's not only about getting the accuracy right; it's also about bringing trust and traceability to a system that, out of the gate, was quite opaque.

John Furrier: That's a great point, and I think that's the key thing you brought up. Right now everyone's talking about hallucinations, saying don't trust the data. What they're really reacting to is not generative AI broadly but the large language models, because they play with OpenAI's models and say, that's not accurate. I wasn't involved in that, but it says I was. They see the error and decide they don't trust it. What you're getting at is that your knowledge graph can hold proprietary data known to the company, working together with the LLMs to get accuracy. And even further, those knowledge graphs were already driving customer experiences and other applications pre-Gen AI. Now you bring knowledge graphs in with generative AI to create a new generative solution, which is the fusion of the data with the right mix. Am I getting that right? Is that the way to think about it?

Sudhir Hasbe: You nailed it, John. People try ChatGPT or Bard now and say, it answered this question but it was wrong about that one. What they're looking at is a large language model trained on open web data. You mentioned it, and Pandurang was talking about it: retrieval-augmented generation is a technology pattern that lets you merge your large language model, for the English, with your knowledge graph, for the facts and information, and make better decisions. So now you're seeing innovative patterns like RAG that blend these two technologies, large language models and knowledge graphs, to build generative AI applications for enterprises. It's a very exciting space, and we're seeing tons of customers across different verticals building different kinds of solutions on it.
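A bare-bones version of that RAG pattern, assuming a local sentence-transformers embedding model; the documents and the final llm() call are placeholders:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Order 4411 for customer John was misplaced on Jan 4 and located on Jan 6.",
    "Policy: orders located within 7 days are reshipped at no charge.",
]

# Retrieval: embed everything once, then rank by cosine similarity.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

question = "What happened to John's order?"
q_vec = model.encode([question], normalize_embeddings=True)[0]
scores = doc_vecs @ q_vec  # normalized vectors, so dot product = cosine
context = docs[int(np.argmax(scores))]

# Augmented generation: the LLM writes the English, the context supplies facts.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# answer = llm(prompt)  # placeholder for any model call
```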
John Furrier: We might have to do a part two, because you brought up a very important part of the title, empowering data intelligence: exploring knowledge graphs, vector databases, and Gen AI. The vector database movement, having vector embeddings, is a key part of RAG, retrieval-augmented generation. It's a technique for using data in a new, generative way. Its most popular application right now in enterprises is RAG: people have data in a knowledge graph and want to use it with a large language model like OpenAI's, Anthropic's, or Cohere's. So you start to see the tinkering, the experimentation, the development, where the retrieval piece of RAG leans on vector databases. Can you explain this dynamic, because the vector database is a new thing?

Sudhir Hasbe: Pandurang, do you want to go first? Then I'll go after that.

Pandurang Kamat: Sure. Vector databases are a way to take unstructured information and put it in a mathematically representative form that makes it easy for your application to retrieve and bring the right context for the LLM to reason on. Knowledge graphs provide a much richer picture: the relationships between the key entities and how they tie to each other, which lets you infer second-order, third-order, fourth-order insights that aren't intuitively captured in a vector database. One of the most exciting trends, and I don't want to speak on Sudhir's behalf, but the vision he and his team have laid out, is: why do these two things separately when they can be much richer together in a single stack? Our entire mission is about helping customers do that more easily: build the vector indexes, build the ontology that is relevant to their business, and from that ontology build a very rich knowledge graph that augments the vector search and gives larger, and not just larger but more precise, context for the LLM to reason on. That's how we see these two exciting technologies coming together to drive better intelligence than a vector database by itself.

Sudhir Hasbe: That's a great point. Just to add a little to that: we see a lot of people experimenting with vectors and vector databases as well as knowledge graphs, and the starting use case for most customers is this: I have a bunch of PDF documents, unstructured data. I send them to an embedding model, convert them into vectors, store those in a vector database, and run similarity search. Vector stores are great at similarity search over embeddings; that's the one thing they're really good at. And it's powerful when you have a lot of PDF documents to search through: you chunk the data, convert it into vectors, and so on. But when you look at a large enterprise, unstructured data and PDF documents are only one piece of the puzzle. There's also a lot of structured data: the bill of materials for my entire product graph, or my supply chain with all its connected entities and which suppliers provide which products. With technologies like Neo4j and a knowledge graph, you can blend both worlds: the vector database side, unstructured data vectorized for similarity search, and the structured side, the bill of materials for your product, your supply chain, your customer 360 with your relationships across customers and products. The unstructured documents, like the feedback a customer has given, become vectorized information inside the knowledge graph. You can run similarity searches on the unstructured data, you can run a query like "find me all the customers that look similar to this one," and you can blend all of these use cases. What we're seeing in enterprises is that the use cases are a lot more nuanced than searching unstructured data; they require you to understand everything around it. That's where knowledge graphs are a great way to represent information that is structured, in many cases hierarchical, together with the vectorized information for your unstructured data.
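Here is a sketch of what that blend can look like in a single query, assuming Neo4j 5.11+ with a vector index; the index name 'feedback_embeddings' and the Customer/Product schema are made up for illustration:

```python
from neo4j import GraphDatabase

# Similarity search over unstructured feedback, then a hop into the
# structured graph around the customer who wrote it.
HYBRID_QUERY = """
CALL db.index.vector.queryNodes('feedback_embeddings', 5, $qvec)
YIELD node AS fb, score
MATCH (c:Customer)-[:GAVE]->(fb), (c)-[:BOUGHT]->(p:Product)
RETURN c.name AS customer, fb.text AS feedback, score,
       collect(DISTINCT p.name) AS products
ORDER BY score DESC
"""

qvec = [0.0] * 384  # stand-in; in practice, embed the question with a real model

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for row in session.run(HYBRID_QUERY, qvec=qvec):
        print(row["customer"], round(row["score"], 3), row["products"])
driver.close()
```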
John Furrier: Yeah, it brings the two worlds together and actually scales faster. Let's get into the implementation. Can you share how these Gen AI solutions are being implemented with Neo4j, and how you're harnessing the data for insights? Take us through the examples you're working on together.

Pandurang Kamat: Yeah, maybe I'll take a stab at it first. We traditionally build a lot of solutions in healthcare and life sciences and in banking, financial services, and insurance, and we saw an opportunity to bring these two technologies together to drive innovation there. To give you an example, we built accelerators that let you search and sift through tons of biological data: drug molecule data, genetics, the way genes express themselves, databases that show the relationships between genes and their expression, and unstructured data, the research documentation inside an enterprise about drug-to-drug interactions or which molecules target which disease pathways. Bringing all of that together, with a layer powered by both vector search and knowledge graphs, makes it easier for researchers pursuing new therapies to find relationships that wouldn't surface unless they wrote very specific queries. It gives them a natural language aperture for asking questions, and follow-up questions, on the data they get back, just like chatting with a large language model. But here it brings together disparate pieces of information where you would typically write separate queries against separate data sources and synthesize the results manually. We do it through a single aperture.

Another example is the legal business. Imagine you want to ask: show me all contracts that expire next week or next month and have a certain payment pattern. That kind of query is really challenging for an LLM by itself, or even a plain RAG pattern, because it spans systems and data sources of different types. When you combine knowledge graphs and vector search, it gets easier: the knowledge graph brings you all the relationships inherent in the question, and the LLM can synthesize the answer much more accurately.

Sudhir Hasbe: On where we're partnering: we recognized the value of vector search, so sometime last year, in August, we announced the ability to have all the capabilities of a vector store or vector database directly in Neo4j, as part of the knowledge graph. That lets customers bring unstructured and structured data into a single knowledge graph. The other area we're working on together is enabling our customers jointly on the technology side, because one big challenge for customers is building the knowledge graph itself. How do you get started? How do you take all the structured data siloed across an organization and bring it into one single knowledge graph, and, more importantly, add the unstructured documents to it? Simplifying that is super valuable for our customers. So one of the things we're working on with Persistent is making that path from unstructured data into the knowledge graph super simple: tools accessible to our customers, where you can take an existing knowledge graph or build a new one from unstructured data, and significantly reduce the time to value for whatever use case they want to enable.
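One hedged sketch of that unstructured-to-graph step: an LLM (the extract_triples() function below is a stand-in, not a real API) pulls (subject, relation, object) triples out of a manual, and MERGE keeps re-runs idempotent. Cypher cannot parameterize relationship types, so the relation is stored as a property here:

```python
from neo4j import GraphDatabase

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    # Stand-in for an LLM call that returns parsed triples as JSON.
    return [("oil pressure light", "resolved_by", "replace oil filter")]

MERGE_TRIPLE = """
MERGE (s:Entity {name: $subj})
MERGE (o:Entity {name: $obj})
MERGE (s)-[:REL {type: $rel}]->(o)
"""

manual = "If the oil pressure light comes on, replace the oil filter."

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for subj, rel, obj in extract_triples(manual):
        session.run(MERGE_TRIPLE, subj=subj, rel=rel, obj=obj)
driver.close()
```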
John Furrier: It's a fascinating topic. Knowledge graphs of previous generations were purpose-built; the use cases were specific, well-formed, well-identified. Now, with generative AI, knowledge graphs offer horizontal scalability of the data, meaning you don't have to replicate it; it can be shared. Search, retrieval, and discovery become available across the whole organization. It's a huge concept. Can you explain that phenomenon, and the roadblocks and considerations in implementing, say, a knowledge graph with vectors? To me this is becoming a standard piece of data management infrastructure that will power the applications, because think about the leverage you get on data reuse, data availability, data intelligence, data context, the behavior of the data. This intersection of knowledge graph and vector brings a new kind of retrieval, but if there's more there, explain why it matters and what companies should consider when architecting for the enterprise.

Sudhir Hasbe: Let me get started, and then I'll ask Pandurang to jump in. My biggest point is that the data in companies is completely siloed. Step one is converting that into insight: can you bring it together in some form? Converting it into knowledge then lets you build intelligent applications. So you're absolutely right, John: looking at knowledge graphs as the mechanism powering those intelligent applications is the most important thing for enterprises as they think about generative applications. Now, the challenge when we work with organizations is that their data sits in a lot of different systems, and when they create knowledge graphs, how do they structure them? How do they add semantics and context? We're building tooling to make it super simple to build these knowledge graphs from structured and unstructured data. But the big thing is starting small: identify the use case where your intelligent application gets the biggest value, model only those aspects in the knowledge graph, and be additive from there. One of the beauties of the technology we've built over the last 10 or 12 years at Neo4j is that you're not stuck with a fixed schema. Your knowledge can't be limited to a schema in a table with n columns, where every schema change blows things up. It's an ever-growing model. So you can start with a set of use cases, make sure there's value, and keep adding data and information, growing the knowledge graph across more use cases, and then, with your RAG-based applications powered by it, it can become enterprise infrastructure for you over time. That's my point of view. Pandurang, what do you think?

Pandurang Kamat: Let me add to that. We at Persistent had already been using Neo4j for a lot of our work, even before this. When we started combining it with our Gen AI solutions, we realized we needed tooling to accelerate our own engineering teams in going from unstructured data to an ontology to actually building the knowledge graph, so we built some accelerators ourselves for that. And then we realized that for business users to see value, we should give them natural language apertures, so they don't have to write Cypher queries, the query language used by Neo4j. We combined LLMs and our vector search knowledge to build that aperture.
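A minimal sketch of that natural language aperture: hand the LLM the graph schema, let it draft Cypher, then run it. Here generate() is a placeholder returning a canned query for the contracts example from earlier; a real system would call an LLM and validate the generated Cypher before executing it:

```python
from neo4j import GraphDatabase

SCHEMA = "(:Contract {expires: DATE, payment_terms: STRING})-[:SIGNED_BY]->(:Party {name: STRING})"

def generate(prompt: str) -> str:
    # Placeholder for an LLM call (OpenAI, Bedrock, Vertex AI, ...).
    return (
        "MATCH (c:Contract)-[:SIGNED_BY]->(p:Party) "
        "WHERE c.expires <= date() + duration('P30D') "
        "RETURN p.name AS party, c.expires AS expires, c.payment_terms AS terms"
    )

question = "Which contracts expire in the next 30 days, and on what payment terms?"
cypher = generate(f"Graph schema: {SCHEMA}\nWrite a Cypher query for: {question}")

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for row in session.run(cypher):
        print(row.data())
driver.close()
```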
Pandurang Kamat: It was very fortunate for us that we turned out to be on the same mission. When we intersected last year and figured out that this is a shared mission, we realized we didn't have to build it on our own; in fact, coupling together makes us stronger. So we started doing this together with Neo4j, and for them, and it's been a great journey. If we do these things right, we will not only get enterprises to that foundation layer faster, we'll also make the data stored in these knowledge graphs easier to access, not just for technical users but for business users, with the help of all these layers we've added.

John Furrier: You mentioned ontologies; I love that word. I haven't heard it since college back in the eighties. But think about what knowledge graphs, Gen AI, and vectors do for scaling ontologies, making them more available and more adaptive. I remember interviewing Adam Selipsky at re:Invent last year. He said the number one thing about Gen AI is to be adaptive, and being adaptive means being flexible. And neural networks are graphs, right? So you've got neural networks, knowledge graphs, vector databases, structured data, and all of it has to run together. It's a great solution you have together. I guess my final question: what's next for Persistent and Neo4j? What's coming down the pike?

Sudhir Hasbe: My focus is always on a few things. The world is only going to get more connected, more complex, and much bigger. Think another seven or eight years down the line: I expect everything to be hyper-personalized, the scale of data in organizations to keep growing, and the organizations themselves to be way more connected and way more complex. Think about digital twins powering personalization all the way through your supply chain; it's a connected world we're living in, and it will only get more connected. So building technology that lets you represent that complexity in the form of knowledge graphs powered by Neo4j is a big focus: bringing structured and unstructured data into the same place, with the unlimited scale customers need. And on the generative AI side, we're on mile one of a 26-mile race. I love running marathons and half marathons, and in mile one we've already seen so much innovation. On the technology side we're making sure we're integrated with all the platforms: Azure OpenAI, Bedrock, and Vertex AI at Google. We're integrated with LlamaIndex and LangChain and any new frameworks coming along. We want the technology well integrated so you have the ability to solve complex problems with it. That's my focus at Neo4j. Pandurang?
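For a sense of what those framework integrations look like, here is a sketch using LangChain's Neo4jVector wrapper over an existing Neo4j vector index. Import paths have moved between LangChain releases, and the index name and credentials below are illustrative:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Neo4jVector

# Point LangChain at a vector index that already exists in Neo4j.
store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url="neo4j://localhost:7687",
    username="neo4j",
    password="password",
    index_name="feedback_embeddings",
)

for doc, score in store.similarity_search_with_score("unhappy about late delivery", k=3):
    print(round(score, 3), doc.page_content)
```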
Pandurang Kamat: That's very well put, Sudhir. John, our mission, particularly with AI and generative AI, has always been to cut through the hype: to give enterprises the ability to deploy enterprise-grade, enterprise-scale solutions that produce predictable, consistent, accurate responses and drive real value. That requires the ability to ground what foundation models are doing, and right now there is no better tool than knowledge graphs to augment everything else we've been doing to bring that grounding to life. Our mission, and we're very excited about this collaboration, is to work closely together to make knowledge graphs the centerpiece of driving intelligence in the enterprise, and to make adoption of Gen AI that much smoother and faster to value.

John Furrier: And trust and accuracy are huge factors here. Trust, accuracy, scale, and the intelligence of the output at the end of the day.

Sudhir Hasbe: The only other word I would add is explainability: when you get an answer, how did I get it? That is super valuable and critical for customers.

John Furrier: Well, gentlemen, I'm really impressed with the technology chops. Obviously great work together from Persistent and Neo4j. It's a great time to be a technologist. There's a lot of action, and the risk-reward is real; if you don't get it right, there's downside too. But there's huge upside in this market as knowledge graphs, neural networks, Gen AI, and new infrastructure come to the table, so take advantage of it. It's a win, and if you miss it, you'll be on the wrong side of history. Thanks so much for sharing your perspective in our Empowering Data Intelligence series. Thanks for coming on.

Pandurang Kamat: Thank you for having us.

Sudhir Hasbe: Thank you, John. Thanks for having us.

John Furrier: I'm John Furrier with theCUBE. We're here in Palo Alto. Thanks for watching Empowering Data Intelligence.