This is the last session of the conference, so thank you, everyone, for participating to this point. As you can tell, I am not Tysin Matthews. Tysin unfortunately caught the flu and missed his flight this morning. Because I worked very closely with Tysin on the deployment of generative AI, and because I'm very passionate about elderly care from previous startup work I've done, I had the honor of giving this talk today.

To give you a little context about SkyPoint: they're an AI platform for the senior care industry. What they basically do is give access to information around elderly care, geared primarily toward nurses, doctors, and so on.

Now, why is this so important? How exciting could elderly care be? Well, the number of individuals who need elderly care is going to double by 2030. The budget is $1 trillion a year in the United States alone. For some context, the entire US defense budget is about $800 billion. So what happens when the number of elderly in the United States doubles? Does the budget go up 2x, to $2 trillion? No. We're going to have elderly people put out on the street. There's going to be a lot of poverty, a lot of pain.

The good news is that in the elderly care industry, 70% of the cost is actually administration, not care. I know you're all shaking your heads: what do I mean by administration? Things like figuring out insurance payments, scheduling new nurses to come take care of the elderly, all those kinds of things. There's roughly 70% waste that IT really should be able to get rid of. And generative AI is extremely important here, because when I tried to tackle this problem at my startup through mobile apps, it turned out nobody wanted to use them. Nurses, the elderly, children of the elderly: they all communicate via text. It's all human interaction. So in my opinion, the only real chance to get rid of a lot of these costs in elderly care is generative AI. That's why this is so important from a societal standpoint. In terms of impact, I'd rank it right behind nuclear war and climate change: the secondary effects of the coming demographic inversion. So this is probably, in my opinion, the most impactful way to use generative AI.

What SkyPoint AI provides is an end-to-end lifecycle for your data. They have a big database of all the different data related to healthcare, with a lakehouse at the core, and an AI copilot that lets you chat with your data. One of the things SkyPoint has been working on is building Microsoft copilots: copilots that sit on top of your AI-ready data. They try to unify all the data, it's very industry-specific, and they do a lot of fine-tuning of the AI models, with RLHF and so on.

From an architecture standpoint, SkyPoint uses a combination of structured data, primarily in Databricks on top of Azure, and unstructured data stored as vectors in Astra DB.
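To make that structured/unstructured split concrete, here is a minimal sketch of how a copilot might route an incoming question between the two stores. This is an illustration under assumptions, not SkyPoint's actual implementation: the model name, the prompt, and the `route_question` helper are all hypothetical.

```python
# Illustrative sketch only: routing a question between the structured
# (Databricks SQL) and unstructured (Astra DB vector) stores.
# The model name, prompt, and helper are assumptions, not SkyPoint's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ROUTER_PROMPT = """Classify the user question as one of:
- structured   (answerable from SQL tables: counts, averages, rosters)
- unstructured (answerable from policy documents and PDFs)
- both
Question: {question}
Answer with a single word."""

def route_question(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": ROUTER_PROMPT.format(question=question)}],
    )
    return resp.choices[0].message.content.strip().lower()

label = route_question("What is the average rating of Morris House?")
# "structured"   -> generate SQL against the Databricks lakehouse
# "unstructured" -> vector search against Astra DB
# "both"         -> run a combined plan (see the combo prompts later on)
```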
To give a little more context: in the case of querying structured data, they take the question and apply a process called self-reflection. The LLM essentially asks itself, okay, what do I do? I need to answer this question, so let me look at the different tables. Under the hood, it makes an API call via the tools API within LangChain to the Databricks Unity Catalog to collect information about what tables there are, and the LLM is smart enough to identify which particular columns to use from which tables.

Once it has collected the right schemas and tables, there's a process of generating the SQL statement, and they use previous examples, called few-shot prompting, to generate it. In this case, the LLM realizes it needs to join the employee and attendance tables and group by role. Under the hood there's a method called ReAct, short for Reasoning and Acting, from a paper published by Google in 2022. If you were in my previous talk, this is very similar to the action-perception loop.

To generate the SQL statement, they do a few things. They pull the data model into the prompt. They use dynamic few-shotting, meaning curated natural-language-to-SQL examples, which is particularly useful when you're trying to do joins and GROUP BYs. They do metadata annotation: in the Unity Catalog, where they have all the table schemas, they can provide extra information such as what a table means, what its columns mean, and how the columns relate to each other. They use something called structural decomposition, which is the idea of splitting one question into multiple questions. And last but not least, there's self-reflection, a technique to check whether the generated SQL statement actually answers the original question. Under the hood there's also a whole bunch of enterprise features, namely RBAC (role-based access control) and observability, to make sure the answers are being surfaced correctly.

Diving a little more into the SQL agent, which is a LangChain feature: the few-shotting. For example, given a question like "What is the average rating of the community Morris House?", it's able to figure out the right columns, leveraging the metadata, and generate the corresponding average query. And as I just mentioned, self-reflection lets you check whether the final query matches the original question. A minimal sketch of this generate-then-reflect loop follows below.
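Here is that sketch, assuming a generic chat-completion client. The `get_table_schemas` helper stands in for the Unity Catalog tools call, and the model name, prompts, and example schema are illustrative assumptions rather than SkyPoint's actual pipeline.

```python
# Illustrative sketch of few-shot SQL generation plus self-reflection.
# get_table_schemas() stands in for the Unity Catalog tools call; the
# model name, prompts, and schema are assumptions, not SkyPoint's code.
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def get_table_schemas() -> str:
    # In the real system this is a tools-API call to the Databricks
    # Unity Catalog, returning annotated table and column metadata.
    return ("employees(id, name, role)\n"
            "attendance(employee_id, date, present)")

FEW_SHOT = """Q: How many employees attended, by role?
SQL: SELECT e.role, COUNT(*) AS attended
     FROM employees e JOIN attendance a ON e.id = a.employee_id
     WHERE a.present GROUP BY e.role;"""

def generate_sql(question: str) -> str:
    # Dynamic few-shotting: pull the data model plus worked examples
    # into the prompt before asking for SQL.
    sql = llm(f"Schemas:\n{get_table_schemas()}\n\nExamples:\n{FEW_SHOT}"
              f"\n\nWrite one SQL query that answers: {question}")
    # Self-reflection: does the generated SQL answer the question?
    verdict = llm(f"Question: {question}\nSQL: {sql}\n"
                  "Does this SQL answer the question? Reply YES or NO.")
    if "NO" in verdict.upper():
        sql = llm(f"This query does not answer '{question}'. Fix it:\n{sql}")
    return sql
```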
So what about unstructured data? Let's take this question: "What is the leave policy for new hires?" One of the methods they use is called multi-query generation: take one question and decompose it into multiple questions. For example, "What is the leave policy for new hires?" could be expanded into the original question itself, plus "What is the leave policy for new joiners?", which is similar but worded differently, plus "What is the absentee policy for new hires?". So it generates three questions at once, calls one or more vector database tables, and after it gets the responses back for all three, the LLM summarizes the answers.

The RAG here is done on top of PDFs, text documents, and so on, and they've leveraged a lot of document annotation. One of the things they do is a lot of fine-tuning: once you have enough history of these kinds of queries, along with the thumbs-up/thumbs-down feedback, you can figure out what data to use to fine-tune the models, such as the OpenAI models. There's also the idea of contextual compression, which here really means leveraging fine-tuning to skip some of the steps. Because these are multi-step pipelines, fine-tuning can let you skip steps entirely. For example, there might be a step of going out to the database to fetch the catalog of all the schemas. You can take that catalog of schemas and fine-tune the LLM so it already knows what schemas are in the database, and skip the round trip. So there are a lot of nifty tricks you can do to cut down on latency.

All right, and then the last point: combo prompts. The most powerful queries look something like this. This is a fairly complicated query asking for a lot of different things, and to construct the answer, it needs to be broken down into both structured and unstructured data. One of the very big concepts in retrieval augmentation is planning: they build a plan, and they want the plan to be parallelizable so they can fetch data from multiple places at once. There are different access patterns for getting the data: unstructured RAG plus structured querying (by structured querying I just mean having the LLM generate a structured query), structured querying plus structured querying, structured querying plus unstructured RAG, and so on. So there are many combinations of ways you can collect your data. For this particular use case, they decompose the query into several plans in order to fetch all the data at once; a sketch of this parallel fan-out follows below. I think this is probably the ultimate in Q&A, because you're leveraging the combination of structured and unstructured data. And I think this is very exciting for Cassandra, because the majority of the data stored in Cassandra these days is structured data.
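Here is a minimal sketch of that parallel fan-out. The `run_sql` and `vector_search` helpers are placeholders for the real Databricks and Astra DB clients, and the plan contents are illustrative assumptions, not SkyPoint's production code.

```python
# Illustrative sketch: decompose one question into sub-queries and run
# the structured and unstructured fetches in parallel, then combine.
# run_sql and vector_search are placeholders for the real Databricks
# and Astra DB clients; the plan contents are assumptions.
from concurrent.futures import ThreadPoolExecutor

def run_sql(sql: str) -> str:
    return "<rows from the Databricks lakehouse>"  # placeholder

def vector_search(query: str) -> str:
    return "<passages retrieved from Astra DB>"    # placeholder

def answer(question: str) -> str:
    # A plan mixing structured querying with unstructured RAG.
    plan = [
        ("sql",    "SELECT ..."),                    # structured querying
        ("vector", "leave policy for new joiners"),  # unstructured RAG
        ("vector", "absentee policy for new hires"),
    ]
    # Parallel fan-out: fetch from multiple places at once to stay
    # under the latency budget.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda step: run_sql(step[1]) if step[0] == "sql"
            else vector_search(step[1]),
            plan,
        ))
    # In production an LLM call would summarize the combined evidence;
    # here we simply join the pieces.
    return f"Question: {question}\n" + "\n".join(results)
```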
So the conclusion is that SkyPoint incorporates multiple pillars into enterprise AI: accuracy, configurability, reliability, governance, security, and observability. It's a good example of a production AI system that leverages fairly sophisticated generative AI and advanced RAG techniques. Thank you. Any questions?

It's been in production since the end of June. Yeah, end of June.

The question was, are they able to measure benefits? The answer is yes. The early customers are finding massive time savings through this; in fact, it's shocking how much time they're saving. And that's another point: if you're saving time for fairly high-cost employees like nurses, who start at 50 bucks an hour, and you can save one hour of nurse time just by providing these generative AI apps, you can start charging a percentage of the nurse's hourly cost. From a business standpoint, that's very different from earlier technologies like mobile or business reporting. Any other questions?

I think the hardest part they dealt with was hallucinations. SkyPoint threw everything at it: every single technique possible, they tried it. So there was a lot of experimentation. The area where I think they have the most opportunity is combining structured data and unstructured data. I think that's the holy grail, and something they're still working on. If somebody can crack this, it will be very, very powerful.

The other part that was really hard was getting under the five-second SLA, because a lot of these techniques are multi-step problems. For example, taking one query and turning it into three queries: previously, they took one query and used a technique called FLARE, which stands for Forward-Looking Active REtrieval augmented generation. What the FLARE algorithm does is RAG, but it looks at the probability of the tokens being generated by the LLM, and if the probability is low, it causes the system to re-ask another question. In other words, you can use the LLM to figure out when it's unsure of itself (a sketch of this token-confidence check appears at the end of the transcript). That sounds great in theory, but they found there were too many round-trip costs, and that instead it's better to take one query, turn it into multiple queries, and parallelize them all at once. Which is actually why they moved off of Azure Cognitive Search onto Cassandra: they wanted that parallelization capability. So balancing all of that is actually the hardest thing for them. Any other questions?

Yeah, actually I don't know. But I looked at their Databricks schema and it was hundreds of tables, a massive ER diagram that was too small to read on the Zoom screen. So it's a lot. And that's a point, too: one of the things that was fairly surprising was that they only added a little bit of annotation to their tables, yet they still managed to get high enough accuracy on the results. Depending on the customer base, they get between 85 and 95% accurate results, measured by thumbs up/thumbs down. Any other questions? All right, let's go have drinks then. Thank you, everyone.
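To make the FLARE discussion above concrete, here is a minimal sketch of the token-confidence check, using the OpenAI chat completions API's per-token log-probabilities. The model name and the 0.5 probability threshold are illustrative assumptions, not SkyPoint's settings.

```python
# Illustrative FLARE-style check: inspect per-token log-probabilities
# of the generated answer and trigger another retrieval round when the
# model looks unsure. Model name and threshold are assumptions.
import math
from openai import OpenAI

client = OpenAI()

def generate_with_confidence(prompt: str, threshold: float = 0.5):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        logprobs=True,        # return per-token log-probabilities
    )
    choice = resp.choices[0]
    probs = [math.exp(t.logprob) for t in choice.logprobs.content]
    confident = min(probs) >= threshold
    return choice.message.content, confident

answer, confident = generate_with_confidence(
    "What is the leave policy for new hires?")
if not confident:
    # Low-probability tokens: retrieve more context and re-ask.
    # This extra round trip is the latency cost that pushed the team
    # toward parallel multi-query generation instead.
    pass
```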